
Chapter 10
Working with Ontologies

An increasing number of NLP projects make use of taxonomic data structures and of ontologies. The use of NLP techniques for (semi-)automatically generating Semantic Web meta-data is also a growing trend. The advancements in the Semantic Web research area have led to a variety of standards for representing ontologies and an increasing number of tools and programming libraries for managing ontologies are becoming available. All this underlines the need for NLP systems to access ontological information and has led to the addition of support for ontologies in GATE.

The various ontology representation formalisms (such as RDF-Schema [Lassila & Swick 99], OWL and its variants [Dean et al. 04], DAML+OIL [Horrocks & vanHarmelen 01]) each have their advantages, disadvantages and idiosyncrasies. Choosing one of them could only be done on subjective criteria and would risk obsolescence once that formalism falls out of favour with the research community or is superseded by a newer one. The GATE ontology support therefore aims to provide an abstraction layer between the actual representation mechanism and the NLP modules that make use of it. It consists of an in-memory data model for ontologies, an API providing access to that representation, a visual resource displaying the information, and input/output capabilities for accessing files containing ontologies using various standards. The benefit of this approach is that each application works with the format-independent model when dealing with ontologies, which makes it immune to changes in the underlying ontology formats. If a new format needs to be supported, the application can start using ontologies in this format simply by including the tool that converts the format into the common model. From a language engineer’s perspective the advantage is that they only need to learn one API and model, rather than having to deal with many different ontology formats. This approach is similar to the way we deal with document formats.

10.1 Data Model for Ontologies

In order to work as an abstraction layer, the GATE ontology implementation supports only those features common to all formalisms, which are also the features most widely used. All the information that is specific to a given representation model and cannot be represented in GATE is ignored. Currently, the ontology data model has support for hierarchies of classes and restrictions, hierarchies of properties and instances (also known as individuals).

10.1.1 Hierarchies of classes and restrictions

The central role in the ontology data model is played by the class hierarchy (or taxonomy). This consists of a set of classes linked by subClassOf, superClassOf and equivalentClassAs relations. Each ontology class has a name and a URI; in most cases the name is the local part of the URI, though this is not enforced. The URIs of all the resources in a given ontology must be unique.

Each class can have a set of superclasses and a set of subclasses; these are used to build the class hierarchy. The subClassOf and superClassOf relations are transitive and methods are provided by the API for calculating the transitive closure for each of these relations given a class. The transitive closure for the set of superclasses for a given class is a set containing all the superclasses of that class, as well as all the superclasses of its direct superclasses, and so on until no more are found. This calculation is finite, the upper bound being the set of all the classes in the ontology. A class that has no superclasses is called a top class. An ontology can have several top classes. Although the GATE ontology API can deal with cycles in the hierarchy graph, these can cause problems for processes using the API and probably indicate an error in the definition of the ontology. Care should be taken to avoid such situations.
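The transitive closure described above can be sketched in plain Java. This is only an illustration of the calculation, not the GATE implementation: the class name SuperClassClosure, the method allSuperClasses and the example class names are all assumptions, and the hierarchy is modelled as a simple map from a class name to its direct superclasses. The visited set keeps the calculation finite even when the hierarchy contains a cycle.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class SuperClassClosure {

    /**
     * Returns every class reachable from start by following
     * direct-superclass links. A class is expanded only the first time
     * it is seen, so the loop terminates even on cyclic hierarchies.
     */
    static Set<String> allSuperClasses(Map<String, List<String>> directSupers, String start) {
        Set<String> closure = new LinkedHashSet<>();
        Deque<String> pending = new ArrayDeque<>(directSupers.getOrDefault(start, List.of()));
        while (!pending.isEmpty()) {
            String current = pending.pop();
            if (closure.add(current)) {
                pending.addAll(directSupers.getOrDefault(current, List.of()));
            }
        }
        return closure;
    }

    public static void main(String[] args) {
        // Hypothetical hierarchy: Company -> Organization -> Agent -> Entity
        Map<String, List<String>> directSupers = Map.of(
            "Company", List.of("Organization"),
            "Organization", List.of("Agent"),
            "Agent", List.of("Entity"));
        System.out.println(allSuperClasses(directSupers, "Company"));
    }
}
```

In this sketch a top class is simply one that has no entry (or an empty list) in the map, so its closure is the empty set.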

A pair of classes can also have an equivalentClassAs (known as sameClassAs in Gate 3.1) relation, which indicates that the two classes are virtually the same and all their properties and instances should be shared.

A restriction is an anonymous class that is set on an object or datatype property to restrict some instances of the property’s specified domain to have only certain values (a value constraint) or a certain number of values (a cardinality restriction) for that property. Thus for each restriction there exist at least three triples in the repository: one that defines the resource as a restriction, another that indicates the property on which the restriction is specified, and a third that states the cardinality or value constraint set on the property. There are six types of restrictions:

  1. Cardinality Restriction
  2. MinCardinality Restriction
  3. MaxCardinality Restriction
  4. HasValue Restriction
  5. AllValuesFrom Restriction
  6. SomeValuesFrom Restriction

Please visit the OWL Reference for more detailed information on restrictions.
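To make the three statements behind a restriction concrete, the following sketch builds them for a MaxCardinality restriction using the standard OWL vocabulary (owl:Restriction, owl:onProperty, owl:maxCardinality). It does not use the GATE API; the class RestrictionTriples, the blank node label _:r1 and the property URI http://sample.en/owlim#hasName are hypothetical names chosen for illustration, and the cardinality is kept as a plain string rather than a typed literal.

```java
import java.util.List;

class RestrictionTriples {
    static final String RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type";
    static final String OWL = "http://www.w3.org/2002/07/owl#";

    /** Builds the three statements that describe a maxCardinality restriction. */
    static List<String[]> maxCardinality(String restrictionNode, String propertyUri, int max) {
        return List.of(
            // 1. the resource is a restriction
            new String[] {restrictionNode, RDF_TYPE, OWL + "Restriction"},
            // 2. the property on which the restriction is specified
            new String[] {restrictionNode, OWL + "onProperty", propertyUri},
            // 3. the constraint itself
            new String[] {restrictionNode, OWL + "maxCardinality", Integer.toString(max)});
    }

    public static void main(String[] args) {
        for (String[] triple : maxCardinality("_:r1", "http://sample.en/owlim#hasName", 1)) {
            System.out.println(String.join(" ", triple));
        }
    }
}
```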

10.1.2 Instances

Instances are objects that belong to classes. Like classes, each instance has a name and a URI. Each instance can belong to one or more classes and can have properties with values. API methods are provided for getting all the instances in an ontology, all the instances that belong to a given class, and all the property values for a given instance. There is also a method to retrieve the list of classes that an instance belongs to, using either the direct or the transitive closure. Like classes, two instances can also have the sameInstanceAs relation, which indicates that the property values assigned to both instances should be shared and that all the properties applicable to one instance are also valid for the other. In addition, there is a differentInstanceAs relation, which declares the instances as disjoint.

10.1.3 Hierarchies of properties

The last part of the data model is made up of hierarchies of properties that can be associated with objects in the ontology. Unlike some other representation models, in GATE, properties do not ‘belong’ to classes and they are instead first-class citizens of the data model. The specification of the type of objects that properties apply to is done through the means of domains. Similarly, the types of values that a property can take are restricted through the definition of a range. A property with a domain that is an empty set can apply to instances of any type (i.e. there are no restrictions given). Like classes, properties can also have superPropertyOf, subPropertyOf and equivalentPropertyAs relations among them.

GATE defines five types (three in Gate 3.1) of properties:

  1. RDF Property (known as Property in Gate 3.1):

    Each property in an ontology is either an RDF property or a subtype of the RDF property. Each property has constraints on the type of values it can have for its domain and range. In case of the RDF property, it can have any ontology resource as its domain and range. In other words, it is associated to an ontology resource and has an ontology resource as value. For each property, the ontology API provides methods not only to set and get values of its domain and range, but also to check if a given value is valid for it. It also provides methods which can be used for specifying a particular property as its super or subproperty. These methods are inherited by all subtypes of the RDF Property (i.e. Annotation, Datatype and Object properties).

  2. Annotation Property (new in Gate 4):

    An annotation property is associated with an ontology resource (i.e. a class, property or instance) and can have a Literal (new in Gate 4) as value. A Literal is a Java object that can refer to the URI of any ontology resource, to a string (http://www.w3.org/2001/XMLSchema#string) with a specified language, or to a data type (discussed below) with a compatible value. No two annotation properties can be declared as equivalent. It is also not possible to specify a domain or range for an annotation property, or a super or subproperty relation between two annotation properties. Five annotation properties, predefined by OWL, are made available to the user whenever a new ontology instance is created: owl:versionInfo, rdfs:label, rdfs:comment, rdfs:seeAlso and rdfs:isDefinedBy. In other words, even when the user creates an empty ontology, these annotation properties are created automatically.

  3. Datatype Property:

    A datatype property is associated with an ontology instance and can have a Literal value that is compatible with its data type (new in Gate 4; Gate 3.1 allowed users to specify a Java class as a range value, which is no longer valid in Gate 4). A data type can be one of the pre-defined data types in the GATE ontology API (as listed below).


    A set of ontology classes can be specified as a property’s domain; then the property can be associated only with the instance belonging to all of the classes specified in the domain.

  4. Object Property:

    An object property is associated with an ontology instance and has an instance as value. A set of ontology classes can be specified as the property’s domain and range. The property can then be associated only with instances belonging to all of the classes specified in the domain. Similarly, only instances that belong to all the classes specified in the range can be set as values. Symmetric and Transitive properties are the two subtypes of Object properties. A symmetric property’s domain and range are the same (new in Gate 4).
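The domain rule used by datatype and object properties (an instance must belong to all of the classes in the domain, and an empty domain imposes no restriction) can be expressed as a simple set test. The sketch below is independent of the GATE API; DomainCheck and isApplicable are hypothetical names, classes are represented by plain strings, and the caller is assumed to pass every class the instance belongs to, whether directly or through inheritance. The same test would apply to the range of an object property.

```java
import java.util.Set;

class DomainCheck {

    /**
     * A property applies to an instance when the instance belongs to every
     * class in the property's domain. containsAll returns true for an empty
     * domain, which matches the "no restriction" case.
     */
    static boolean isApplicable(Set<String> domain, Set<String> instanceTypes) {
        return instanceTypes.containsAll(domain);
    }

    public static void main(String[] args) {
        Set<String> domain = Set.of("Organization", "Employer");
        // Belongs to both domain classes: prints true
        System.out.println(isApplicable(domain, Set.of("Organization", "Employer", "Agent")));
        // Belongs to only one of the two domain classes: prints false
        System.out.println(isApplicable(domain, Set.of("Organization")));
        // Empty domain applies to any instance: prints true
        System.out.println(isApplicable(Set.of(), Set.of("Person")));
    }
}
```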

All properties (except the annotation properties) can be marked as functional properties, which means that for a given instance in their domain, they can take at most one value, i.e. they define a function in the algebraic sense. Properties inverse to functional properties are marked as inverse functional. If one likens ontology properties to algebraic relations, the semantics of these become apparent.

10.2 Ontology Event Model (new in Gate 4)

An Ontology Event Model (OEM) is implemented and incorporated into the new GATE Ontology API. Under the new OEM, events are fired when a resource is added, modified or deleted from the ontology.

An interface called OntologyModificationListener is created with five methods (see below) that need to be implemented by the listeners of ontology events.

public void resourcesRemoved(Ontology ontology, String[] resources);

This method is invoked whenever an ontology resource (a class, property or instance) is removed from the ontology. Deleting one resource can also result in the deletion of other dependent resources. For example, deleting a class should also delete all its instances (more details on how deletion works are explained later). The second parameter, an array of strings, provides a list of URIs of the resources deleted from the ontology.

public void resourceAdded(Ontology ontology, OResource resource);

This method is invoked whenever a new resource is added to the ontology. The parameters provide references to the ontology and the resource being added to it.

public void ontologyRelationChanged(Ontology ontology, OResource resource1, OResource resource2, int eventType);

This method is invoked whenever a relation between two resources (e.g. OClass and OClass, RDFProperty and RDFProperty, etc.) is changed. Example events are the addition or removal of a subclass or a subproperty, two classes or properties being set as equivalent or different, and two instances being set as same or different. The first parameter is the reference to the ontology, the next two parameters are the resources being affected and the final parameter is the event type. Please refer to the list of events specified below for the different types of events.

public void resourcePropertyValueChanged(Ontology ontology, OResource resource, RDFProperty property, Object value, int eventType);

This method is invoked whenever a property value is added to or removed from a resource. The first parameter provides a reference to the ontology in which the event took place. The second provides a reference to the resource affected, the third provides a reference to the property for which the value is added or removed, the fourth is the actual value being set on the resource and the fifth identifies the type of event.

public void ontologyReset(Ontology ontology);

This method is called whenever the ontology is reset, in other words when all resources of the ontology are deleted using the ontology.cleanup method.

The OConstants class defines the static constants, listed below, for various event types.

public static final int OCLASS_ADDED_EVENT;  
public static final int ANONYMOUS_CLASS_ADDED_EVENT;  
public static final int HAS_VALUE_RESTRICTION_ADDED_EVENT;  
public static final int SUB_CLASS_ADDED_EVENT;  
public static final int SUB_CLASS_REMOVED_EVENT;  
public static final int EQUIVALENT_CLASS_EVENT;  
public static final int ANNOTATION_PROPERTY_ADDED_EVENT;  
public static final int DATATYPE_PROPERTY_ADDED_EVENT;  
public static final int OBJECT_PROPERTY_ADDED_EVENT;  
public static final int TRANSTIVE_PROPERTY_ADDED_EVENT;  
public static final int SYMMETRIC_PROPERTY_ADDED_EVENT;  
public static final int DATATYPE_PROPERTY_VALUE_ADDED_EVENT;  
public static final int OBJECT_PROPERTY_VALUE_ADDED_EVENT;  
public static final int RDF_PROPERTY_VALUE_ADDED_EVENT;  
public static final int OBJECT_PROPERTY_VALUE_REMOVED_EVENT;  
public static final int RDF_PROPERTY_VALUE_REMOVED_EVENT;  
public static final int EQUIVALENT_PROPERTY_EVENT;  
public static final int OINSTANCE_ADDED_EVENT;  
public static final int DIFFERENT_INSTANCE_EVENT;  
public static final int SAME_INSTANCE_EVENT;  
public static final int RESOURCE_REMOVED_EVENT;  
public static final int SUB_PROPERTY_ADDED_EVENT;  
public static final int SUB_PROPERTY_REMOVED_EVENT;

An ontology is responsible for firing the various ontology events. Objects wishing to listen to ontology events must implement the methods above and must be registered with the ontology using the following method.

addOntologyModificationListener(OntologyModificationListener oml);

The following method cancels the registration.

removeOntologyModificationListener(OntologyModificationListener oml);
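As a rough illustration of what a listener might look like, the sketch below implements the five callbacks and records a short description of each event. So that it compiles on its own, it declares minimal stand-ins for Ontology, OResource, RDFProperty and OntologyModificationListener that only mirror the signatures quoted above; they are not the GATE types, and ListenerSketch and LoggingListener are hypothetical names. With the real API, the listener would instead implement the GATE interface and be registered through addOntologyModificationListener.

```java
import java.util.ArrayList;
import java.util.List;

class ListenerSketch {
    // Stand-ins for the GATE types, declared only to keep the sketch self-contained.
    interface OResource {}
    interface RDFProperty extends OResource {}
    interface Ontology {}

    // Mirrors the five callbacks described in this section.
    interface OntologyModificationListener {
        void resourcesRemoved(Ontology ontology, String[] resources);
        void resourceAdded(Ontology ontology, OResource resource);
        void ontologyRelationChanged(Ontology ontology, OResource resource1,
                                     OResource resource2, int eventType);
        void resourcePropertyValueChanged(Ontology ontology, OResource resource,
                                          RDFProperty property, Object value, int eventType);
        void ontologyReset(Ontology ontology);
    }

    /** Records a one-line description of every event it receives. */
    static class LoggingListener implements OntologyModificationListener {
        final List<String> log = new ArrayList<>();

        public void resourcesRemoved(Ontology ontology, String[] resources) {
            log.add("removed " + resources.length + " resource(s)");
        }
        public void resourceAdded(Ontology ontology, OResource resource) {
            log.add("resource added");
        }
        public void ontologyRelationChanged(Ontology ontology, OResource resource1,
                                            OResource resource2, int eventType) {
            log.add("relation changed, event type " + eventType);
        }
        public void resourcePropertyValueChanged(Ontology ontology, OResource resource,
                                                 RDFProperty property, Object value, int eventType) {
            log.add("property value changed to " + value + ", event type " + eventType);
        }
        public void ontologyReset(Ontology ontology) {
            log.add("ontology reset");
        }
    }

    public static void main(String[] args) {
        LoggingListener listener = new LoggingListener();
        Ontology ontology = new Ontology() {};
        // The callbacks are invoked by hand here; a real ontology would fire them.
        listener.resourceAdded(ontology, new OResource() {});
        listener.resourcesRemoved(ontology, new String[] {"http://sample.en/owlim#IBM"});
        listener.ontologyReset(ontology);
        listener.log.forEach(System.out::println);
    }
}
```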

10.2.1 What happens when a resource is deleted?

Resources in an ontology are connected with one another. For example, one class can be a sub or superclass of other classes. A resource can have multiple properties attached to it. Taking these various relations into account, a change in one resource can affect other resources in the ontology. Below we describe what happens (in terms of what the GATE ontology API does) when a resource is deleted.

10.3 OWLIM Ontology LR

Ontologies in GATE are classified as language resources. In order to make use of the ontology implementation included in the main distribution, one needs to load the ‘Ontology_Tools’ CREOLE plug-in; this makes a new language resource ‘OWLIM Ontology’ available.

The implementation is based on the OWL In Memory (OWLIM), a high performance semantic repository developed at Ontotext (in Bulgaria) as part of the SEKT project. OWLIM is packaged as a Storage and Inference Layer (SAIL) for the Sesame RDF database. OWLIM uses the TRREE engine to perform RDFS, OWL DLP, and OWL Horst reasoning. The most expressive language supported is a combination of limited OWL Lite and unconstrained RDFS. OWLIM offers configurable reasoning support and performance. In the “standard” version of OWLIM (referred to as SwiftOWLIM) reasoning and query evaluation are performed in-memory, while a reliable persistence strategy assures data preservation, consistency and integrity.

OWLIM asks users to provide an XML configuration for the ontology they wish to load into the Sesame RDF database. In order to understand OWL statements, an ontology describing relations between the OWL constructs and the RDFS schema is imported. For example, owl:Class is a subclass of rdfs:Class. This allows users to load OWL data into the Sesame RDF database.

In order to load an ontology in an OWLIM repository, the user has to provide certain configuration parameters. These include the name of the repository, the URL of the ontology, the default name space, the format of the ontology (RDF/XML, N3, NTriples and Turtle), the URLs or absolute locations of the other ontologies to be imported, their respective name spaces and so on. Ontology files, based on their format, are parsed and persisted in the NTriples format.

In order to utilize the power of OWLIM and the simplicity of GATE Ontology API, GATE provides an implementation of the OWLIM Ontology. Its basic purpose is to hide all the complexities of OWLIM and Sesame and provide an easy to use API and interface to create, load, save and update ontologies. Based on certain parameters that the user provides when instantiating the ontology, a configuration file is dynamically generated to create a dummy repository in memory (unless persistence is specified).

When creating a new ontology, one can use an existing file to pre-populate it with data. If no such file is provided, an empty ontology is created. A detailed description for all the parameters that are available for new ontologies follows:

  1. defaultNameSpace is the base URI to be used for all new items that are only mentioned using their local name. This can safely be left empty, in which case, while adding new resources to the ontology, users are asked to provide name spaces for each new resource.
  2. As indicated earlier, OWLIM supports four different formats: RDF/XML, NTriples, Turtle and N3. Depending on the format of the ontology file, the user should select one of the four URL options (rdfXmlURL, ntriplesURL, turtleURL and n3URL (not supported yet)) and provide a URL pointing to the ontology data.
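The choice between the URL options can be sketched as a small helper that picks the parameter name from the ontology format. The parameter names (rdfXmlURL, ntriplesURL, turtleURL, n3URL, defaultNameSpace) come from the list above; everything else is an assumption made for illustration: the class OntologyParams, its methods, the format labels used as switch keys, the example file location, and the use of a plain Map with string values in place of a GATE FeatureMap.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class OntologyParams {

    /** Maps a format name to the parameter that carries the ontology URL. */
    static String urlParameterFor(String format) {
        switch (format) {
            case "RDF/XML":  return "rdfXmlURL";
            case "NTriples": return "ntriplesURL";
            case "Turtle":   return "turtleURL";
            case "N3":
                // The text above lists n3URL as not supported yet.
                throw new UnsupportedOperationException("n3URL is not supported yet");
            default:
                throw new IllegalArgumentException("Unknown format: " + format);
        }
    }

    /** Collects the parameters in a plain map, standing in for a FeatureMap. */
    static Map<String, Object> parameters(String format, String ontologyUrl,
                                          String defaultNameSpace) {
        Map<String, Object> params = new LinkedHashMap<>();
        params.put(urlParameterFor(format), ontologyUrl);
        // defaultNameSpace may be left empty, in which case it is simply omitted.
        if (defaultNameSpace != null && !defaultNameSpace.isEmpty()) {
            params.put("defaultNameSpace", defaultNameSpace);
        }
        return params;
    }

    public static void main(String[] args) {
        // Hypothetical file location and namespace.
        System.out.println(parameters("Turtle", "file:/path/to/ontology.ttl",
                                      "http://sample.en/owlim#"));
    }
}
```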

Once an ontology is created, additional data can be loaded that will be merged with the existing information. This can be done by right-clicking on the ontology in the resources tree and selecting ‘Load ... data’ where “...” is one of the supported formats.

Other options available are cleaning the ontology (deleting all the information from it) and saving it to a file in one of the supported formats.

10.4 GATE’s Ontology Editor

GATE’s ontology support also includes a viewer/editor that can be used to navigate an ontology and quickly inspect the information relating to any of the objects defined in it—classes and restrictions, instances and their properties. Also, resources can be deleted and new resources can be added through the viewer.


Figure 10.1: The GATE Ontology Viewer

The viewer is divided into two areas. The one on the left shows separate tabs for the hierarchy of classes and instances and (new in Gate 4) for the hierarchy of properties. The view on the right-hand side shows the details pertaining to the object currently selected in the left-hand view.

The first tab in the left view displays a tree which shows all the classes and restrictions defined in the ontology. The tree can have several root nodes, one for each top class in the ontology. The same tree also shows each class’s instances. Instances that belong to several classes are shown as children of all the classes they belong to.

The second tab in the left view displays a tree of all the properties defined in the ontology. This tree can also have several root nodes, one for each top property in the ontology. The different types of properties are distinguished by different icons.

Whenever an item is selected in the tree view, the right-hand view is populated with the details that are appropriate for the selected object. For an ontology class, the details include brief information about the resource (such as the URI and type of the selected class), the set of direct superclasses, the set of all superclasses using the transitive closure, the set of direct subclasses, the set of all the subclasses, the set of equivalent classes, the set of applicable property types, the set of property values set on the selected class and the set of instances that belong to the selected class. For a restriction, in addition to the above information, it displays the property to which the restriction applies and the type of the restriction.

For an instance, the details displayed include the brief information about the instance, set of direct types (the list of classes this instance is known to belong to), the set of all types this instance belongs to (through the transitive closure of the set of direct types), the set of same instances, the set of different instances and the values for all the properties that are set.

When a property is selected, different information is displayed in the right-hand view according to the property type. It includes the brief information about the property itself, set of direct superproperties, the set of all superproperties (obtained through the transitive closure), the set of direct subproperties, the set of all subproperties (obtained through the transitive closure), the set of equivalent properties, and domain and range information.

As mentioned in the description of the data model, properties are not directly linked to the classes, but rather define their domain of applicability through a set of domain restrictions. This means that the list of properties should not really be listed as a detail for class objects but only for instances. It is however quite useful to have an indication of the types of properties that could apply to instances of a given class. Because of the semantics of property domains, it is not possible to calculate precisely the list of applicable properties for a given class, but only an estimate of it. If a property for instance requires its domain instances to belong to two different classes then it cannot be known with certitude whether it is applicable to either of the two classes—it does not apply to all instances of any of those classes, but only to those instances the two classes have in common. Because of this, such properties will not be listed as applicable to any class.

The information listed in the details pane is organised in sub-lists according to the type of the items. Each sub-list can be collapsed or expanded by clicking on the little triangular button next to the title. The ontology viewer is dynamic and will update the information displayed whenever the underlying ontology is changed through the API.

When a resource in the details table is double-clicked, that resource is selected in the class or property tree and its details are shown in the details table. To change a property value, the user can double-click on the value of the property (second column); a window is then shown in which the user is asked to provide a new value. Along with each property value, a button (with a red X caption) is provided. Clicking on this button deletes the property value.

A new toolbar has been added at the top of the ontology viewer, which contains the following buttons to add and delete ontology resources:

The tree components allow the user to select more than one node, but the details table on the right-hand side of the GUI only shows the details of the first selected node. The buttons in the toolbar are enabled and disabled based on users’ selection of nodes in the tree.

  1. Creating a new top class:

    A window appears which asks the user to provide the namespace (the default name space, if specified) and the class name. If the ontology already contains a class with the same name, the GUI shows an appropriate message.

  2. Creating a new subclass:

    A class can have multiple super classes. Therefore, selecting multiple classes in the ontology tree and then clicking on the “SC” button, automatically considers the selected classes as the super classes. The user is then asked for details for its namespace and class name.

  3. Creating a new instance:

    An instance can belong to more than one class. Therefore, selecting multiple classes in the ontology tree and then clicking on the “I” button, automatically considers the selected classes as the type of new instance. The user is then prompted to provide details such as namespace and instance name.

  4. Creating a new restriction:

    As described above, a restriction is a type of anonymous class and is specified on a property, with a constraint set on either the number of values the property can take or the type of value that instances are allowed to have for it. The user can click on the blue “R” square button, which shows a window for creating a new restriction. There the user can select the type of restriction, the property and the value constraint. Please note that since restrictions are considered anonymous classes, the user does not have to specify a URI for them; restrictions are named automatically by the system.

  5. Creating a new property:

    An RDF property can have any ontology resource as its domain and range, so selecting multiple resources and then clicking on the new RDF Property icon shows a window where the selected resources in the tree are already taken as the domain for the property. The user is then asked to provide information such as the namespace and the property name. Two buttons are provided to select resources for the domain and range. Clicking on them brings up a window with a drop-down box containing a list of resources that can be selected for the domain or range, and a list of the resources already selected by the user.

    • Since an annotation property cannot have any domain or range constraints, clicking on the new annotation property button brings up a dialog that asks the user for information such as the namespace and the annotation property name.
    • A datatype property can have one or more ontology classes as its domain and one of the pre-defined datatypes as its range, so selecting one or more classes and then clicking on the new Datatype property icon, brings up a window where the selected classes in the tree are already taken as the domain. The user is then asked to provide information such as the namespace and the property name. A drop down box allows users to select one of the data types from the list.
    • Object, symmetric and transitive properties can have one or more classes as their domain and range. For a symmetric property the domain and range are the same. Clicking on any of these options brings up a window where the user is asked to provide information such as the namespace and the property name. The user is also given two buttons to select one or more classes as values for the domain and range.
  6. Removing the selected resources:

    All the selected nodes are removed when the user clicks on the “X” button. Please note that since ontology resources are related in various ways, deleting a resource can affect other resources in the ontology; for example, deleting a resource can cause other resources in the same ontology to be deleted too.

  7. Setting properties on instances/classes:

    Right-clicking on an instance brings up a menu that provides a list of properties that are inherited and applicable to its classes. Selecting a specific property from the menu allows the user to provide a value for that property. For example, if the property is an Object property, a new window appears which allows the user to select one or more instances which are compatible to the range of the selected property. The selected instances are then set as property values. For classes, all the properties (e.g. annotation and RDF properties) are listed on the menu.

  8. Setting relations among resources:

    Two or more classes, or two or more properties, can be set as equivalent; similarly two or more instances can be marked as the same. Right-clicking on a resource brings up a menu with an appropriate option (Equivalent Class for ontology classes, Same As Instance for instances and Equivalent Property for properties). Selecting it brings up a window with a drop-down box containing a list of resources that the user can select to specify them as equivalent or the same.

An ontology can be saved in different formats (RDF/XML, NTriples, N3 and Turtle) using the options provided in the options menu, which is invoked by right-clicking on the ontology instance in the GATE GUI. All the changes made to the ontology are logged and stored as an ontology feature. Users can also export these changes to a file by selecting the “Save Ontology Event Log” option from the options menu. Similarly, users can load an exported event log and apply the changes to a different ontology by using the “Load Ontology Event Log” option. Any change made to the ontology can be described by a set of triples either added to or deleted from the repository. For example, in the GATE API implementation, the addition of a new instance results in the addition of two statements to the repository:

// Adding a new instance Rec1 of type Recognized
// Here + indicates the addition
+ <http://proton.semanticweb.org/2005/04/protons#Rec1>
        <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>
        <http://proton.semanticweb.org/2005/04/protons#Recognized>

// Adding a label (annotation property) to the instance with value Rec Instance
+ <http://proton.semanticweb.org/2005/04/protons#Rec1>
        <http://www.w3.org/2000/01/rdf-schema#label>
        <Rec Instance>
        <http://www.w3.org/2001/XMLSchema#string>

The event log therefore contains a list of such triples, the latest change being at the bottom of the change log. Each triple consists of a subject followed by a predicate followed by an object. Below we give an illustration explaining the syntax used for recording the changes.

// Adding a new instance Rec1 of type Recognized
// Here + indicates the addition
+ <http://proton.semanticweb.org/2005/04/protons#Rec1>
        <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>
        <http://proton.semanticweb.org/2005/04/protons#Recognized>

// Adding a label (annotation property) to the instance with value Rec Instance
+ <http://proton.semanticweb.org/2005/04/protons#Rec1>
        <http://www.w3.org/2000/01/rdf-schema#label>
        <Rec Instance>
        <http://www.w3.org/2001/XMLSchema#string>

// Adding a new class called TrustSubClass
+ <http://proton.semanticweb.org/2005/04/protons#TrustSubClass>
        <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>
        <http://www.w3.org/2002/07/owl#Class>

// TrustSubClass is a subClassOf the class Trusted
+ <http://proton.semanticweb.org/2005/04/protons#TrustSubClass>
        <http://www.w3.org/2000/01/rdf-schema#subClassOf>
        <http://proton.semanticweb.org/2005/04/protons#Trusted>

// Deleting a property called hasAlias and all relevant statements
// Here - indicates the deletion
// * indicates any value in place
- <http://proton.semanticweb.org/2005/04/protons#hasAlias> <*> <*>
- <*> <http://proton.semanticweb.org/2005/04/protons#hasAlias> <*>
- <*> <*> <http://proton.semanticweb.org/2005/04/protons#hasAlias>

// Deleting a label set on the instance Rec1
- <http://proton.semanticweb.org/2005/04/protons#Rec1>
        <http://www.w3.org/2000/01/rdf-schema#label>
        <Rec Instance>
        <http://www.w3.org/2001/XMLSchema#string>

// Resetting the entire ontology (Deleting all statements)
- <*> <*> <*>
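The semantics of the + and - entries, including the * wildcard, can be sketched as follows. This is not how GATE replays an event log; it is a minimal in-memory illustration under stated assumptions: EventLogSketch and its methods are hypothetical names, triples are plain three-element string arrays, the short terms used in main are placeholders for full URIs, and the optional fourth term (the datatype of a literal) is ignored.

```java
import java.util.ArrayList;
import java.util.List;

class EventLogSketch {
    static final String ANY = "*";

    /** True when the pattern term is the wildcard or equals the triple term. */
    static boolean termMatches(String pattern, String value) {
        return ANY.equals(pattern) || pattern.equals(value);
    }

    /**
     * Applies one log entry to the store: '+' adds the triple,
     * '-' removes every stored triple that matches the pattern.
     */
    static void apply(List<String[]> store, char sign,
                      String subject, String predicate, String object) {
        if (sign == '+') {
            store.add(new String[] {subject, predicate, object});
        } else if (sign == '-') {
            store.removeIf(t -> termMatches(subject, t[0])
                             && termMatches(predicate, t[1])
                             && termMatches(object, t[2]));
        } else {
            throw new IllegalArgumentException("Unknown sign: " + sign);
        }
    }

    public static void main(String[] args) {
        List<String[]> store = new ArrayList<>();
        apply(store, '+', "Rec1", "rdf:type", "Recognized");
        apply(store, '+', "Rec1", "hasAlias", "Rec One");
        // Deleting hasAlias wherever it occurs, as in the log above.
        apply(store, '-', "hasAlias", ANY, ANY);
        apply(store, '-', ANY, "hasAlias", ANY);
        apply(store, '-', ANY, ANY, "hasAlias");
        System.out.println(store.size() + " statement(s) left");
        // Resetting the ontology removes everything.
        apply(store, '-', ANY, ANY, ANY);
        System.out.println(store.size() + " statement(s) left");
    }
}
```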

10.5 Instantiating OWLIM Ontology using GATE API

The following code demonstrates how to use the GATE API to create an instance of OWLIM Ontology.

import java.io.BufferedWriter;
import java.io.File;
import java.io.FileWriter;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import gate.Factory;
import gate.FeatureMap;
import gate.Gate;
import gate.creole.ontology.*;

// step 1: initialize GATE
Gate.init();

// step 2: load the Ontology_Tools plugin
File ontoHome = new File(Gate.getPluginsHome(), "Ontology_Tools");
Gate.getCreoleRegister().registerDirectories(ontoHome.toURI().toURL());

// step 3: set the parameters
// ontologyUrl is a placeholder for the URL of the ontology to load
FeatureMap fm = Factory.newFeatureMap();
fm.put("rdfXmlURL", ontologyUrl);

// step 4: finally create an instance of ontology
Ontology ontology = (Ontology)
    Factory.createResource("gate.creole.ontology.owlim.OWLIMOntologyLR", fm);

// retrieving a list of top classes
Set<OClass> topClasses = ontology.getOClasses(true);

// for all top classes, printing their direct sub classes
for (OClass topClass : topClasses) {
  Set<OClass> dcs = topClass.getSubClasses(OConstants.DIRECT_CLOSURE);
  for (OClass aClass : dcs) {
    System.out.println(aClass.getURI().toString());
  }
}

// creating a new class
// false indicates that it is not an anonymous URI
URI aURI = new URI("http://sample.en/owlim#Organization", false);
OClass organizationClass = ontology.addOClass(aURI);

// creating a new Datatype property called name
// with domain set to Organization
// with datatype set to string
URI dURI = new URI("http://sample.en/owlim#Name", false);
Set<OClass> domain = new HashSet<OClass>();
domain.add(organizationClass);
DatatypeProperty dp = ontology.addDatatypeProperty(dURI, domain,
                                        DataType.getStringDataType());

// creating a new instance of class organization called IBM
URI iURI = new URI("http://sample.en/owlim#IBM", false);
OInstance ibm = ontology.addOInstance(iURI, organizationClass);

// assigning a Datatype property, name to ibm
ibm.addDatatypePropertyValue(dp, new Literal("IBM Corporation",
                                        dp.getDataType()));

// get all the set values of all Datatype properties on the instance ibm
Set<DatatypeProperty> dps = ontology.getDatatypeProperties();
for (DatatypeProperty aProperty : dps) {
  List<Literal> values = ibm.getDatatypePropertyValues(aProperty);
  System.out.println("DP : " + aProperty.getURI().toString());
  for (Literal l : values) {
    System.out.println("Value : " + l.getValue());
    System.out.println("Datatype : " + l.getDataType().getXmlSchemaURI().toString());
  }
}

// export data to a file in the ntriples format
// someFile is a placeholder for the output file
BufferedWriter writer = new BufferedWriter(new FileWriter(someFile));
String output = ontology.getOntologyData(
                   OConstants.ONTOLOGY_FORMAT_NTRIPLES);
writer.write(output);
writer.close();

10.6 Ontology-Aware JAPE Transducer

One of the GATE components that makes use of the ontology support is the JAPE transducer (see Chapter 7). Combining the power of ontologies with JAPE’s pattern matching mechanisms can ease the creation of applications.

In order to use ontologies with JAPE, one needs to load an ontology in GATE before loading the JAPE transducer. Once the ontology is known to the system, it can be set as the value for the optional ontology parameter of the JAPE grammar. Doing so slightly alters the way matching occurs when the grammar is executed. If a transducer is ontology-aware (i.e. it has a value set for the ‘ontology’ parameter), it treats all occurrences of the feature named class differently from the other features of annotations. The values of the feature class on any type of annotation are taken to be the names of classes belonging to the ontology, and two values are matched not by simple equality but by hierarchical compatibility. For example, if the ontology contains a class named ‘Politician’, which is a subclass of the class ‘Person’, then a pattern of {Entity.class == "Person"} will successfully match an annotation of type Entity with a feature class having the value “Politician”. If the JAPE transducer were not ontology-aware, such a test would fail.

This behaviour allows a larger degree of generalisation when designing a set of rules. Rules that apply to several types of entities mentioned in the text can be written using the most generic class they apply to and need not be repeated for each subtype of entity. One could have rules applying to Locations without needing to know whether a particular location happens to be a country or a city.
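The difference between simple equality and hierarchical compatibility can be sketched in a few lines of plain Java. This is only an illustration of the idea, not the code the JAPE transducer uses: the class SubsumptionSketch and its methods are hypothetical, and the sketch assumes, for simplicity, that every class has at most one direct superclass.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical class hierarchy used to contrast the two matching modes. */
class SubsumptionSketch {
    // child class name -> direct superclass name (single inheritance is an assumption)
    private final Map<String, String> parentOf = new HashMap<>();

    void addSubClass(String child, String parent) {
        parentOf.put(child, parent);
    }

    /** Matching as done when no ontology is set: the two names must be equal. */
    static boolean plainMatch(String required, String actual) {
        return required.equals(actual);
    }

    /** Ontology-aware matching: actual must be the required class or one of its descendants. */
    boolean ontologyAwareMatch(String required, String actual) {
        Set<String> visited = new HashSet<>();
        String current = actual;
        // walk up the hierarchy; the visited set guards against accidental cycles
        while (current != null && visited.add(current)) {
            if (current.equals(required)) {
                return true;
            }
            current = parentOf.get(current);
        }
        return false;
    }

    public static void main(String[] args) {
        SubsumptionSketch hierarchy = new SubsumptionSketch();
        hierarchy.addSubClass("Politician", "Person");
        System.out.println("plain: " + plainMatch("Person", "Politician"));
        System.out.println("ontology-aware: " + hierarchy.ontologyAwareMatch("Person", "Politician"));
    }
}
```

Note that the test is directional: a pattern asking for ‘Politician’ does not match an annotation whose class is the more general ‘Person’.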

If a domain ontology is available at the time of building an application, using it in conjunction with the JAPE transducers can significantly simplify the set of grammars that need to be written.

The ontology does not normally affect actions on the right hand side of JAPE rules, but when Java is used on the right hand side, then the ontology becomes accessible via a local variable named ontology, which may be referenced from within the right-hand-side code.

In Java code, the class feature should be referenced using the static final variable, LOOKUP_CLASS_FEATURE_NAME, that is defined in gate.creole.ANNIEConstants.

10.7 Annotating text with Ontological Information

The ontology-aware JAPE transducer enables the text to be linked to classes in an ontology by means of annotations. Essentially this means that each annotation can have a class and an ontology feature. Adding the relevant class feature to an annotation is straightforward: simply add a feature “class” with the class name as its value. To add the relevant ontology, use ontology.getURL().

Below is a sample rule which looks for a location annotation and identifies it as a “Mention” annotation with the class “Location” and the ontology loaded with the ontology-aware JAPE transducer (via the runtime parameter of the transducer).

Rule: Location
({Location}):mention
-->
{
  // create an annotation set consisting of all the annotations for each tag
  gate.AnnotationSet mentionSet = (gate.AnnotationSet)bindings.get("mention");

  // create the ontology and class features
  FeatureMap features = Factory.newFeatureMap();
  features.put("ontology", ontology.getURL());
  features.put("class", "Location");

  // create the new annotation
  annotations.add(mentionSet.firstNode(), mentionSet.lastNode(), "Mention",
    features);
}

10.8 Populating Ontologies

Another typical application that combines the use of ontologies with NLP techniques is finding mentions of entities in text. The scenario is that one has an existing ontology and wants to use Information Extraction to populate it with instances whenever entities belonging to classes in the ontology are mentioned in the input texts.

Let us assume we have an ontology and an IE application that marks the input text with annotations of type ‘Mention’ having a feature ‘class’ specifying the class of the entity mentioned. The task we are seeking to solve is to add instances in the ontology for every Mention annotation.

The example presented here is based on a JAPE rule that uses Java code on the action side in order to access directly the ontology API:

 1  Rule: FindEntities
 2  ({Mention}):mention
 3  -->
 4  {
 5    //find the annotation matched by LHS
 6    //we know the annotation set returned
 7    //will always contain a single annotation
 8    Annotation mentionAnn = (Annotation)
 9      ((AnnotationSet)bindings.get("mention")).
10      iterator().next();

12    //find the class of the mention
13    String className = (String)mentionAnn.getFeatures().
14      get(gate.creole.ANNIEConstants.LOOKUP_CLASS_FEATURE_NAME);

16    //find the text covered by the annotation
17    String mentionName;
18    try {
19      mentionName = doc.getContent().
20        getContent(
21          mentionAnn.getStartNode().getOffset(),
22          mentionAnn.getEndNode().getOffset()).
23        toString();
24    } catch (InvalidOffsetException e) {
25      throw new GateRuntimeException(e); //this should never happen
26    }

28    //add the instance to the ontology
29    //get the first class with that name
30    gate.creole.ontology.OClass aClass = null;
31    for (gate.creole.ontology.OResource aResource :
32         ontology.getOResourcesByName(className)) {
33      if (aResource instanceof gate.creole.ontology.OClass) {
34        aClass = (gate.creole.ontology.OClass) aResource;
35        break;
36      }
37    }
38    if (aClass == null) {
39      System.err.println("Error class \"" + className + "\" does not exist!");
41    } else {
42      //check if the instance already exists
43      //assume that the mentionName instances are unique instances
44      gate.creole.ontology.URI uri = gate.creole.ontology.OntologyUtilities
45        .createURI(ontology, mentionName, false);
46      if (!ontology.containsOInstance(uri)) {
47        // create the instance in the ontology
48        ontology.addOInstance(uri, aClass);
49      }
50    }
51  }

This will match each annotation of type Mention in the input and assign it to a label ‘mention’. That label is then used in the right hand side to find the annotation that was matched by the pattern (lines 5–10); the value for the class feature of the annotation is used to identify the ontological class name (lines 12–14); and the annotation span is used to extract the text covered in the document (lines 16–26). Once all these pieces of information are available, the addition to the ontology can be done. First the right class in the ontology is identified using the class name (lines 28–37) and then a new instance for that class is created (lines 38–50).

Besides JAPE, another tool that could play a part in this application is the Ontological Gazetteer (see Section 5.2), which can be useful in bootstrapping the IE application that finds entity mentions.

The solution presented here is purely pedagogical as it does not address many issues that would be encountered in a real life application solving the same problem. For instance, it is naïve to assume that the name for the entity would be exactly the text found in the document. In many cases entities have several aliases – for example the same person name can be written in a variety of forms depending on whether titles, first names, or initials are used. A process of name normalisation would probably need to be employed in order to make sure that the same entity, regardless of the textual form it is mentioned in, will always be linked to the same ontology instance.
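As a rough illustration of what a first normalisation step might look like, the sketch below maps a mention string to a canonical key by lower-casing it, stripping punctuation and dropping a few titles. Everything in it is an assumption made for the example: the class NameNormalisationSketch, the method normalise and the list of titles are hypothetical and not part of GATE, and the sketch deliberately leaves harder cases, such as initials standing in for first names, unresolved.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

/** Hypothetical, deliberately simple name normaliser; not part of GATE. */
class NameNormalisationSketch {
    // a small illustrative list of titles; a real application would need a fuller one
    private static final Set<String> TITLES =
        new HashSet<>(Arrays.asList("mr", "mrs", "ms", "dr", "prof", "sir"));

    /** Lower-cases the mention, removes punctuation and titles, joins the rest with '_'. */
    static String normalise(String mention) {
        StringBuilder key = new StringBuilder();
        for (String token : mention.trim().toLowerCase(Locale.ROOT).split("\\s+")) {
            // keep only letters and digits, so that "Dr." becomes "dr"
            String bare = token.replaceAll("[^\\p{L}\\p{Nd}]", "");
            if (bare.isEmpty() || TITLES.contains(bare)) {
                continue;
            }
            if (key.length() > 0) {
                key.append('_');
            }
            key.append(bare);
        }
        return key.toString();
    }

    public static void main(String[] args) {
        // both surface forms are expected to share one key
        System.out.println(normalise("Dr. John Smith"));
        System.out.println(normalise("john smith"));
    }
}
```

In the rule above, such a key could be passed in place of mentionName when building the instance URI, so that different surface forms of a name resolve to the same instance.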

For a detailed description of the ontology API, please consult the JavaDoc documentation.

10.9 Ontology Annotation Tool

The Ontology Annotation Tool (OAT) is a GATE plugin available from the Ontology Tools plugin set, which enables a user to manually annotate a text with respect to one or more ontologies. The required ontology must be selected from a pull-down list of available ontologies.

The OAT tool supports annotation with information about the ontology classes, instances and properties.

10.9.1 Viewing Annotated Texts

Ontology-based annotations in the text can be viewed by selecting in the ontology tree the desired classes or instances (see Figure 10.2). By default, when a class is selected, all of its sub-classes and instances are also automatically selected and their mentions are highlighted in the text. There is an option to disable this default behaviour (see Section 10.9.4).


Figure 10.2: Viewing Ontology-Based Annotations

Figure 10.2 shows the mentions of each class and instance in a different colour. These colours can be customised by the user by clicking on the class/instance names in the ontology tree. It is also possible to expand and collapse branches of the ontology.

10.9.2 Editing Existing Annotations


Figure 10.3: Editing Existing Annotations

In order to view the class/instance of a highlighted annotation in the text (e.g., United States - see Figure 10.3), hover the mouse over it and an edit dialogue will appear. It shows the current class or instance (Country in our example) and allows the user to delete it or change it. To delete an existing annotation, press the Delete button.

A class or instance can be changed by starting to type the name of the new class in the combo-box. The combo-box then displays a list of the available classes and instances that start with the typed string. For example, if we want to change the type from Country to Location, we can type “Lo” and all classes and instances whose names start with Lo will be displayed. The more characters are typed, the fewer matching classes remain in the list. As soon as the desired class appears in the list, it can be chosen by clicking on it.

It is possible to apply the changes to all occurrences of the same string and the same previous class/instance, not just to the current one. This is useful when annotating long texts. The user needs to make sure that they still check the classes and instances of annotations further down in the text, in case the same string has a different meaning (e.g., bank as a building vs. bank as a river bank).

The edit dialogue also allows annotation boundaries to be corrected: the user can expand or shrink the annotation offsets by clicking on the relevant arrow buttons.

OAT also allows users to assign property values as annotation features to existing class and instance annotations. For a class annotation, all annotation properties from the ontology are displayed in the table. For an instance annotation, all properties from the ontology that are applicable to the selected instance are shown. The table also shows the existing features of the selected annotation. The user can then add, delete or edit any value of the selected feature. A property may be given an arbitrary number of values; clicking on the editList button lets the user add, remove or edit the values of the property. For object properties, users may only select values from a pre-selected list (i.e. instances which satisfy the selected property’s range constraints).

10.9.3 Adding New Annotations


Figure 10.4: Add New Annotation

New annotations can be added in two ways: using a dialogue (see Figure 10.4) or by selecting the text and clicking on the desired class or instance in the ontology tree.

When adding a new annotation using the dialogue, select some text; after a very short while, if the mouse is not moved, a dialogue will appear (see Figure 10.4). Start typing the name of the desired class or instance until you see it listed in the combo-box, then select it with the mouse. This operation is the same as when changing the class/instance of an existing annotation. One has the option of applying this choice to the current selection only or to all mentions of the selected string in the current document (Apply to All check box).

The user can also create an instance from the selected text. If the user checks the “create instance” checkbox prior to selecting the class, the selected text is annotated with the selected class and a new instance of that class (named after the selected text) is created, provided that no instance with that name already exists in the ontology.

10.9.4 Options


Figure 10.5: Tool Options

There are several options that control the OAT behaviour (see Figure 10.5):