Working with annotations

1. Get information about one annotation from inside another annotation

Rule: getX
(
  {AnnotationY}
):Ytag
-->
{
  AnnotationSet YtagAS = (AnnotationSet) bindings.get("Ytag");

  // get the Xtag annotations that overlap the span of the Ytag annotation
  AnnotationSet XtagAS = inputAS.get("Xtag", YtagAS.firstNode().getOffset(), YtagAS.lastNode().getOffset());

  // create a new annotation from the first to the last Xtag, if any was found
  if (!XtagAS.isEmpty()) {
    FeatureMap features = Factory.newFeatureMap();
    outputAS.add(XtagAS.firstNode(), XtagAS.lastNode(), "X", features);
  }
}

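Warning: 'inputAS.get(type, start, end)' also returns annotations that only partially overlap the span. If you want only the annotations completely contained in it, use 'inputAS.getContained(start, end).get("Xtag")' instead; 'getContained' takes no type argument, so filter by type with '.get' afterwards.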
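To see the difference between the two calls on concrete offsets, here is a small plain-Java model. It is not the GATE API: the class and record names are hypothetical, annotations are reduced to a type plus half-open [start, end) offsets, and GATE's handling of edge cases (such as annotations that merely touch the span) may differ in detail.

```java
import java.util.List;

// Plain-Java model, not the GATE API: an annotation is reduced to its type
// and half-open [start, end) character offsets.
class SpanQueries {
  record Ann(String type, long start, long end) {}

  // Analogue of inputAS.get(type, start, end): any overlap with [start, end) counts.
  static List<Ann> overlapping(List<Ann> anns, String type, long start, long end) {
    return anns.stream()
        .filter(a -> a.type().equals(type) && a.start() < end && a.end() > start)
        .toList();
  }

  // Analogue of inputAS.getContained(start, end).get(type): must lie fully inside.
  static List<Ann> contained(List<Ann> anns, String type, long start, long end) {
    return anns.stream()
        .filter(a -> a.type().equals(type) && a.start() >= start && a.end() <= end)
        .toList();
  }

  public static void main(String[] args) {
    List<Ann> anns = List.of(
        new Ann("Xtag", 2, 5),    // fully inside a Ytag spanning [0, 10)
        new Ann("Xtag", 8, 15),   // starts inside but ends after the Ytag
        new Ann("Xtag", 20, 25)); // entirely outside
    System.out.println(overlapping(anns, "Xtag", 0, 10).size()); // 2
    System.out.println(contained(anns, "Xtag", 0, 10).size());   // 1
  }
}
```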

2. Annotate blindly whatever is between two annotations

This rule annotates whatever lies between two separators that follow a date, without knowing what it is and without having to specify the relevant annotation types in the Input header (JAPE ignores annotation types that are not listed there).

Rule: GuessProduct1
Priority: 2
(
 {Date}
)
({Separator}):left
({Separator}):right
-->
{
  Node start = ((AnnotationSet) bindings.get("left"))
               .lastNode();
  Node end   = ((AnnotationSet) bindings.get("right"))
               .firstNode();

  FeatureMap features = Factory.newFeatureMap();
  features.put("rule", "GuessProduct1");

  outputAS.add(start, end, "Product", features);
}
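The offset logic of this rule can be sketched in plain Java, outside GATE. The names are hypothetical and the sample text and Separator offsets are made up; the point is that the new span starts where the left annotation ends (its last node) and stops where the right one starts (its first node), whatever lies in between.

```java
// Plain-Java model, not the GATE API.
class BetweenSketch {
  record Ann(String type, long start, long end) {}

  // New span from the end of the left annotation to the start of the right one.
  static Ann between(Ann left, Ann right, String newType) {
    if (left.end() > right.start()) {
      throw new IllegalArgumentException("left must end before right starts");
    }
    return new Ann(newType, left.end(), right.start());
  }

  public static void main(String[] args) {
    String text = "May 3 | Blue Widget | $5";
    Ann left = new Ann("Separator", 5, 8);    // " | " after the date
    Ann right = new Ann("Separator", 19, 22); // " | " before the price
    Ann product = between(left, right, "Product");
    System.out.println(text.substring((int) product.start(), (int) product.end()));
    // prints "Blue Widget"
  }
}
```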

3. Check that there is no annotation of a certain type already contained in the annotation

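The fragment below assumes that 'anAnn' (the annotation being checked) and 'features' (a FeatureMap) are defined earlier in the RHS. Because 'inputAS.get' with offsets returns overlapping annotations, no annotation is added either if a FishClass annotation only partly overlaps anAnn.

// check that anAnn doesn't already contain or overlap a FishClass annotation
if (inputAS.get("FishClass", anAnn.getStartNode().getOffset(),
                anAnn.getEndNode().getOffset()).isEmpty()) {
  // if not, then add a new FishClass annotation to the whole thing
  outputAS.add(anAnn.getStartNode(), anAnn.getEndNode(), "FishClass", features);
}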
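A plain-Java sketch of the same check-then-add pattern, assuming (as is common) that the input and output sets are the same set, modelled here as one list. The names are hypothetical; the overlap test mirrors the offset-based 'inputAS.get' call, so a partly overlapping annotation also blocks the add.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java model, not the GATE API.
class AddIfAbsentSketch {
  record Ann(String type, long start, long end) {}

  // Adds an annotation of the given type over [start, end) unless one of that
  // type already overlaps the span; returns true if it was added.
  static boolean addIfAbsent(List<Ann> anns, String type, long start, long end) {
    boolean present = anns.stream()
        .anyMatch(a -> a.type().equals(type) && a.start() < end && a.end() > start);
    if (!present) {
      anns.add(new Ann(type, start, end));
    }
    return !present;
  }

  public static void main(String[] args) {
    List<Ann> anns = new ArrayList<>();
    System.out.println(addIfAbsent(anns, "FishClass", 0, 12)); // true: added
    System.out.println(addIfAbsent(anns, "FishClass", 4, 8));  // false: overlaps
    System.out.println(anns.size());                            // 1
  }
}
```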

4. Find an annotation of one type and copy modified versions of its features to all contained annotations of a specified type

Rule: ArticleMention
(
  ({Article}):article
)
-->
{
   /* Get the Article annotation and its span (to use for start and end points) */
   AnnotationSet span    = (gate.AnnotationSet) bindings.get("article");
   Annotation  article   = span.iterator().next();

   /* Get all the contained Mention annotations */
   AnnotationSet mentions = inputAS.getContained(span.firstNode().getOffset(), 
                                               span.lastNode().getOffset())
                                   .get("Mention");
   Iterator<Annotation>  mentionIter = mentions.iterator();

   FeatureMap articleFeatures  = article.getFeatures();
   Set        articleFKeys     = articleFeatures.keySet();
   Annotation mention;
   FeatureMap mentionFeatures, additionalFeatures;
   Iterator   keyIter;
   Object     key, value;
   String     mKey;

   /* Produce a modified FeatureMap from that of the Article */
   additionalFeatures  = gate.Factory.newFeatureMap();
   keyIter             = articleFKeys.iterator();
   while(keyIter.hasNext()) {
       key = keyIter.next();

       /* ignore non-String keys; we don't expect them and wouldn't know
          what to do with them */
       if (key instanceof String) {
           value = articleFeatures.get(key);
           mKey  = "article_" + ((String) key);
           additionalFeatures.put(mKey, value);
       }
   }

   /* Iterate through the Mentions and copy modified versions of the Article's features
      into each Mention's feature map */
   while (mentionIter.hasNext()) {
      mention          = mentionIter.next();
      mentionFeatures  = mention.getFeatures();
      mentionFeatures.putAll(additionalFeatures);
   }
}
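The feature-copying step of this rule, modelled in plain Java without GATE. The class name and sample features are hypothetical; feature maps are typed Map<Object, Object> because GATE's FeatureMap extends that interface, which is also why the rule has to skip non-String keys.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Plain-Java model, not the GATE API.
class PrefixFeaturesSketch {
  // Builds the "article_"-prefixed copy once, skipping non-String keys.
  static Map<String, Object> prefixed(Map<Object, Object> articleFeatures) {
    Map<String, Object> out = new LinkedHashMap<>();
    for (Map.Entry<Object, Object> e : articleFeatures.entrySet()) {
      if (e.getKey() instanceof String key) {
        out.put("article_" + key, e.getValue());
      }
    }
    return out;
  }

  // Merges the prefixed copy into every mention's feature map.
  static void copyToMentions(Map<Object, Object> articleFeatures,
                             List<Map<Object, Object>> mentionFeatures) {
    Map<String, Object> extra = prefixed(articleFeatures);
    for (Map<Object, Object> features : mentionFeatures) {
      features.putAll(extra);
    }
  }

  public static void main(String[] args) {
    Map<Object, Object> article = new HashMap<>();
    article.put("source", "wire");
    article.put(42, "ignored");          // non-String key: skipped
    Map<Object, Object> mention = new HashMap<>();
    mention.put("string", "the minister");
    copyToMentions(article, List.of(mention));
    System.out.println(mention.get("article_source")); // wire
  }
}
```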

5. Iterate through the Tokens contained in an annotation

Rule: TokenCount
(
  {Comment}
)
:ann
-->
{
  AnnotationSet commentAs = (gate.AnnotationSet)bindings.get("ann");
  AnnotationSet commentTokensAs = inputAS.get("Token").getContained(
    commentAs.firstNode().getOffset(),
    commentAs.lastNode().getOffset());

  for(Annotation commentTokenAnn : commentTokensAs)
  {
    // ... process each Token contained in the Comment (commentTokenAnn) here ...
  }   
}
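An AnnotationSet is not guaranteed to iterate in document order, so if the loop body depends on token order, sort first (for example with 'gate.Utils.inDocumentOrder', as in recipe 11). The plain-Java model below, with hypothetical names and made-up offsets, selects the Tokens that lie fully inside a Comment and sorts them by start offset.

```java
import java.util.Comparator;
import java.util.List;

// Plain-Java model, not the GATE API.
class ContainedTokensSketch {
  record Ann(String type, long start, long end) {}

  // Tokens lying fully inside [start, end), sorted by start offset.
  static List<Ann> containedTokens(List<Ann> anns, long start, long end) {
    return anns.stream()
        .filter(a -> a.type().equals("Token") && a.start() >= start && a.end() <= end)
        .sorted(Comparator.comparingLong(Ann::start))
        .toList();
  }

  public static void main(String[] args) {
    List<Ann> anns = List.of(
        new Ann("Token", 9, 13), new Ann("Comment", 0, 13),
        new Ann("Token", 0, 4), new Ann("Token", 5, 8), new Ann("Token", 14, 18));
    Ann comment = anns.get(1);
    List<Ann> tokens = containedTokens(anns, comment.start(), comment.end());
    // prints 0, 5 and 9, one per line
    for (Ann t : tokens) {
      System.out.println(t.start());
    }
  }
}
```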

6. Annotate NPs within a list of NPs

Iterate through the items matched within an annotation and annotate each one with the same information. Here '(AND)' is a reference to a macro, which must be defined elsewhere in the grammar.

Rule: List

(
{NP}
((AND) {NP})*
):mention

-->
{
   //get the mention annotations in a list
   List annList = new ArrayList((AnnotationSet)bindings.get("mention"));


   //sort the list by offset
   Collections.sort(annList, new OffsetComparator());

   //iterate through the matched annotations
   for(int i = 0; i < annList.size(); i++)
   {
      Annotation anAnn = (Annotation)annList.get(i);

      // check that the matched annotation is an NP (the list also holds whatever AND matched)
      if (anAnn.getType().equals("NP"))
      {
         FeatureMap features = Factory.newFeatureMap();
         features.put("rule", "List1");

         outputAS.add(anAnn.getStartNode(), anAnn.getEndNode(), "SomeTag",
         features);

      }
   }
}
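Why the type check matters: the set bound to 'mention' holds every annotation the pattern matched, including whatever the AND part matched. Here is a plain-Java model of the sort-then-filter step, with hypothetical names and a made-up match for "cats and dogs" in which the conjunction is assumed to be a Token.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Comparator;
import java.util.List;
import java.util.Set;

// Plain-Java model, not the GATE API.
class NpFilterSketch {
  record Ann(String type, long start, long end) {}

  // Sorts all matched annotations by start offset and keeps only the NPs.
  static List<Ann> npsInOrder(Collection<Ann> matched) {
    List<Ann> sorted = new ArrayList<>(matched);
    sorted.sort(Comparator.comparingLong(Ann::start));
    List<Ann> nps = new ArrayList<>();
    for (Ann a : sorted) {
      if (a.type().equals("NP")) {
        nps.add(a);
      }
    }
    return nps;
  }

  public static void main(String[] args) {
    // "cats and dogs": two NPs plus the conjunction
    Set<Ann> matched = Set.of(
        new Ann("NP", 9, 13), new Ann("Token", 5, 8), new Ann("NP", 0, 4));
    System.out.println(npsInOrder(matched)); // the NP at 0 first, then the NP at 9
  }
}
```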

7. Rename annotations

Here is an example that renames 'Lookup' annotations to 'OntoRes' annotations and copies all features from one to the other, except 'majorType'.

Phase: OntoResource
Input: Lookup
Options: control = all

Rule:    createOntoResFromLookup
({Lookup}):lookup
-->
{
 gate.AnnotationSet lookup = (gate.AnnotationSet) bindings.get("lookup");
 gate.Annotation ann = (gate.Annotation) lookup.iterator().next();
 FeatureMap lookupFeatures = ann.getFeatures();
 gate.FeatureMap features = Factory.newFeatureMap();
 features.putAll(lookupFeatures);
 features.remove("majorType");
 try{
   outputAS.add(lookup.firstNode().getOffset(),
      lookup.lastNode().getOffset(),
     "OntoRes", features);
 }catch(InvalidOffsetException e){
   throw new LuckyException(e);
 }
 //remove old lookup
 inputAS.remove(ann);
}
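The renaming logic reduced to plain Java, as a sketch outside GATE: the record and method names are hypothetical, and a list stands in for the annotation set. The new annotation keeps the span and every feature except majorType, and the old one is removed.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Plain-Java model, not the GATE API.
class RenameSketch {
  record Ann(String type, long start, long end, Map<String, Object> features) {}

  // Adds an OntoRes over the Lookup's span with a copy of its features minus
  // majorType, then removes the Lookup from the list.
  static Ann rename(List<Ann> anns, Ann lookup) {
    Map<String, Object> features = new HashMap<>(lookup.features());
    features.remove("majorType");
    Ann ontoRes = new Ann("OntoRes", lookup.start(), lookup.end(), features);
    anns.add(ontoRes);
    anns.remove(lookup);
    return ontoRes;
  }

  public static void main(String[] args) {
    Map<String, Object> f = new HashMap<>();
    f.put("majorType", "location");
    f.put("minorType", "city");
    Ann lookup = new Ann("Lookup", 0, 5, f);
    List<Ann> anns = new ArrayList<>(List.of(lookup));
    Ann renamed = rename(anns, lookup);
    System.out.println(renamed.type() + " " + renamed.features()); // OntoRes {minorType=city}
    System.out.println(anns.size()); // 1
  }
}
```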

8. Only create an annotation if the pattern is a certain number of characters in length

Rule: RuleName
({Some pattern to be matched}):tag
-->
{
  AnnotationSet tagSet = (AnnotationSet) bindings.get("tag");

  // compute the length of the matched text from its offsets
  // (getOffset() returns a Long, so the difference is a long, not an int)
  long length = tagSet.lastNode().getOffset() - tagSet.firstNode().getOffset();

  // only annotate matches that are longer than 5 characters
  if (length > 5) {
    // create new features
    FeatureMap features = Factory.newFeatureMap();
    features.put("rule", "RuleName");

    // create new annotation
    outputAS.add(tagSet.firstNode(), tagSet.lastNode(), "NewAnnotation", features);
  }
}

9. Split one annotation into several annotations on a separator

Full grammar taken from the French application with TreeTagger.

Imports: {
import static gate.Utils.*;
}

Phase: postprocess
Input: Token SpaceToken
Options: control = appelt

Rule: simpleSplit
/* split compound word, to make it the same as the
TreeTagger output, e.g. apprend-on should be two Tokens not one */

(
  {Token.kind == word, Token.string =~ "[^-]+(-[^-]+){1,2}"}
):match
-->
{
  AnnotationSet set = bindings.get("match");
  Annotation annotation = set.iterator().next();
  String content = stringFor(doc, annotation);
  // document offsets of the compound token; every new Token is placed at
  // base + an index into content, so offsets never accumulate across loops
  long base = start(annotation);
  long endOffset = end(annotation);
  try {
    FeatureMap features;
    int startIndex = 0;
    int dashIndex = 0;
    while ((dashIndex = content.indexOf('-', startIndex)) != -1) {
      // the part before the dash
      features = Factory.newFeatureMap();
      features.putAll(annotation.getFeatures());
      features.put("string", content.substring(startIndex, dashIndex));
      features.put("length", dashIndex-startIndex);
      outputAS.add(base+startIndex, base+dashIndex, "Token", features);
      // the dash itself
      features = Factory.newFeatureMap();
      features.putAll(annotation.getFeatures());
      features.put("string", "-");
      features.put("length", 1);
      outputAS.add(base+dashIndex, base+dashIndex+1, "Token", features);
      startIndex = dashIndex + 1;
    }
    // the part after the last dash
    features = Factory.newFeatureMap();
    features.putAll(annotation.getFeatures());
    features.put("string", content.substring(startIndex));
    features.put("length", content.length()-startIndex);
    outputAS.add(base+startIndex, endOffset, "Token", features);
  } catch (InvalidOffsetException e) {
    throw new LuckyException(e);
  }
  outputAS.remove(annotation);
}
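The offset arithmetic is the delicate part of this rule. This plain-Java sketch (hypothetical names, no GATE classes) computes the same parts for a compound word and places each one at the token's start offset plus an index into its text. The sample "a-t-il" has two hyphens, which the rule's regular expression allows.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java model, not the GATE API.
class SplitSketch {
  record Part(String string, long start, long end) {}

  // Splits content at every '-', keeping each dash as a part of its own.
  // base is the document offset of the token; each part is placed at
  // base + an index into content.
  static List<Part> split(String content, long base) {
    List<Part> parts = new ArrayList<>();
    int startIndex = 0;
    int dashIndex;
    while ((dashIndex = content.indexOf('-', startIndex)) != -1) {
      parts.add(new Part(content.substring(startIndex, dashIndex),
                         base + startIndex, base + dashIndex));
      parts.add(new Part("-", base + dashIndex, base + dashIndex + 1));
      startIndex = dashIndex + 1;
    }
    // the part after the last dash runs to the end of the token
    parts.add(new Part(content.substring(startIndex),
                       base + startIndex, base + content.length()));
    return parts;
  }

  public static void main(String[] args) {
    for (Part p : split("a-t-il", 100)) {
      System.out.println(p.string() + " [" + p.start() + ", " + p.end() + ")");
    }
  }
}
```

For "a-t-il" starting at offset 100, the parts are a [100, 101), - [101, 102), t [102, 103), - [103, 104) and il [104, 106).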

10. Join several annotations into one annotation

Full grammar taken from the old French application with TreeTagger.

Phase: postprocess
Input: Token SpaceToken
Options: control = appelt

Rule: simpleJoin
/* joins a final apostrophe with the preceding word, to make it the same as the
TreeTagger output, e.g. d' should be one Token not two */

 (
  (
   {Token.string == "d"}|
   {Token.string == "D"}|
   {Token.string == "L"}|
   {Token.string == "l"}|
   {Token.string == "n"}|
   {Token.string == "N"}
  )
  {Token.string == "'"}
 ):left
-->
{
  AnnotationSet toRemove = bindings.get("left");
  outputAS.removeAll(toRemove);
  //get the tokens
  ArrayList tokens = new ArrayList(toRemove);
  //sort the tokens by start offset
  Collections.sort(tokens, new OffsetComparator());
  String text = "";
  Iterator tokIter = tokens.iterator();
  while(tokIter.hasNext())
    text += (String)((Annotation)tokIter.next()).getFeatures().get("string");

  FeatureMap features = Factory.newFeatureMap();
  features.put("kind", "word");
  features.put("string", text);
  features.put("length", Integer.toString(text.length()));
  features.put("orth", "artapos");
  outputAS.add(toRemove.firstNode(), toRemove.lastNode(), "Token", features);
}
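The joining step as a plain-Java sketch with hypothetical names: the matched tokens are sorted by start offset, their strings concatenated, and the result spans from the first start to the last end. It assumes a non-empty match of adjacent tokens, as the rule's pattern guarantees.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Plain-Java model, not the GATE API.
class JoinSketch {
  record Tok(String string, long start, long end) {}

  // Sorts the matched tokens by start offset, concatenates their strings and
  // spans from the first token's start to the last token's end.
  // Assumes a non-empty list of adjacent tokens.
  static Tok join(List<Tok> matched) {
    List<Tok> sorted = new ArrayList<>(matched);
    sorted.sort(Comparator.comparingLong(Tok::start));
    StringBuilder text = new StringBuilder();
    for (Tok t : sorted) {
      text.append(t.string());
    }
    return new Tok(text.toString(), sorted.get(0).start(),
                   sorted.get(sorted.size() - 1).end());
  }

  public static void main(String[] args) {
    Tok joined = join(List.of(new Tok("'", 11, 12), new Tok("d", 10, 11)));
    System.out.println(joined); // Tok[string=d', start=10, end=12]
  }
}
```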

11. Get the value of the last annotation of a specified type that's contained inside an annotation of a different type

In this case, we want to get the value of the root feature of the Verb annotation contained in the VG annotation. There may be more than one Verb annotation, in which case we want to take the last one in the sequence. Finally, we want to add the value of that root feature to the VG annotation. The ':tag { ... }' form of the RHS makes the annotations bound to 'tag' available as 'tagAnnots'.

Rule: VG
({VG}):tag
-->
:tag {
  AnnotationSet verbs = gate.Utils.getContainedAnnotations(inputAS, tagAnnots, "Verb");
  List<Annotation> verbList = gate.Utils.inDocumentOrder(verbs);
  // do nothing if the VG contains no Verb, or the last Verb has no root feature
  if (!verbList.isEmpty()) {
    Annotation lastVerb = verbList.get(verbList.size() - 1);
    Object verbRoot = lastVerb.getFeatures().get("root");
    if (verbRoot != null) {
      Annotation vg = tagAnnots.iterator().next();
      vg.getFeatures().put("root", verbRoot.toString());
    }
  }
}
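Choosing the last contained Verb, modelled in plain Java with hypothetical names and made-up offsets for "have been eating". Taking the Verb with the greatest start offset corresponds to taking the last element of 'gate.Utils.inDocumentOrder', and returning an empty Optional covers the case where the VG contains no Verb.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

// Plain-Java model, not the GATE API.
class LastVerbSketch {
  record Ann(String type, long start, long end, String root) {}

  // Root of the last Verb lying fully inside vg, or empty if there is none.
  static Optional<String> lastVerbRoot(List<Ann> anns, Ann vg) {
    return anns.stream()
        .filter(a -> a.type().equals("Verb")
            && a.start() >= vg.start() && a.end() <= vg.end())
        .max(Comparator.comparingLong(Ann::start))
        .map(Ann::root);
  }

  public static void main(String[] args) {
    Ann vg = new Ann("VG", 0, 16, null);   // "have been eating"
    List<Ann> anns = List.of(vg,
        new Ann("Verb", 10, 16, "eat"),
        new Ann("Verb", 0, 4, "have"),
        new Ann("Verb", 5, 9, "be"),
        new Ann("Verb", 25, 30, "go"));    // outside the VG
    System.out.println(lastVerbRoot(anns, vg).orElse("none")); // eat
  }
}
```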

12. End the phase after the current rule RHS has finished

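Calling 'ctx.endPhase()' from the Java block asks the transducer to end the current phase once this RHS has finished.

({Token.category == "NN"}) :ann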
-->
:ann.ann = {},
{ ctx.endPhase(); }