Working with annotations
Contents
- 1. Get information about one annotation from inside another annotation
- 2. Annotate blindly whatever is between two annotations
- 3. Check that there is no annotation of a certain type already contained in the annotation
- 4. Find an annotation of one type and copy modified versions of its features to all contained annotations of a specified type
- 5. Iterate through the Tokens contained in an annotation
- 6. Annotate NPs within a list of NPs
- 7. Rename annotations
- 8. Only create an annotation if the pattern is a certain number of characters in length
- 9. Split on a separator one annotation into several annotations
- 10. Join several annotations in one annotation
- 11. Get the value of the last annotation of a specified type that's contained inside an annotation of a different type
- 12. End the phase after the current rule RHS has finished
- 13. Get the first Token from a Sentence annotation
- 14. Create a Unique Annotation for Each Mention of an Annotation
1. Get information about one annotation from inside another annotation
Rule: getX
(
  {AnnotationY}
):Ytag
-->
{
  AnnotationSet YtagAS = (AnnotationSet) bindings.get("Ytag");
  // get Xtag info from within Ytag annotation
  AnnotationSet XtagAS = inputAS.get("Xtag",
      YtagAS.firstNode().getOffset(), YtagAS.lastNode().getOffset());
  // create new annotation
  FeatureMap features = Factory.newFeatureMap();
  outputAS.add(XtagAS.firstNode(), XtagAS.lastNode(), "X", features);
}
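Warning: 'inputAS.get' with offsets also returns annotations that are only partially contained in (that is, merely overlap) the span. If you want only the annotations that are completely contained, use 'inputAS.getContained' instead of 'inputAS.get'. Note that 'getContained' takes only the two offsets, so the type has to be selected separately, as in examples 4 and 5.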
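The difference between overlapping and contained annotations can be illustrated without GATE at all. The following is a minimal sketch in plain Java; the Span class and the two filter methods are hypothetical stand-ins for annotations and for the two kinds of offset query, not part of the GATE API.

```java
import java.util.ArrayList;
import java.util.List;

public class SpanFilter {
    /** Hypothetical stand-in for an annotation: a type plus start and end offsets. */
    public static final class Span {
        public final String type;
        public final long start;
        public final long end;

        public Span(String type, long start, long end) {
            this.type = type;
            this.start = start;
            this.end = end;
        }
    }

    /** Spans that overlap the range [start, end) totally or partially. */
    public static List<Span> overlapping(List<Span> spans, long start, long end) {
        List<Span> out = new ArrayList<>();
        for (Span s : spans) {
            if (s.start < end && s.end > start) {
                out.add(s);
            }
        }
        return out;
    }

    /** Spans that lie completely inside the range [start, end]. */
    public static List<Span> contained(List<Span> spans, long start, long end) {
        List<Span> out = new ArrayList<>();
        for (Span s : spans) {
            if (s.start >= start && s.end <= end) {
                out.add(s);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Span> spans = new ArrayList<>();
        spans.add(new Span("Xtag", 5, 12));   // starts before the range, ends inside it
        spans.add(new Span("Xtag", 10, 15));  // completely inside the range
        spans.add(new Span("Xtag", 18, 25));  // starts inside the range, ends after it

        System.out.println(overlapping(spans, 10, 20).size()); // prints 3
        System.out.println(contained(spans, 10, 20).size());   // prints 1
    }
}
```

With a Ytag annotation covering offsets 10 to 20, only the middle span would survive a "contained" query, while all three would be returned by an "overlapping" one.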
2. Annotate blindly whatever is between two annotations
This rule annotates whatever lies between the two separators, without knowing what it is and without having to list the relevant annotation types in the Input header.
Rule: GuessProduct1
Priority: 2
(
  {Date}
)
({Separator}):left
({Separator}):right
-->
{
  Node start = ((AnnotationSet) bindings.get("left")).lastNode();
  Node end = ((AnnotationSet) bindings.get("right")).firstNode();
  FeatureMap features = Factory.newFeatureMap();
  features.put("rule", "GuessProduct1");
  outputAS.add(start, end, "Product", features);
}
3. Check that there is no annotation of a certain type already contained in the annotation
// check that it doesn't contain the FishClass annotation already somewhere
if (inputAS.get("FishClass",
      anAnn.getStartNode().getOffset(),
      anAnn.getEndNode().getOffset()).isEmpty()) {
  // if not, then add a new FishClass annotation to the whole thing
  annotations.add(anAnn.getStartNode(), anAnn.getEndNode(), "FishClass", features);
}
4. Find an annotation of one type and copy modified versions of its features to all contained annotations of a specified type
Rule: ArticleMention
(
  ({Article}):article
)
-->
{
  /* Get the Article annotations and its span (to use for start and end points) */
  AnnotationSet span = (gate.AnnotationSet) bindings.get("article");
  Annotation article = span.iterator().next();

  /* Get all the contained Mention annotations */
  AnnotationSet mentions = inputAS.getContained(span.firstNode().getOffset(),
      span.lastNode().getOffset()).get("Mention");
  Iterator<Annotation> mentionIter = mentions.iterator();

  FeatureMap articleFeatures = article.getFeatures();
  Set articleFKeys = articleFeatures.keySet();
  Annotation mention;
  FeatureMap mentionFeatures, additionalFeatures;
  Iterator keyIter;
  Object key, value;
  String mKey;

  /* Produce a modified FeatureMap from that of the Article */
  additionalFeatures = gate.Factory.newFeatureMap();
  keyIter = articleFKeys.iterator();
  while (keyIter.hasNext()) {
    key = keyIter.next();
    /* ignore non-String keys; we don't expect them and
       wouldn't know what to do with them */
    if (key instanceof String) {
      value = articleFeatures.get(key);
      mKey = "article_" + ((String) key);
      additionalFeatures.put(mKey, value);
    }
  }

  /* Iterate through the Mentions and copy modified versions of the
     Article's features into each Mention's feature map */
  while (mentionIter.hasNext()) {
    mention = mentionIter.next();
    mentionFeatures = mention.getFeatures();
    mentionFeatures.putAll(additionalFeatures);
  }
}
5. Iterate through the Tokens contained in an annotation
Rule: TokenCount
(
  {Comment}
):ann
-->
{
  AnnotationSet commentAs = (gate.AnnotationSet) bindings.get("ann");
  AnnotationSet commentTokensAs = inputAS.get("Token").getContained(
      commentAs.firstNode().getOffset(), commentAs.lastNode().getOffset());
  for (Annotation commentTokenAnn : commentTokensAs) {
    ........
  }
}
6. Annotate NPs within a list of NPs
Iterate through a set of items within an annotation and annotate each one with the same information.
Rule: List
(
  {NP} ((AND) {NP})*
):mention
-->
{
  // get the mention annotations in a list
  List annList = new ArrayList((AnnotationSet) bindings.get("mention"));
  // sort the list by offset
  Collections.sort(annList, new OffsetComparator());
  // iterate through the matched annotations
  for (int i = 0; i < annList.size(); i++) {
    Annotation anAnn = (Annotation) annList.get(i);
    // check that the matched annotation is an NP
    if (anAnn.getType().equals("NP")) {
      FeatureMap features = Factory.newFeatureMap();
      features.put("rule", "List1");
      annotations.add(anAnn.getStartNode(), anAnn.getEndNode(), "SomeTag", features);
    }
  }
}
7. Rename annotations
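This example renames 'Lookup' annotations to 'OntoRes': it creates an 'OntoRes' annotation over the same span, copies all features of the 'Lookup' except 'majorType' to it, and then removes the original 'Lookup'.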
phase: OntoResource
Input: Lookup
options: control = all

Rule: createOntoResFromLookup
({Lookup}):lookup
-->
{
  gate.AnnotationSet lookup = (gate.AnnotationSet) bindings.get("lookup");
  gate.Annotation ann = (gate.Annotation) lookup.iterator().next();
  FeatureMap lookupFeatures = ann.getFeatures();
  gate.FeatureMap features = Factory.newFeatureMap();
  features.putAll(lookupFeatures);
  features.remove("majorType");
  try {
    outputAS.add(lookup.firstNode().getOffset(), lookup.lastNode().getOffset(),
        "OntoRes", features);
  } catch (InvalidOffsetException e) {
    throw new LuckyException(e);
  }
  // remove old lookup
  inputAS.remove(ann);
}
8. Only create an annotation if the pattern is a certain number of characters in length
({Some pattern to be matched}):tag
-->
{
  AnnotationSet tagSet = (AnnotationSet) bindings.get("tag");
  // get the length of the match from its offsets (offsets are Long values)
  long length = tagSet.lastNode().getOffset() - tagSet.firstNode().getOffset();
  FeatureMap features = Factory.newFeatureMap();
  // only create the annotation if the match is longer than 5 characters
  if (length > 5) {
    // create new features
    features.put("rule", "RuleName");
    // create new annotation
    outputAS.add(tagSet.firstNode(), tagSet.lastNode(), "NewAnnotation", features);
  }
}
9. Split on a separator one annotation into several annotations
Full grammar taken from the French application with TreeTagger.
Imports: {
  import static gate.Utils.*;
}

Phase: postprocess
Input: Token SpaceToken
Options: control = appelt

Rule: simpleSplit
/* split compound word, to make it the same as the TreeTagger output,
   e.g. apprend-on should be two Tokens not one */
(
  {Token.kind == word, Token.string =~ "[^-]+(-[^-]+){1,2}"}
):match
-->
{
  AnnotationSet set = bindings.get("match");
  Annotation annotation = set.iterator().next();
  String content = stringFor(doc, annotation);
  long offset = start(annotation);
  long endOffset = end(annotation);
  try {
    FeatureMap features;
    int startIndex = 0;
    int dashIndex = 0;
    while ((dashIndex = content.indexOf('-', startIndex)) != -1) {
      // the part before the dash; startIndex and dashIndex are relative
      // to the start of the match, so offset itself must stay fixed
      features = Factory.newFeatureMap();
      features.putAll(annotation.getFeatures());
      features.put("string", content.substring(startIndex, dashIndex));
      features.put("length", dashIndex-startIndex);
      outputAS.add(offset+startIndex, offset+dashIndex, "Token", features);
      // the dash itself
      features = Factory.newFeatureMap();
      features.putAll(annotation.getFeatures());
      features.put("string", "-");
      features.put("length", 1);
      outputAS.add(offset+dashIndex, offset+dashIndex+1, "Token", features);
      startIndex = dashIndex + 1;
    }
    // the part after the last dash
    features = Factory.newFeatureMap();
    features.putAll(annotation.getFeatures());
    features.put("string", content.substring(startIndex));
    features.put("length", content.length()-startIndex);
    outputAS.add(offset+startIndex, endOffset, "Token", features);
  } catch (InvalidOffsetException e) {
    throw new LuckyException(e);
  }
  outputAS.remove(annotation);
}
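The offset arithmetic is the easy part to get wrong when a word contains more than one dash, so it can be worth checking outside GATE. The following is a minimal sketch in plain Java, assuming the positions returned by indexOf are kept relative to the start of the matched Token; the class and method names (DashSplit, splitOffsets) are hypothetical and exist only for this illustration.

```java
import java.util.ArrayList;
import java.util.List;

public class DashSplit {
    /**
     * Returns the {start, end} document offsets of every piece and every dash,
     * in document order, for a token whose text is content and whose first
     * character sits at document offset docStart.
     */
    public static List<long[]> splitOffsets(String content, long docStart) {
        List<long[]> spans = new ArrayList<>();
        int startIndex = 0;
        int dashIndex;
        while ((dashIndex = content.indexOf('-', startIndex)) != -1) {
            // the piece before the dash
            spans.add(new long[] {docStart + startIndex, docStart + dashIndex});
            // the dash itself
            spans.add(new long[] {docStart + dashIndex, docStart + dashIndex + 1});
            startIndex = dashIndex + 1;
        }
        // the piece after the last dash
        spans.add(new long[] {docStart + startIndex, docStart + content.length()});
        return spans;
    }

    public static void main(String[] args) {
        for (long[] span : splitOffsets("a-b-c", 100)) {
            System.out.println(span[0] + ".." + span[1]);
        }
    }
}
```

For the string "a-b-c" starting at offset 100, this yields five adjacent one-character spans from 100..101 to 104..105, which is what the split Tokens should cover.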
10. Join several annotations in one annotation
Full grammar taken from the old French application with TreeTagger.
Phase: postprocess
Input: Token SpaceToken
Options: control = appelt

Rule: simpleJoin
/* joins a final apostrophe with the preceding word, to make it the same
   as the TreeTagger output, e.g. d' should be one Token not two */
(
  (
    {Token.string == "d"}|
    {Token.string == "D"}|
    {Token.string == "L"}|
    {Token.string == "l"}|
    {Token.string == "n"}|
    {Token.string == "N"}
  )
  {Token.string == "'"}
):left
-->
{
  AnnotationSet toRemove = bindings.get("left");
  outputAS.removeAll(toRemove);
  // get the tokens
  ArrayList tokens = new ArrayList(toRemove);
  // define a comparator for annotations by start offset
  Collections.sort(tokens, new OffsetComparator());
  String text = "";
  Iterator tokIter = tokens.iterator();
  while (tokIter.hasNext())
    text += (String) ((Annotation) tokIter.next()).getFeatures().get("string");

  FeatureMap features = Factory.newFeatureMap();
  features.put("kind", "word");
  features.put("string", text);
  features.put("length", Integer.toString(text.length()));
  features.put("orth", "artapos");
  outputAS.add(toRemove.firstNode(), toRemove.lastNode(), "Token", features);
}
11. Get the value of the last annotation of a specified type that's contained inside an annotation of a different type
In this case, we want to get the value of the root feature of the Verb annotation contained in the VG annotation. There may be more than one Verb annotation, in which case we want to take the last one in the sequence. Finally, we want to add the value of that root feature to the VG annotation.
Rule: VG
({VG}):tag
-->
:tag {
  AnnotationSet verbs = gate.Utils.getContainedAnnotations(inputAS, tagAnnots, "Verb");
  List<Annotation> verbList = gate.Utils.inDocumentOrder(verbs);
  Annotation lastVerb = verbList.get(verbList.size() - 1);
  String verbRoot = lastVerb.getFeatures().get("root").toString();
  Annotation vg = tagAnnots.iterator().next();
  vg.getFeatures().put("root", verbRoot);
}
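Note that verbList.get(verbList.size() - 1) throws an IndexOutOfBoundsException if the VG contains no Verb annotation at all. The following is a minimal sketch, in plain Java without GATE, of the same "last one in document order" selection with an explicit guard for the empty case; the Ann class and the lastRoot method are hypothetical stand-ins for Verb annotations and their root feature.

```java
import java.util.Collection;
import java.util.Comparator;
import java.util.Optional;

public class LastVerb {
    /** Hypothetical stand-in for a Verb annotation: a start offset and a root feature. */
    public static final class Ann {
        public final long start;
        public final String root;

        public Ann(long start, String root) {
            this.start = start;
            this.root = root;
        }
    }

    /** Root of the annotation with the highest start offset, or empty if there is none. */
    public static Optional<String> lastRoot(Collection<Ann> verbs) {
        return verbs.stream()
                .max(Comparator.comparingLong((Ann a) -> a.start))
                .map(a -> a.root);
    }
}
```

In the rule itself, the equivalent guard would be to check that the list of Verb annotations is not empty before reading its last element, and to leave the VG untouched otherwise.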
12. End the phase after the current rule RHS has finished
({Token.category == "NN"}):ann
-->
:ann.ann = {},
{
  ctx.endPhase();
}
13. Get the first Token from a Sentence annotation
Rule: FirstTokenInSentence
({Sentence}):tag
-->
:tag {
  AnnotationSet tokens = gate.Utils.getContainedAnnotations(inputAS, tagAnnots, "Token");
  List<Annotation> tokensOrdered = gate.Utils.inDocumentOrder(tokens);
  Annotation firstToken = tokensOrdered.get(0);
  FeatureMap features = Factory.newFeatureMap();
  gate.Utils.addAnn(outputAS, firstToken, "FirstToken", features);
}
14. Create a Unique Annotation for Each Mention of an Annotation
Imports: {
  import static gate.Utils.*;
}

Phase: NumberExamples
Input: Example
Options: control = once

Rule: Examples
({Example}):ex
-->
{
  int number = 0;
  for (Annotation a : inDocumentOrder(inputAS.get("Example"))) {
    addAnn(outputAS, a, "Example_" + (++number), a.getFeatures());
  }
}