JAPE String Matching
Contents
- 1. Check that the first two digits of a number match a specified string
- 2. Convert a string to an integer
- 3. Using a regular expression on content of selected annotation to add new annotation
- 4. Using meta-properties to get the string of an annotation
- 5. Using meta-properties to get the string of all annotations bound by the match
1. Check that the first two digits of a number match a specified string
This approach generalises in many ways, e.g. to matching the first three letters of a word. The example below checks whether the first two digits of a Phone annotation are "07".
Rule: GetMobile
(
  {Phone}
):tag
-->
:tag {
  // get the offsets
  Long phoneStart = tagAnnots.firstNode().getOffset();
  Long phoneEnd = tagAnnots.lastNode().getOffset();
  // check the number is at least 2 characters long (just in case)
  if(phoneEnd - phoneStart >= 2) {
    try {
      String firstTwoChars = doc.getContent()
          .getContent(phoneStart, phoneStart + 2).toString();
      // check it matches 07
      if("07".equals(firstTwoChars)) {
        // create the new annotation
        gate.FeatureMap features = Factory.newFeatureMap();
        features.put("kind", "mobile");
        // tagAnnots is the annotation set bound to the :tag label
        outputAS.add(tagAnnots.firstNode(), tagAnnots.lastNode(), "Phone", features);
      }
    } catch(InvalidOffsetException e) {
      // not possible
      throw new LuckyException("Invalid offset from annotation");
    }
  }
}
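The prefix test at the heart of the rule can be tried outside GATE. The following is a minimal plain-Java sketch, assuming the annotation text has already been extracted into a String. The class name PrefixCheck and both helper methods are hypothetical and not part of GATE.

```java
class PrefixCheck {
    // True when text begins with prefix; a text shorter than the prefix
    // simply yields false, so no separate length check is needed.
    public static boolean hasPrefix(String text, String prefix) {
        return text != null && text.startsWith(prefix);
    }

    // Case-insensitive variant, e.g. for matching the first 3 letters of a word.
    public static boolean hasPrefixIgnoreCase(String text, String prefix) {
        return text != null
            && text.regionMatches(true, 0, prefix, 0, prefix.length());
    }

    public static void main(String[] args) {
        System.out.println(hasPrefix("07700900123", "07"));
        System.out.println(hasPrefix("02079460000", "07"));
        System.out.println(hasPrefixIgnoreCase("Professor", "pro"));
    }
}
```

Inside a JAPE right-hand side the same comparison could be applied to the string obtained from doc.getContent(), as in the rule above.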
2. Convert a string to an integer
int x = Integer.parseInt(string);
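Integer.parseInt throws a NumberFormatException when the text is not a valid integer, which is common for strings taken from document content (surrounding whitespace, stray letters, overflow). A minimal sketch of a defensive wrapper follows. The class ParseInt and the method tryParse are hypothetical names used only for illustration.

```java
import java.util.OptionalInt;

class ParseInt {
    // Returns the parsed value, or an empty OptionalInt when the text
    // is null or is not a valid int after trimming whitespace.
    public static OptionalInt tryParse(String s) {
        if (s == null) {
            return OptionalInt.empty();
        }
        try {
            return OptionalInt.of(Integer.parseInt(s.trim()));
        } catch (NumberFormatException e) {
            return OptionalInt.empty();
        }
    }

    public static void main(String[] args) {
        System.out.println(tryParse(" 42 "));
        System.out.println(tryParse("forty-two"));
    }
}
```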
3. Using a regular expression on content of selected annotation to add new annotation
Rule: ExtractAuthor
(
  {Reference.type == "Literature"} |
  {Reference.type == "Patent"}
):reference
-->
{
  AnnotationSet set = (AnnotationSet)bindings.get("reference");
  Annotation ann = set.iterator().next();
  // copy the features of the matched Reference annotation
  FeatureMap fm = (FeatureMap) ((SimpleFeatureMapImpl)ann.getFeatures()).clone();
  fm.put("postprocessing.rule", "reference-extract-author.ExtractAuthor");
  try {
    String text = doc.getContent().getContent(
        set.firstNode().getOffset(), set.lastNode().getOffset()).toString();
    // replace each whitespace character (including new lines) with a space
    text = text.replaceAll("\\s", " ");
    String lastName = "\\b"            // beginning of a word
        + "(?:\\p{Ll}{0,3} )?"         // particle ?
        + "\\p{Lu}[\\p{L}-]{1,13}"     // Name
        + "(?: \\p{Ll}{0,3})?";        // particle ?
    String initials = "(?: \\p{Lu}\\.){1,3}";
    java.util.regex.Matcher matcher = java.util.regex.Pattern.compile(
        lastName + "(?:(?:," + initials + ")|(?:,? and " + lastName + ")|(?:,? et al\\.?))"
    ).matcher(text);
    while (matcher.find()) {
      // matcher offsets are relative to the start of the Reference annotation
      outputAS.add(set.firstNode().getOffset() + matcher.start(),
                   set.firstNode().getOffset() + matcher.end(),
                   "Author", fm);
    }
  } catch(InvalidOffsetException e) {
    throw new GateRuntimeException(e);
  }
}
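The regular expression can be tested on its own before it is embedded in a rule. The sketch below builds the pattern from the same lastName and initials pieces as the rule, with the outer group written as a non-capturing (?: ... ), and runs it on an invented reference string. The class AuthorRegexDemo, the method findAuthors and the sample text are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AuthorRegexDemo {
    static final String LAST_NAME = "\\b"   // beginning of a word
        + "(?:\\p{Ll}{0,3} )?"              // optional leading particle
        + "\\p{Lu}[\\p{L}-]{1,13}"          // name
        + "(?: \\p{Ll}{0,3})?";             // optional trailing particle
    static final String INITIALS = "(?: \\p{Lu}\\.){1,3}";
    static final Pattern AUTHOR = Pattern.compile(
        LAST_NAME + "(?:(?:," + INITIALS + ")|(?:,? and " + LAST_NAME
        + ")|(?:,? et al\\.?))");

    // Returns every substring of text that the author pattern matches.
    public static List<String> findAuthors(String text) {
        // One-for-one replacement, so match offsets still line up with the input.
        String normalised = text.replaceAll("\\s", " ");
        List<String> found = new ArrayList<>();
        Matcher m = AUTHOR.matcher(normalised);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        String sample = "Smith, J. A.;\nJones, K.; van Beethoven, L. (1999)";
        for (String author : findAuthors(sample)) {
            System.out.println(author);
        }
    }
}
```

Because the leading particle accepts any run of up to three lowercase letters, a word such as "and" directly before a capitalised name is absorbed into the match, so it is worth trying the pattern on representative references first.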
4. Using meta-properties to get the string of an annotation
{X}:label --> :label.New = {somefeat = :label.X@string }
5. Using meta-properties to get the string of all annotations bound by the match
( {X} ({Y}+):ys ):label --> :label.New = { somefeat = :ys@string }
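As a rough mental model, the string obtained for a binding can be thought of as the document text between the start of the first bound annotation and the end of the last one. The plain-Java sketch below illustrates that reading under stated assumptions. It does not use the GATE API: the Span class and the coveredText method are hypothetical stand-ins for annotations and for the meta-property lookup.

```java
import java.util.Arrays;
import java.util.List;

class SpanString {
    // Hypothetical stand-in for an annotation: start and end character offsets.
    static final class Span {
        final int start;
        final int end;
        Span(int start, int end) { this.start = start; this.end = end; }
    }

    // Text from the smallest start offset to the largest end offset.
    public static String coveredText(String content, List<Span> spans) {
        if (spans.isEmpty()) {
            throw new IllegalArgumentException("at least one span is required");
        }
        int start = Integer.MAX_VALUE;
        int end = Integer.MIN_VALUE;
        for (Span s : spans) {
            start = Math.min(start, s.start);
            end = Math.max(end, s.end);
        }
        return content.substring(start, end);
    }

    public static void main(String[] args) {
        String content = "John Smith Jones went home";
        // Two adjacent Y annotations, covering "Smith" and "Jones".
        List<Span> ys = Arrays.asList(new Span(5, 10), new Span(11, 16));
        System.out.println(coveredText(content, ys));
    }
}
```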