JAPE String Matching

1. Check that the first two digits of a number match a specified string

This technique generalises easily, e.g. to matching the first three letters of a word. The example below checks whether the first two digits of a Phone annotation are "07" and, if so, adds a new Phone annotation with the feature kind = mobile.

Rule: GetMobile
(
 {Phone}
):tag
-->
:tag{
  // get the start and end offsets of the matched Phone annotation
  Long phoneStart = tagAnnots.firstNode().getOffset();
  Long phoneEnd = tagAnnots.lastNode().getOffset();

  // check the number is at least 2 characters long (just in case)
  if(phoneEnd - phoneStart >= 2) {
    try {
      String firstTwoChars = doc.getContent()
          .getContent(phoneStart, phoneStart + 2).toString();

      // check it matches 07
      if("07".equals(firstTwoChars)) {
        // create the new annotation over the same span
        gate.FeatureMap features = Factory.newFeatureMap();
        features.put("kind", "mobile");
        outputAS.add(tagAnnots.firstNode(),
                     tagAnnots.lastNode(), "Phone", features);
      }
    }
    catch(InvalidOffsetException e) {
      // not possible: the offsets come from an existing annotation
      throw new LuckyException("Invalid offset from annotation");
    }
  }
}
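
The prefix test itself does not depend on GATE and can be tried in isolation. The following is a minimal plain-Java sketch, assuming the document text is available as a String and the annotation span as integer offsets; the class and method names (PrefixCheck, spanStartsWith) are hypothetical and not part of the GATE API.

```java
// Plain-Java sketch of the prefix test; no GATE classes are involved.
// PrefixCheck and spanStartsWith are hypothetical names used for illustration.
final class PrefixCheck {

    /**
     * Returns true if the span [start, end) of text lies inside the text,
     * is at least as long as prefix, and begins with prefix.
     */
    static boolean spanStartsWith(String text, int start, int end, String prefix) {
        if (start < 0 || end > text.length() || end - start < prefix.length()) {
            return false; // span is out of range or too short to hold the prefix
        }
        // regionMatches compares in place, without allocating a substring
        return text.regionMatches(start, prefix, 0, prefix.length());
    }

    public static void main(String[] args) {
        String doc = "Call 07700900123 or 02079460000";
        System.out.println(spanStartsWith(doc, 5, 16, "07"));  // first number
        System.out.println(spanStartsWith(doc, 20, 31, "07")); // second number
    }
}
```

Changing the prefix argument (for example to a three-letter string) gives the generalisation mentioned above without touching the length check.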

2. Convert a string to an integer

int x = Integer.parseInt(string);
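
Integer.parseInt throws NumberFormatException when the string is not a valid decimal integer (including stray whitespace or a value outside the int range), which is worth guarding against when the string comes from document text. A minimal sketch of one way to do this; SafeParse and toInt are hypothetical names, not GATE or JAPE API.

```java
import java.util.OptionalInt;

// SafeParse and toInt are hypothetical names used for illustration.
final class SafeParse {

    /** Parses a trimmed decimal string, returning empty instead of throwing. */
    static OptionalInt toInt(String s) {
        if (s == null) {
            return OptionalInt.empty();
        }
        try {
            return OptionalInt.of(Integer.parseInt(s.trim()));
        } catch (NumberFormatException e) {
            // not a number, or outside the range of int
            return OptionalInt.empty();
        }
    }
}
```

A caller can then write, for example, `int x = SafeParse.toInt(string).orElse(0);` if falling back to a default is acceptable.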

3. Using a regular expression on the content of a selected annotation to add a new annotation

Rule: ExtractAuthor
(
  {Reference.type == "Literature"}
  |{Reference.type == "Patent"}
):reference
-->
{
  AnnotationSet set = (AnnotationSet)bindings.get("reference");
  Annotation ann = set.iterator().next();
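  // copy the features of the matched annotation into a fresh map
  // (avoids casting to a specific FeatureMap implementation)
  FeatureMap fm = Factory.newFeatureMap();
  fm.putAll(ann.getFeatures());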
  fm.put("postprocessing.rule", "reference-extract-author.ExtractAuthor");
  try {
  String text = doc.getContent().getContent(
    set.firstNode().getOffset(), set.lastNode().getOffset()).toString();
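  // replace every whitespace character (including newlines) with a space;
  // the length is unchanged, so match offsets still line up with the document
  text = text.replaceAll("\\s", " ");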
  String lastName =
     "\\b" // beginning of a word
    +"(?:\\p{Ll}{0,3} )?" // particle ?
    +"\\p{Lu}[\\p{L}-]{1,13}" // Name
    +"(?: \\p{Ll}{0,3})?"; // particle ?
  String initials = "(?: \\p{Lu}\\.){1,3}";
  java.util.regex.Matcher matcher = java.util.regex.Pattern.compile(
    lastName+"(?:(?:,"+initials+")|(?:,? and "+lastName+")|(?:,? et al\\.?))"
    ).matcher(text);
  while (matcher.find()) {
    outputAS.add(set.firstNode().getOffset()+matcher.start(),
                 set.firstNode().getOffset()+matcher.end(),
                 "Author", fm);
  }
  } catch(InvalidOffsetException e) {
      throw new GateRuntimeException(e);
  }
}
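
The key step in the rule above is converting match positions, which are relative to the annotation's text, back into document offsets by adding the annotation's start offset. The following plain-Java sketch shows just that step, assuming the document is a String and using a deliberately simplified pattern in the usage note; OffsetMapping and findInSpan are hypothetical names, and no GATE classes are involved.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// OffsetMapping and findInSpan are hypothetical names used for illustration.
final class OffsetMapping {

    /**
     * Finds every match of pattern inside the span [spanStart, spanEnd)
     * and returns {start, end} pairs expressed as document offsets.
     */
    static List<long[]> findInSpan(String document, int spanStart, int spanEnd,
                                   Pattern pattern) {
        // Each whitespace character becomes exactly one space, so the text
        // keeps its length and positions still correspond to the document.
        String text = document.substring(spanStart, spanEnd).replaceAll("\\s", " ");
        List<long[]> hits = new ArrayList<>();
        Matcher m = pattern.matcher(text);
        while (m.find()) {
            // m.start() and m.end() are relative to text; shift by spanStart
            hits.add(new long[] { spanStart + m.start(), spanStart + m.end() });
        }
        return hits;
    }
}
```

For instance, with the simplified pattern `\p{Lu}[\p{L}-]{1,13},(?: \p{Lu}\.){1,3}` and a span covering "Smith, J. and Jones, A. B." (with a line break before "Jones"), each returned pair can be passed straight to an `add(start, end, type, features)` call on the output annotation set.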

4. Using meta-properties to get the string of an annotation

{X}:label
-->
:label.New = {somefeat = :label.X@string } 

5. Using meta-properties to get the string of all annotations bound by the match

(  
  {X}  
  ({Y}+):ys  
):label  
-->  
:label.New = { somefeat = :ys@string }
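
As a rough mental model, and assuming that @string on a binding yields the document text running from the start of the first bound annotation to the end of the last one (so any text lying between the Y annotations is included), the value can be pictured with the plain-Java sketch below. CoveredString and covering are hypothetical names; this is an illustration of the assumed behaviour, not GATE's implementation.

```java
import java.util.List;

// CoveredString and covering are hypothetical names used for illustration.
final class CoveredString {

    /**
     * Returns the text from the earliest start to the latest end among the
     * given {start, end} spans, or the empty string if there are no spans.
     */
    static String covering(String document, List<int[]> spans) {
        if (spans.isEmpty()) {
            return "";
        }
        int start = Integer.MAX_VALUE;
        int end = Integer.MIN_VALUE;
        for (int[] s : spans) {
            start = Math.min(start, s[0]);
            end = Math.max(end, s[1]);
        }
        return document.substring(start, end);
    }
}
```

Under that assumption, three Y annotations over "aa", "bb" and "cc" in the text "x aa bb cc" would give somefeat the value "aa bb cc", including the separating spaces.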