JAPE String Matching
Contents
- 1. Check that the first two digits of a number match a specified string
- 2. Convert a string to an integer
- 3. Using a regular expression on content of selected annotation to add new annotation
- 4. Using meta-properties to get the string of an annotation
- 5. Using meta-properties to get the string of all annotations bound by the match
1. Check that the first two digits of a number match a specified string
This technique generalises in many ways, e.g. to matching the first three letters of a word. The example below checks whether the first two digits of a Phone annotation are "07" and, if so, adds a new Phone annotation with the feature kind = "mobile".
Rule: GetMobile
(
  {Phone}
):tag
-->
:tag {
  // get the start and end offsets of the matched Phone annotation
  Long phoneStart = tagAnnots.firstNode().getOffset();
  Long phoneEnd = tagAnnots.lastNode().getOffset();
  // check the number is at least 2 characters long (just in case)
  if(phoneEnd - phoneStart >= 2) {
    try {
      String firstTwoChars = doc.getContent()
          .getContent(phoneStart, phoneStart + 2).toString();
      // check it matches 07
      if("07".equals(firstTwoChars)) {
        // create the new annotation over the same span
        gate.FeatureMap features = Factory.newFeatureMap();
        features.put("kind", "mobile");
        outputAS.add(tagAnnots.firstNode(),
            tagAnnots.lastNode(), "Phone", features);
      }
    }
    catch(InvalidOffsetException e) {
      // not possible: the offsets come from an existing annotation
      throw new LuckyException("Invalid offset from annotation");
    }
  }
}
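The same check can be tried outside GATE on a plain string. The following is only a sketch: the class PrefixCheck and the method spanStartsWith are hypothetical names, and a String with explicit start and end offsets stands in for the document content and the annotation's nodes.

```java
final class PrefixCheck {

    /**
     * Returns true if the span [start, end) of content begins with prefix.
     * The length guard plays the same role as the "at least 2 characters"
     * check in the rule: the comparison never reads past the end of the span.
     */
    static boolean spanStartsWith(String content, long start, long end, String prefix) {
        if (end - start < prefix.length()) {
            return false;
        }
        return content.startsWith(prefix, (int) start);
    }

    public static void main(String[] args) {
        String content = "Call 07700 900123 or 020 7946 0000";
        // first number spans offsets 5..17, second number spans 21..34
        System.out.println(spanStartsWith(content, 5, 17, "07"));
        System.out.println(spanStartsWith(content, 21, 34, "07"));
        // generalised: first three letters of a word
        System.out.println(spanStartsWith(content, 0, 4, "Cal"));
    }
}
```

Passing the prefix as a parameter is what makes the generalisation to other lengths straightforward; inside a JAPE rule the equivalent would be to read prefix.length() characters from the document content instead of the fixed 2.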
2. Convert a string to an integer
int x = Integer.parseInt(string);
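Integer.parseInt throws a NumberFormatException when the string is not a valid integer, which is easy to hit when the string comes from document text or a feature value. A minimal sketch of a defensive wrapper follows; SafeParse and parseOrDefault are hypothetical names, not part of GATE.

```java
final class SafeParse {

    /**
     * Parses s as a decimal integer, ignoring surrounding whitespace.
     * Returns fallback when s is null or is not a valid integer.
     */
    static int parseOrDefault(String s, int fallback) {
        if (s == null) {
            return fallback;
        }
        try {
            return Integer.parseInt(s.trim());
        } catch (NumberFormatException e) {
            return fallback;
        }
    }

    public static void main(String[] args) {
        System.out.println(parseOrDefault("42", -1));
        System.out.println(parseOrDefault(" 007 ", -1));
        System.out.println(parseOrDefault("4o2", -1));
    }
}
```

Whether a fallback value or a propagated exception is the better choice depends on the rule; the sketch simply shows where the failure has to be handled.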
3. Using a regular expression on content of selected annotation to add new annotation
Rule: ExtractAuthor
(
  {Reference.type == "Literature"}
  |{Reference.type == "Patent"}
):reference
-->
{
  AnnotationSet set = (AnnotationSet)bindings.get("reference");
  Annotation ann = set.iterator().next();
  // copy the features of the matched Reference annotation
  FeatureMap fm = (FeatureMap)
      ((SimpleFeatureMapImpl)ann.getFeatures()).clone();
  fm.put("postprocessing.rule", "reference-extract-author.ExtractAuthor");
  try {
    String text = doc.getContent().getContent(
        set.firstNode().getOffset(), set.lastNode().getOffset()).toString();
    // replace each whitespace character (including new lines) with a space;
    // the length is unchanged, so matcher offsets still line up with the document
    text = text.replaceAll("\\s", " ");
    String lastName =
        "\\b" // beginning of a word
        +"(?:\\p{Ll}{0,3} )?" // particle ?
        +"\\p{Lu}[\\p{L}-]{1,13}" // Name
        +"(?: \\p{Ll}{0,3})?"; // particle ?
    String initials = "(?: \\p{Lu}\\.){1,3}";
    java.util.regex.Matcher matcher = java.util.regex.Pattern.compile(
        lastName+"(?:(?:,"+initials+")|(?:,? and "+lastName+")|(?:,? et al\\.?))"
        ).matcher(text);
    while (matcher.find()) {
      // matcher offsets are relative to the annotation text,
      // so shift them by the start offset of the annotation
      outputAS.add(set.firstNode().getOffset()+matcher.start(),
          set.firstNode().getOffset()+matcher.end(),
          "Author", fm);
    }
  } catch(InvalidOffsetException e) {
    throw new GateRuntimeException(e);
  }
}
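The key step in the rule above is translating matcher offsets, which are relative to the annotation's text, back into document offsets. The sketch below illustrates just that step on a plain string, outside GATE. It makes several assumptions: AuthorOffsets and findAuthors are hypothetical names, a String with a start and end offset stands in for the document and the Reference annotation, and the pattern is a deliberately simplified one (surname, comma, one to three initials) rather than the full pattern used in the rule.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class AuthorOffsets {

    // Simplified, hypothetical pattern: capitalised surname, comma, 1-3 initials.
    private static final Pattern AUTHOR =
        Pattern.compile("\\p{Lu}\\p{L}+,(?: \\p{Lu}\\.){1,3}");

    /** Returns {start, end} document offsets of each match inside [spanStart, spanEnd). */
    static List<long[]> findAuthors(String docText, long spanStart, long spanEnd) {
        // One-for-one whitespace replacement keeps the text length unchanged,
        // so positions in the normalised text equal positions in the original span.
        String text = docText.substring((int) spanStart, (int) spanEnd)
                             .replaceAll("\\s", " ");
        List<long[]> spans = new ArrayList<>();
        Matcher m = AUTHOR.matcher(text);
        while (m.find()) {
            // shift span-relative offsets to document offsets
            spans.add(new long[] { spanStart + m.start(), spanStart + m.end() });
        }
        return spans;
    }

    public static void main(String[] args) {
        // the reference spans offsets 10..36 and contains a line break
        String doc = "Cited in:\nSmith, J.\nK. and Jones, A.\n(2001)";
        for (long[] s : findAuthors(doc, 10, 36)) {
            System.out.println(s[0] + "-" + s[1]); // prints 10-22, then 27-36
        }
    }
}
```

Note that the first match crosses the line break inside the reference; it is found only because the newline was replaced by a space before matching, and the offsets remain valid only because that replacement does not change the length of the text.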
4. Using meta-properties to get the string of an annotation
{X}:label
-->
:label.New = {somefeat = :label.X@string }
5. Using meta-properties to get the string of all annotations bound by the match
(
{X}
({Y}+):ys
):label
-->
:label.New = { somefeat = :ys@string }




