Groovy recipes

1. Filter by document feature

This recipe also appears in the user guide. It builds a new corpus from the documents whose "annotator" feature is "fred".

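// Factory is the gate.Factory class (static methods), not a variable
Factory.newCorpus("fredsDocs").addAll(
  docs.findAll{
    // == is null-safe, so documents without the feature are skipped
    it.getFeatures().get("annotator") == "fred"
  }
)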

2. Filter by annotation set existence, e.g. double-annotated documents

Useful for separating all double-annotated docs out of a corpus.

// Keep only the documents that contain both annotators' sets
Factory.newCorpus("doubleDocs").addAll(
  docs.findAll{
    it.getAnnotationSetNames().contains("annotator1") &&
    it.getAnnotationSetNames().contains("annotator2")
  }
)
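
The same test generalises to any number of annotation sets. The following is only a sketch: the helper name hasAllSets and the corpus name "tripleDocs" are made up for illustration, and the commented usage line assumes that docs and Factory are available as in the console.

```groovy
// hasAllSets is a hypothetical helper, not part of any library API.
// It returns true when the document contains every named annotation set.
def hasAllSets = { doc, Collection<String> required ->
  doc.getAnnotationSetNames().containsAll(required)
}

// Possible use in the console (assumes docs and Factory are available):
// Factory.newCorpus("tripleDocs").addAll(
//   docs.findAll{ hasAllSets(it, ["annotator1", "annotator2", "annotator3"]) }
// )
```

Keeping the set names in a list means only one line changes when another annotator is added.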

3. Choose an app to execute

You can already execute PRs conditionally, based on a document feature. By placing two pipelines as PRs inside a third, conditional pipeline, you can extend this to run a whole pipeline based on a document feature. This Groovy script goes one step further and chooses which pipeline to execute based on some other aspect of the document: in this case, the existence of a particular annotation set.

// Look up the two loaded applications by name
app1 = apps.find{it.name == "app1"}
app2 = apps.find{it.name == "app2"}
// Each application runs over a one-document temporary corpus
tempCorpus = Factory.newCorpus("tempCorpus")
// each, not findAll: the closure is run only for its side effects
docs.each{
  def app = it.getAnnotationSetNames().contains("annotator1") ? app1 : app2
  tempCorpus.add(it)
  app.setCorpus(tempCorpus)
  app.execute()
  tempCorpus.clear()
}
Factory.deleteResource(tempCorpus)
println "done"

4. How many annotations?

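sum = 0
// each, not findAll: we are iterating, not filtering
docs.each{
  // Count the "Anatomy" annotations in the "Filtered" set of this document
  num = it.getAnnotations("Filtered").get("Anatomy").size()
  sum += num
  println it.getName() + " " + num
}
println "total: " + sum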

5. Rename annotations

This one is written for the Groovy PR, but could easily be adapted to the console using the ideas above. You could also parameterise it if needed; see the user guide for details.

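// findAll returns a new collection, so inputAS can be modified safely below
inputAS.findAll{
  it.getType() == "OldName"
}.each{
  // Add a copy under the new name, keeping the offsets and features
  outputAS.add(it.getStartNode().getOffset(),
               it.getEndNode().getOffset(),
               "NewName",
               it.getFeatures())
}.each{
  // Remove the original annotation
  inputAS.remove(it)
}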
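
One possible console adaptation, as a sketch rather than a definitive version: the renaming logic is wrapped in a closure that takes the annotation set and both names as arguments, so it is parameterised as well. The name renameAnnotations is hypothetical, and the commented usage line assumes the console's docs variable and a set called "Filtered", as in recipe 4.

```groovy
// renameAnnotations is a hypothetical helper, not part of any library API.
// It renames annotations of type oldName to newName within one annotation set
// and returns how many annotations were renamed.
def renameAnnotations = { annSet, String oldName, String newName ->
  // Copy the matches first so the set is not modified while it is iterated
  def matches = annSet.findAll{ it.getType() == oldName }
  matches.each{
    annSet.add(it.getStartNode().getOffset(),
               it.getEndNode().getOffset(),
               newName,
               it.getFeatures())
    annSet.remove(it)
  }
  matches.size()
}

// Possible use in the console (assumes docs is available):
// docs.each{
//   println it.getName() + " " +
//     renameAnnotations(it.getAnnotations("Filtered"), "OldName", "NewName")
// }
```

Unlike the PR version, this sketch reads from and writes to the same annotation set; pass a second set as an extra argument if the renamed annotations should go elsewhere.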