Groovy recipes

1. Filter by document feature

This one is also in the user guide. It collects every document whose "annotator" feature is "fred" into a new corpus.

Factory.newCorpus("fredsDocs").addAll( 
  docs.findAll{ 
    it.features.annotator == "fred"
  } 
)

2. Filter if annotation sets exist - e.g. double annotated

Useful for separating out all double-annotated docs from a corpus, i.e. those that contain both annotators' annotation sets.

Factory.newCorpus("doubleDocs").addAll( 
  docs.findAll{ 
    (it.annotationSetNames.contains("annotator1")
    && it.annotationSetNames.contains("annotator2"))
  }
)
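
If more than two annotation sets have to be present, the chained contains() calls can be collapsed into a single containsAll(). The sketch below is only an illustration of that idea: it uses plain maps as stand-ins for GATE documents so that it runs outside the console, and the sample names and the "required" list are assumptions, not part of the recipe.

```groovy
// Stand-in data: plain maps play the role of GATE documents here (assumption).
def docs = [
  [name: "doc1", annotationSetNames: ["annotator1", "annotator2"] as Set],
  [name: "doc2", annotationSetNames: ["annotator1"] as Set],
  [name: "doc3", annotationSetNames: ["annotator1", "annotator2", "consensus"] as Set]
]

// Hypothetical list of annotation set names that must all be present.
def required = ["annotator1", "annotator2"]

// Keep only the documents that contain every required set name.
def doubleDocs = docs.findAll { it.annotationSetNames.containsAll(required) }

def names = doubleDocs*.name
println names
```

In the console, the same closure should work unchanged inside the docs.findAll{ } call of the recipe, since annotationSetNames is a set of strings there too.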

3. Choose an app to execute

You can already conditionally execute PRs (processing resources) based on a document feature. By placing two pipelines as PRs in a third, conditional pipeline, you can extend this to choose which pipeline to execute based on a document feature. This Groovy script goes one step further and chooses a pipeline based on some other aspect of the document: in this case, the existence of a particular annotation set.

app1 = apps.find{it.name.equals("app1")}
app2 = apps.find{it.name.equals("app2")}
Factory.newCorpus("tempCorpus").withResource { tempCorpus ->
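  // each, not findAll: the closure is run for its side effects only
  docs.each{
    def app = (it.annotationSetNames.contains("annotator1")) ? app1 : app2
    // run the chosen app over a one-document corpus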
    tempCorpus.add(it)
    app.setCorpus(tempCorpus)
    app.execute()
    tempCorpus.clear()
  }
}
println "done"

4. How many annotations?

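Counts the "Anatomy" annotations in each document's "Filtered" annotation set, printing a per-document count followed by the total.

sum = 0
docs.each{
  def filteredAnnots = it.getAnnotations("Filtered")
  def num = filteredAnnots["Anatomy"].size()
  sum += num
  println it.name + " " + num
}
println "total: " + sum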
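
The same tally can be written without a running counter by building a name-to-count map first and summing its values. This is a minimal sketch of that alternative, not the recipe itself: plain maps stand in for GATE documents, and the "filtered" key holding a list of annotation type names is an assumption made so the example runs on its own.

```groovy
// Stand-in data: each map mimics a document; "filtered" lists the types of
// the annotations in its Filtered set (hypothetical structure).
def docs = [
  [name: "doc1", filtered: ["Anatomy", "Anatomy", "Disease"]],
  [name: "doc2", filtered: ["Disease"]],
  [name: "doc3", filtered: ["Anatomy"]]
]

// Build a map from document name to its number of Anatomy annotations.
def counts = docs.collectEntries { doc ->
  [(doc.name): doc.filtered.count { it == "Anatomy" }]
}

// Report per-document counts, then the overall total.
counts.each { name, num -> println "${name} ${num}" }
def total = counts.values().sum()
println "total: ${total}"
```

Keeping the counts in a map makes it easy to sort or filter them afterwards, for example to list only the documents with no Anatomy annotations.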

5. Rename annotations

This one is for the Groovy PR, but could easily be adapted to the console using the above ideas. You could also parameterise it if needed - see the user guide for details.

inputAS.findAll{
  it.type == "OldName"
}.each{
  outputAS.add(it.start(), it.end(),
               "NewName",
               it.features.toFeatureMap()) // clone the feature map
}.each{
  inputAS.remove(it)
}
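
The chain above is safe because findAll returns a new list: the two each calls iterate over that copy, so removing from inputAS does not disturb the iteration. The sketch below shows the same copy-then-remove pattern on plain Groovy collections. The lists of maps are stand-ins for annotation sets (an assumption for illustration), and new HashMap(...) stands in for toFeatureMap().

```groovy
// Stand-in annotation sets: lists of maps instead of GATE AnnotationSets (assumption).
def inputAS = [
  [type: "OldName", start: 0,  end: 4,  features: [kind: "a"]],
  [type: "Other",   start: 5,  end: 9,  features: [:]],
  [type: "OldName", start: 10, end: 14, features: [kind: "b"]]
]
def outputAS = []

// findAll returns a fresh list, so inputAS can be modified while we walk it.
inputAS.findAll {
  it.type == "OldName"
}.each {
  // add a renamed copy, with a copied feature map
  outputAS << [type: "NewName", start: it.start, end: it.end,
               features: new HashMap(it.features)]
}.each {
  // then drop the original
  inputAS.remove(it)
}

println inputAS*.type
println outputAS*.type
```

Iterating over inputAS directly while removing from it would risk a ConcurrentModificationException, which is why the filtered copy matters.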