The 2 Minute Guide to Helping People Find Stuff with GATE
1. Take one large pile of text (documents, emails, tweets, patents, papers,
transcripts, blogs, comments, acts of parliament, and so on) -- call this
your corpus.
2. Pick a structured description of interesting things in the text (a telephone
directory, a chemical taxonomy, or something from the Linked Data cloud) --
call this your ontology.
3. Use GATE Teamware to mark up a gold standard example set of
annotations of the corpus (step 1) relative to the ontology (step 2).
4. Use GATE Developer to build a semantic annotation
pipeline to do the annotation job automatically and measure performance
against the gold standard.
5. Take the pipeline from step 4 and apply it to your text pile using
GATE Cloud (or embed it in your own systems using
GATE Embedded).
6. Use GATE Mimir to store the annotations relative to the ontology in a
multiparadigm index server. (For techies: this sits in the backroom as a
RESTful web service.)
7. Use Ontotext KIM to add semantic search,
knowledge facet search, ontology browsing, entity popularity graphing, time
series graphing, annotation structure search and (last but not least)
boolean full text search. (More techy stuff: mash up these types of search
with your existing UIs.)
Hey presto, you have state-of-the-art information management applying your
ontology to your corpus... But your users don't care. They're just happy
because now they can find stuff.
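
For the curious, the "measure performance against the gold standard" part of the recipe can be illustrated with a few lines of Python. This is only a minimal sketch and does not use GATE's own evaluation tools: the Annotation tuple, the score function, the toy offsets and the class names are all hypothetical, and it assumes the strictest possible rule, where an automatic annotation counts as correct only if its span and its ontology class both match a gold annotation exactly.

```python
from __future__ import annotations

from typing import NamedTuple


class Annotation(NamedTuple):
    """Hypothetical annotation: character offsets plus an ontology class."""
    start: int
    end: int
    label: str


def score(gold: set[Annotation], predicted: set[Annotation]) -> dict[str, float]:
    """Exact-match precision, recall and F1 of predicted vs. gold annotations."""
    correct = len(gold & predicted)  # same span and same class
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"precision": precision, "recall": recall, "f1": f1}


# Toy data (made up): the gold standard comes from human annotators,
# the predictions from an automatic pipeline.
gold = {
    Annotation(0, 5, "Person"),
    Annotation(16, 22, "Location"),
    Annotation(30, 34, "Organization"),
    Annotation(40, 47, "Person"),
}
predicted = {
    Annotation(0, 5, "Person"),      # exact match
    Annotation(16, 22, "Location"),  # exact match
    Annotation(30, 34, "Person"),    # right span, wrong class
    Annotation(50, 55, "Location"),  # spurious
}

result = score(gold, predicted)
print(result)  # {'precision': 0.5, 'recall': 0.5, 'f1': 0.5}
```

Two of the four predictions are right and two of the four gold annotations are found, so precision, recall and F1 all come out at 0.5. A more forgiving scheme would give partial credit for overlapping spans; exact matching is used here only because it is the simplest to show.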