The 2 Minute Guide to Helping People Find Stuff with GATE
1. Take one large pile of text (documents, emails, tweets, patents, papers, transcripts, blogs, comments, acts of parliament, and so on and so forth) -- call this your corpus.
2. Pick a structured description of interesting things in the text (a telephone directory, or chemical taxonomy, or something from the Linked Data cloud) -- call this your ontology.
3. Use GATE Teamware to mark up a gold standard example set of annotations of the corpus (step 1) relative to the ontology (step 2).
4. Use GATE Developer to build a semantic annotation pipeline that does the annotation job automatically, and measure its performance against the gold standard.
5. Take the pipeline from step 4 and apply it to your text pile using GATE Cloud (or embed it in your own systems using GATE Embedded).
6. Use GATE Mimir to store the annotations relative to the ontology in a multiparadigm index server. (For techies: this sits in the backroom as a RESTful web service.)
7. Use Ontotext KIM to add semantic search, knowledge facet search, ontology browsing, entity popularity graphing, time series graphing, annotation structure search and (last but not least) boolean full text search. (More techy stuff: mash up these types of search with your existing UIs.)
Hey presto, you have state-of-the-art information management applying your ontology to your corpus (and a sustainable process)... But your users don't care. They're just happy because now they can find stuff.
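For the curious, here is a rough sketch of what "measure performance against the gold standard" boils down to in the pipeline-building step. It is not GATE's API (GATE Developer ships its own evaluation tools); the `Annotation` tuple, the `score` function and the sample spans are all made up for illustration, and it assumes the strictest kind of matching, where an automatic annotation only counts if its offsets and type agree exactly with a gold one.

```python
from typing import Iterable, NamedTuple, Tuple


class Annotation(NamedTuple):
    """Hypothetical annotation record: character offsets plus an ontology type."""
    start: int
    end: int
    type: str


def score(gold: Iterable[Annotation],
          predicted: Iterable[Annotation]) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) using strict matching.

    An automatic annotation is correct only if an identical
    (start, end, type) triple exists in the gold standard.
    """
    gold_set, pred_set = set(gold), set(predicted)
    true_positives = len(gold_set & pred_set)

    precision = true_positives / len(pred_set) if pred_set else 0.0
    recall = true_positives / len(gold_set) if gold_set else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Made-up gold standard (what the human annotators marked up).
gold = [
    Annotation(0, 5, "Person"),
    Annotation(10, 16, "Location"),
    Annotation(20, 28, "Organization"),
]

# Made-up pipeline output: two exact hits, one spurious Person,
# and one Organization whose end offset is off by one.
predicted = [
    Annotation(0, 5, "Person"),
    Annotation(10, 16, "Location"),
    Annotation(30, 34, "Person"),
    Annotation(20, 27, "Organization"),
]

precision, recall, f1 = score(gold, predicted)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

Strict matching is the harshest choice: the Organization that is one character short counts as both a miss and a false alarm. A more lenient scheme would give partial credit for overlapping spans, which is a reasonable thing to want when the exact boundaries matter less than finding the entity at all.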