
GATE: releasing the missing links

Friday May 6th 2011


1. Introduction: Open Source Enterprise Search

The GATE team (gate.ac.uk) today release three new products: GATECloud.net, GATE Mímir and GATE Teamware.

All three are released under the open source AGPL licence (commercial options also available).

With this release GATE becomes the only open source solution that covers the entire text analysis and search lifecycle.

Now you can do enterprise search, business intelligence, voice of the customer, web mining, unstructured data management, scientific literature analysis and more with a mature, 100% open source solution.

Also in this release:

GATE Developer and GATE Embedded are available for download here; other members of the family are available on http://gatecloud.net/ and from our Sourceforge pages. More details: the GATE family.

2. GATECloud.net

On GATECloud.net you can use GATE on the Amazon Elastic Compute Cloud via a simple point-and-click web tool to:

The benefits of a cloud solution include:

There are several other cloud-based systems out there that do some of what we do. Here are some differences:

3. GATE Mímir

GATE Mímir provides multiparadigm indexing: concept search, full-text search and annotation structure search in one scalable index.

Mímir is a multiparadigm information management index and repository which can be used to index and search over text, annotations, semantic schemas (ontologies), and semantic metadata (instance data). It supports queries that arbitrarily mix full-text, structural, linguistic and semantic constraints, and it can scale to hundreds of gigabytes of text. A typical rich search or semantic annotation project deals with large quantities of data of different kinds. Mímir provides a framework for implementing indexing and search functionality across all these data types.
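To make the idea of mixing query types concrete, here is a toy in-memory sketch in Python. It is not Mímir's data model, API or query language: the `ToyIndex` class, its method names and the `(type, feature, value)` triples are all hypothetical, and it matches whole documents rather than positions in the text. It only illustrates how a single query can combine a full-text constraint with an annotation constraint.

```python
from collections import defaultdict


class ToyIndex:
    """Toy index mixing full-text and annotation lookups.

    Illustration only: hypothetical names, not Mímir's implementation.
    """

    def __init__(self):
        self.docs_by_token = defaultdict(set)       # token -> document ids
        self.docs_by_annotation = defaultdict(set)  # (type, feature, value) -> document ids

    def add(self, doc_id, text, annotations):
        # Index every lower-cased, whitespace-separated token.
        for token in text.lower().split():
            self.docs_by_token[token].add(doc_id)
        # Index each annotation feature as a (type, feature, value) triple.
        for ann_type, features in annotations:
            for feature, value in features.items():
                self.docs_by_annotation[(ann_type, feature, value)].add(doc_id)

    def search(self, tokens=(), annotations=()):
        """Return ids of documents matching ALL tokens and ALL annotation triples."""
        postings = [self.docs_by_token.get(t.lower(), set()) for t in tokens]
        postings += [self.docs_by_annotation.get(a, set()) for a in annotations]
        if not postings:
            return set()
        return set.intersection(*postings)


index = ToyIndex()
index.add("d1", "Aspirin reduces fever", [("Drug", {"class": "NSAID"})])
index.add("d2", "Paracetamol reduces fever", [("Drug", {"class": "analgesic"})])

# Full-text constraint ("fever") combined with a semantic constraint (Drug of class NSAID).
hits = index.search(tokens=["fever"], annotations=[("Drug", "class", "NSAID")])
```

Both documents contain the word "fever", but only `d1` also carries the matching annotation, so the combined query narrows the result to that one document.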

4. GATE Teamware

GATE Teamware is a workflow-based collaboration suite for manual and semi-automatic annotation and curation projects with distributed teams, QA, and process reporting.

It is a cost-effective environment for specifying and creating test and training data, enabling you to harness a widely distributed workforce and monitor progress and results remotely in real time. It’s also very easy to use: a new project can be up and running in less than five minutes.

5. The 2 Minute Guide to Helping People Find Stuff with GATE

  1. Take one large pile of text (documents, emails, tweets, patents, papers, transcripts, blogs, comments, acts of parliament, and so on and so forth) -- call this your corpus.
  2. Pick a structured description of interesting things in the text (a telephone directory, or chemical taxonomy, or something from the Linked Data cloud) -- call this your ontology.
  3. Use GATE Teamware to mark up a gold standard example set of annotations of the corpus (1.) relative to the ontology (2.).
  4. Use GATE Developer to build a semantic annotation pipeline to do the annotation job automatically and measure performance against the gold standard.
  5. Take the pipeline from step 4 and apply it to your text pile using GATECloud.net (or embed it in your own systems using GATE Embedded).
  6. Use GATE Mímir to store the annotations relative to the ontology in a multiparadigm index server. (For techies: this sits in the backroom as a RESTful web service.)
  7. Use Ontotext KIM to add semantic search, knowledge facet search, ontology browsing, entity popularity graphing, time series graphing, annotation structure search and (last but not least) boolean full text search. (More techy stuff: mash up these types of search with your existing UIs.)

Hey presto, you have state-of-the-art information management applying your ontology to your corpus (and a sustainable process)... But your users don't care. They're just happy because now they can find stuff.
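Step 4 of the guide measures the automatic pipeline against the gold standard. As a rough illustration of what that measurement involves, the Python sketch below computes strict-match precision, recall and F1 over two sets of annotations. The `evaluate` function and the `(start, end, type)` tuple representation are assumptions made for this example, not GATE's API, and the sample annotations are invented. GATE Developer has its own evaluation tools, which this does not reproduce.

```python
def evaluate(gold, predicted):
    """Strict-match precision, recall and F1 between two annotation sets.

    Each annotation is a hypothetical (start, end, type) tuple. An automatic
    annotation counts as correct only if the identical tuple is in the gold set.
    """
    gold, predicted = set(gold), set(predicted)
    correct = len(gold & predicted)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


# Invented example: three gold annotations, two automatic ones.
gold = {(0, 7, "Drug"), (16, 21, "Symptom"), (30, 37, "Drug")}
predicted = {(0, 7, "Drug"), (16, 21, "Drug")}  # second one has the wrong type

precision, recall, f1 = evaluate(gold, predicted)
```

Here one of the two automatic annotations is correct (precision 0.5) and one of the three gold annotations is found (recall 1/3). Strict matching is the simplest choice; a more lenient scheme could give partial credit for overlapping spans.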