
GATE: a full-lifecycle open source solution for text processing

(Impatient? See the 2-minute guide.)

1. Introduction

GATE is over 15 years old and is in active use for all types of computational task involving human language. GATE excels at text analysis of all shapes and sizes. From large corporations to small startups, from multi-million-euro research consortia to undergraduate projects, our user community is the largest and most diverse of any system of this type, and is spread across all but one of the continents [1].

GATE is open source free software; users can obtain free support from the user and developer community via GATE.ac.uk, or commercial support from our industrial partners. We are the biggest open source language processing project, with a development team more than double the size of the largest comparable projects (many of which are integrated with GATE [2]). More than €5 million has been invested in GATE development [3]; our objective is to make sure that this continues to be money well spent for all of GATE's users.

This note summarises the GATE software and process and gives examples of some of their uses. We believe that GATE is the leading system of its type, but as scientists we have to advise you not to take our word for it; that's why we've measured our software in many of the competitive evaluations over the last decade-and-a-half (MUC, TREC, ACE, DUC, ...). We invite you to give it a try, to get involved with the GATE community, and to contribute to human language science, engineering and development.

2. The GATE Family

GATE has grown over the years to include a desktop client for developers, a workflow-based web application, a Java library, an architecture and a process. GATE is:

We also develop:

For more information see the family pages.

One of our original motivations was to remove the necessity for solving common engineering problems before doing useful research, or re-engineering before deploying research results into applications. Core functions of GATE take care of the lion's share of the engineering:

On top of the core functions GATE includes components for diverse language processing tasks, e.g. parsers, morphology, tagging, Information Retrieval tools, Information Extraction components for various languages, and many others. GATE Developer and Embedded are supplied with an Information Extraction system (ANNIE) which has been adapted and evaluated very widely (numerous industrial systems, research systems evaluated in MUC, TREC, ACE, DUC, Pascal, NTCIR, etc.). ANNIE is often used to create RDF or OWL (metadata) for unstructured content (semantic annotation).
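To make the idea of components that add annotations to text more concrete, here is a minimal sketch in plain Java. It deliberately does not use the GATE API: the names PipelineSketch, Annotation, Component, Tokeniser and CapitalisedTagger are hypothetical and exist only for this illustration, and the two toy components are far simpler than anything in ANNIE. The sketch assumes only that a component reads the text plus the annotations produced so far and adds typed spans identified by character offsets.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: NOT the GATE API. All names here are hypothetical.
class PipelineSketch {

    /** A stand-off annotation: a typed span over the text, given by character offsets. */
    static final class Annotation {
        final String type;
        final int start; // inclusive
        final int end;   // exclusive

        Annotation(String type, int start, int end) {
            this.type = type;
            this.start = start;
            this.end = end;
        }
    }

    /** A reusable processing step: reads text and earlier annotations, adds new ones. */
    interface Component {
        void process(String text, List<Annotation> annotations);
    }

    /** Toy tokeniser: one "Token" annotation per run of letters or digits. */
    static final class Tokeniser implements Component {
        @Override
        public void process(String text, List<Annotation> annotations) {
            int i = 0;
            while (i < text.length()) {
                if (Character.isLetterOrDigit(text.charAt(i))) {
                    int start = i;
                    while (i < text.length() && Character.isLetterOrDigit(text.charAt(i))) {
                        i++;
                    }
                    annotations.add(new Annotation("Token", start, i));
                } else {
                    i++;
                }
            }
        }
    }

    /** Toy tagger: adds a "Capitalised" annotation over each Token that starts in upper case. */
    static final class CapitalisedTagger implements Component {
        @Override
        public void process(String text, List<Annotation> annotations) {
            List<Annotation> added = new ArrayList<>();
            for (Annotation a : annotations) {
                if (a.type.equals("Token") && Character.isUpperCase(text.charAt(a.start))) {
                    added.add(new Annotation("Capitalised", a.start, a.end));
                }
            }
            annotations.addAll(added);
        }
    }

    /** Runs the components in order over one text and returns all annotations. */
    static List<Annotation> run(String text, List<Component> pipeline) {
        List<Annotation> annotations = new ArrayList<>();
        for (Component c : pipeline) {
            c.process(text, annotations);
        }
        return annotations;
    }

    /** Convenience method: tokenise, then tag capitalised tokens. */
    static List<Annotation> annotate(String text) {
        List<Component> pipeline = new ArrayList<>();
        pipeline.add(new Tokeniser());
        pipeline.add(new CapitalisedTagger());
        return run(text, pipeline);
    }

    public static void main(String[] args) {
        String text = "GATE was developed in Sheffield.";
        for (Annotation a : annotate(text)) {
            System.out.println(a.type + " [" + a.start + "," + a.end + ") "
                    + text.substring(a.start, a.end));
        }
    }
}
```

Because each step only depends on the shared text and annotation list, steps can be reordered, replaced or reused in other pipelines, which is the general design idea behind assembling an application from independent components.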

GATE version 1 was written in the mid-1990s; at the turn of the new millennium we completely rewrote the system in Java; version 5 was released in June 2009.

2.1. Component model

One of the reasons GATE has lasted well and been successful is that the entire core is broken down into reusable chunks (using the original Java component model). Some of the APIs available in Embedded are summarised here:

3. First Cousins - the Ontotext Family

Complementing GATE's development and collaborative distributed annotation tools, KIM provides a straightforward deployment option (front-end, back-end).

Many systems developed with GATE are embedded into existing applications of one sort or another; the Ontotext family provide a good alternative to this approach. GATE-based annotation combined with a KIM/Mímir index and search engine is a robust and mature solution for text analysis in enterprise search and similar applications.

4. Where next?

Hungry for more? A summary of the main sources of documentation and where to get help:

Good luck!

Footnotes

  1. Rumours that we're planning to send several of the development team to Antarctica on one-way tickets are, of course, false, libellous and wishful thinking.
  2. Our philosophy is reuse not reinvention, so we integrate and interoperate with other systems, e.g.: LingPipe, OpenNLP, UIMA, and many more specific tools.
  3. This is the figure for direct Sheffield-based investment only and therefore an underestimate.
  4. GATE Developer and GATE Embedded are bundled, and in older distributions were referred to just as "GATE".