GATE Information Extraction

If information is power and riches, then it is not the amount that gives the value, but access at the right time and in the most suitable form.

Information Extraction (IE) systems analyse unrestricted text in order to extract information about pre-specified types of events, entities or relationships.

GATE has been used for many IE projects in many languages and problem domains, and has competed in the MUC and ACE evaluations. GATE has a built-in IE component set called ANNIE. Below is a short introduction to IE; for a longer introduction see this IE User Guide.

For more information about GATE and IE, contact the GATE team. See also the new edition of the Encyclopaedia of Language and Linguistics survey article on IE. Sheffield and others may be able to provide services to customise GATE to your needs.

(Note: chunks of these pages are derived from a previous version written by Malcolm Crawford.)

Information Extraction is not Information Retrieval

Information Extraction differs from traditional techniques in that it does not recover from a collection a subset of documents that are hopefully relevant to a query, based on keyword searching (perhaps augmented by a thesaurus). Instead, the goal is to extract from the documents (which may be in a variety of languages) salient facts about prespecified types of events, entities or relationships. These facts are then usually entered automatically into a database, which may then be used to analyse the data for trends, to give a natural language summary, or simply to serve for on-line access.

Information Retrieval gets sets of relevant documents: you analyse the documents.

Information Extraction gets facts out of documents: you analyse the facts.
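The contrast between retrieval and extraction can be made concrete with a toy sketch. It is not how GATE or ANNIE work internally; the corpus, the `retrieve` and `extract` functions, and the single `APPOINTMENT` pattern are all hypothetical names invented for illustration. Retrieval returns document identifiers for the reader to go through, whereas extraction returns structured records that could be loaded into a database.

```python
import re

# Hypothetical toy corpus; these sentences are invented for illustration.
DOCUMENTS = {
    "doc1": "Acme Corp appointed Jane Smith as chief executive.",
    "doc2": "Shares in Acme Corp rose sharply on Tuesday.",
    "doc3": "Globex appointed Raj Patel as chairman.",
}


def retrieve(query, documents):
    """Information Retrieval: ids of documents containing every query keyword."""
    keywords = query.lower().split()
    return [
        doc_id
        for doc_id, text in documents.items()
        if all(keyword in text.lower() for keyword in keywords)
    ]


# One hand-written pattern for one pre-specified event type (an appointment).
# A real IE system would use far richer linguistic processing than a regex.
APPOINTMENT = re.compile(
    r"(?P<organisation>[A-Z]\w+(?: [A-Z]\w+)*) appointed "
    r"(?P<person>[A-Z]\w+ [A-Z]\w+) as (?P<role>[a-z ]+)\."
)


def extract(documents):
    """Information Extraction: database-ready records, one per appointment found."""
    records = []
    for doc_id, text in documents.items():
        for match in APPOINTMENT.finditer(text):
            record = match.groupdict()
            record["source"] = doc_id
            records.append(record)
    return records


if __name__ == "__main__":
    print(retrieve("Acme Corp", DOCUMENTS))  # documents left for you to analyse
    for row in extract(DOCUMENTS):           # facts ready for you to analyse
        print(row)
```

Note that `retrieve` returns the share-price story as well, because it mentions the keywords, while `extract` ignores it, because it contains no appointment event.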

Here are some example applications of IE.

Why is Information Extraction difficult?

There are many ways of expressing the same fact:

Information may need to be combined across several sentences:
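Both difficulties can be seen in a small sketch, assuming a naive pattern-based extractor. The sentences, the `PATTERNS` list and the `extract_fact` function are hypothetical and invented for illustration; they are not taken from GATE. Each paraphrase of the same fact needs its own pattern, and a fact spread over two sentences defeats sentence-level patterns altogether.

```python
import re

# Hypothetical texts, invented for illustration; all three report one appointment.
VARIANTS = [
    "Acme Corp appointed Jane Smith as chief executive.",
    "Jane Smith was named chief executive of Acme Corp.",
    "Acme Corp has a new chief executive. Jane Smith takes up the post in May.",
]

ORG = r"(?P<organisation>[A-Z]\w+(?: [A-Z]\w+)*)"
PERSON = r"(?P<person>[A-Z]\w+ [A-Z]\w+)"
ROLE = r"(?P<role>[a-z ]+)"

# One pattern per surface form; every new phrasing needs another entry.
PATTERNS = [
    re.compile(ORG + r" appointed " + PERSON + r" as " + ROLE + r"\."),
    re.compile(PERSON + r" was named " + ROLE + r" of " + ORG + r"\."),
]


def extract_fact(text):
    """Return the first appointment record found in text, or None."""
    for pattern in PATTERNS:
        match = pattern.search(text)
        if match:
            return match.groupdict()
    return None


RESULTS = [extract_fact(text) for text in VARIANTS]

if __name__ == "__main__":
    for text, result in zip(VARIANTS, RESULTS):
        print(text, "->", result)
```

The first two texts normalise to the same record. The third yields nothing, because the organisation and role are in one sentence and the person is in the next, so recovering the fact would require combining information across sentences.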

You might want to try an Information Extraction task yourself.