Developing Language Processing Components with GATE
Version 5 (a User Guide)
For GATE version 5.1-beta1
(built October 30, 2009)
Hamish Cunningham
Diana Maynard
Kalina Bontcheva
Valentin Tablan
Marin Dimitrov
Mike Dowman
Niraj Aswani
Ian Roberts
Yaoyong Li
Adam Funk
Genevieve Gorrell
Johann Petrak
Horacio Saggion
Danica Damljanovic
Angus Roberts
© The University of Sheffield 2001-2009
http://gate.ac.uk/
Work on GATE has been partly supported by EPSRC grants GR/K25267 (Large-Scale Information Extraction), GR/M31699 (GATE 2), RA007940 (EMILLE), GR/N15764/01 (AKT) and GR/R85150/01 (MIAKT), AHRB grant APN16396 (ETCSL/GATE), Matrixware, the Information Retrieval Facility and several EU-funded projects (SEKT, TAO, NeOn, MediaCampaign, MUSING, KnowledgeWeb, PrestoSpace, h-TechSight, enIRaF).
I GATE Basics
1 Introduction
1.1 How to Use this Text
1.2 Context
1.3 Overview
1.4 Some Evaluations
1.5 Changes in this Version
1.6 Further Reading
2 Installing and Running GATE
2.1 Downloading GATE
2.2 Installing and Running GATE
2.3 Using System Properties with GATE
2.4 Configuring GATE
2.5 Building GATE
2.6 Troubleshooting
3 Using GATE Developer
3.1 The GATE Developer Main Window
3.2 Loading and Viewing Documents
3.3 Creating and Viewing Corpora
3.4 Working with Annotations
3.5 Using CREOLE Plugins
3.6 Loading and Using Processing Resources
3.7 Creating and Running an Application
3.8 Saving Applications and Language Resources
3.9 Keyboard Shortcuts
3.10 Miscellaneous
4 CREOLE: the GATE Component Model
4.1 The Web and CREOLE
4.2 The GATE Framework
4.3 The Lifecycle of a CREOLE Resource
4.4 Processing Resources and Applications
4.5 Language Resources and Datastores
4.6 Built-in CREOLE Resources
4.7 CREOLE Resource Configuration
5 Language Resources: Corpora, Documents and Annotations
5.1 Features: Simple Attribute/Value Data
5.2 Corpora: Sets of Documents plus Features
5.3 Documents: Content plus Annotations plus Features
5.4 Annotations: Directed Acyclic Graphs
5.5 Document Formats
5.6 XML Input/Output
6 ANNIE: a Nearly-New Information Extraction System
6.1 Document Reset
6.2 Tokeniser
6.3 Gazetteer
6.4 Sentence Splitter
6.5 RegEx Sentence Splitter
6.6 Part of Speech Tagger
6.7 Semantic Tagger
6.8 Orthographic Coreference (OrthoMatcher)
6.9 Pronominal Coreference
6.10 A Walk-Through Example
II GATE for Advanced Users
7 GATE Embedded
7.1 Quick Start with GATE Embedded
7.2 Resource Management in GATE Embedded
7.3 Using CREOLE Plugins
7.4 Language Resources
7.5 Processing Resources
7.6 Controllers
7.7 Persistent Applications
7.8 Ontologies
7.9 Creating a New Annotation Schema
7.10 Creating a New CREOLE Resource
7.11 Adding Support for a New Document Format
7.12 Using GATE Embedded in a Multithreaded Environment
7.13 Using GATE Embedded within a Spring Application
7.14 Using GATE Embedded within a Tomcat Web Application
7.15 Groovy Scripting for GATE
7.16 Saving Config Data to gate.xml
7.17 Annotation merging through the API
8 JAPE: Regular Expressions over Annotations
8.1 The Left-Hand Side
8.2 LHS Operators in Detail
8.3 The Right-Hand Side
8.4 Use of Priority
8.5 Using Phases Sequentially
8.6 Using Java Code on the RHS
8.7 Optimising for Speed
8.8 Ontology Aware Grammar Transduction
8.9 Serializing JAPE Transducer
8.10 The JAPE Debugger
8.11 Notes for Montreal Transducer Users
9 ANNIC: ANNotations-In-Context
9.1 Instantiating SSD
9.2 Search GUI
9.3 Using SSD from GATE Embedded
10 Performance Evaluation of Language Analysers
10.1 Metrics for Evaluation in Information Extraction
10.2 The Annotation Diff Tool
10.3 Corpus Quality Assurance
10.4 Corpus Benchmark Tool
10.5 A Plugin Computing Inter-Annotator Agreement (IAA)
10.6 A Plugin Computing the BDM Scores for an Ontology
11 Profiling Processing Resources
11.1 Overview
11.2 Graphical User Interface
11.3 Command Line Interface
11.4 Application Programming Interface
12 Developing GATE
12.1 Reporting Bugs and Requesting Features
12.2 Contributing Patches
12.3 Creating New Plugins
12.4 Updating this User Guide
III CREOLE Plugins
13 Gazetteers
13.1 Introduction to Gazetteers
13.2 Gazetteer Visual Resource - GAZE
13.3 OntoGazetteer
13.4 Gaze Ontology Gazetteer Editor
13.5 Hash Gazetteer
13.6 Flexible Gazetteer
13.7 Gazetteer List Collector
13.8 OntoRoot Gazetteer
13.9 Large KB Gazetteer
14 Working with Ontologies
14.1 Data Model for Ontologies
14.2 Ontology Event Model
14.3 The Ontology Plugin: Current Implementation
14.4 The Ontology_OWLIM2 plugin: backwards-compatible implementation
14.5 GATE Ontology Editor
14.6 Ontology Annotation Tool
14.7 Using the ontology API
14.8 Using the ontology API (old version)
14.9 Ontology-Aware JAPE Transducer
14.10 Annotating Text with Ontological Information
14.11 Populating Ontologies
14.12 Ontology API and Implementation Changes
15 Machine Learning
15.1 ML Generalities
15.2 Batch Learning PR
15.3 Machine Learning PR
16 Tools for Alignment Tasks
16.1 Introduction
16.2 The Tools
17 Parsers and Taggers
17.1 Verb Group Chunker
17.2 Noun Phrase Chunker
17.3 Tree Tagger
17.4 TaggerFramework
17.5 Chemistry Tagger
17.6 ABNER
17.7 Stemmer
17.8 GATE Morphological Analyzer
17.9 MiniPar Parser
17.10 RASP Parser
17.11 SUPPLE Parser
17.12 Stanford Parser
18 Combining GATE and UIMA
18.1 Embedding a UIMA AE in GATE
18.2 Embedding a GATE CorpusController in UIMA
19 More (CREOLE) Plugins
19.1 Language Plugins
19.2 Flexible Exporter
19.3 Annotation Set Transfer
19.4 Information Retrieval in GATE
19.5 Websphinx Web Crawler
19.6 Google Plugin
19.7 Yahoo Plugin
19.8 WordNet in GATE
19.9 Kea - Automatic Keyphrase Detection
19.10 Ontotext JapeC Compiler
19.11 Annotation Merging Plugin
19.12 Chinese Word Segmentation
19.13 Copying Annotations between Documents
19.14 OpenCalais Plugin
19.15 LingPipe Plugin
19.16 OpenNLP Plugin
19.17 Inter Annotator Agreement
19.18 Balanced Distance Metric Computation
19.19 Schema Annotation Editor
Appendices
A Change Log
A.1 Version 5.1 beta 1 (Autumn 2009)
A.2 July 2009 (FIG’09 Summer School)
A.3 Version 5.0 (May 2009)
A.4 Version 4.0 (July 2007)
A.5 Version 3.1 (April 2006)
A.6 January 2005
A.7 December 2004
A.8 September 2004
A.9 Version 3 Beta 1 (August 2004)
A.10 July 2004
A.11 June 2004
A.12 April 2004
A.13 March 2004
A.14 Version 2.2 – August 2003
A.15 Version 2.1 – February 2003
A.16 June 2002
B Version 5.1 Plugins Name Map
C Design Notes
C.1 Patterns
C.2 Exception Handling
D JAPE: Implementation
D.1 Formal Description of the JAPE Grammar
D.2 Relation to CPSL
D.3 Initialisation of a JAPE Grammar
D.4 Execution of JAPE Grammars
D.5 Using a Different Java Compiler
E Ant Tasks for GATE
E.1 Declaring the Tasks
E.2 The packagegapp task - bundling an application with its dependencies
E.3 The expandcreoles Task - Merging Annotation-Driven Config into creole.xml
F Named-Entity State Machine Patterns
F.1 Main.jape
F.2 first.jape
F.3 firstname.jape
F.4 name.jape
F.5 name_post.jape
F.6 date_pre.jape
F.7 date.jape
F.8 reldate.jape
F.9 number.jape
F.10 address.jape
F.11 url.jape
F.12 identifier.jape
F.13 jobtitle.jape
F.14 final.jape
F.15 unknown.jape
F.16 name_context.jape
F.17 org_context.jape
F.18 loc_context.jape
F.19 clean.jape
G Part-of-Speech Tags used in the Hepple Tagger
References
Colophon