

Developing Language Processing
Components with GATE
Version 6 (a User Guide)
  For GATE version 6.0
  (built November 11, 2010)

  Hamish Cunningham
  Diana Maynard
  Kalina Bontcheva
  Valentin Tablan
  Niraj Aswani
  Ian Roberts
  Genevieve Gorrell
  Adam Funk
  Angus Roberts
  Danica Damljanovic
  Thomas Heitz
  Mark Greenwood
  Horacio Saggion
  Johann Petrak
  Yaoyong Li
  Wim Peters
  et al.
  © The University of Sheffield 2001–2010

  http://gate.ac.uk/



Work on GATE has been partly supported by EPSRC grants GR/K25267 (Large-Scale Information Extraction), GR/M31699 (GATE 2), RA007940 (EMILLE), GR/N15764/01 (AKT) and GR/R85150/01 (MIAKT), AHRB grant APN16396 (ETCSL/GATE), Matrixware, the Information Retrieval Facility and several EU-funded projects (SEKT, TAO, NeOn, MediaCampaign, MUSING, KnowledgeWeb, PrestoSpace, h-TechSight, enIRaF).

Developing Language Processing Components with GATE Version 6.0

© 2010 The University of Sheffield

Department of Computer Science
Regent Court
211 Portobello
Sheffield
S1 4DP

http://gate.ac.uk

This work is licensed under the Creative Commons Attribution-No Derivative Licence. You are free to copy, distribute, display, and perform the work under the following conditions:

With the understanding that:

For more information about the Creative Commons Attribution-No Derivative Licence, please visit: http://creativecommons.org/licenses/by-nd/2.0/uk/

ISBN: TBA

Contents
I  GATE Basics
1 Introduction
 1.1 How to Use this Text
 1.2 Context
 1.3 Overview
 1.4 Some Evaluations
 1.5 Changes in this Version
 1.6 Further Reading
2 Installing and Running GATE
 2.1 Downloading GATE
 2.2 Installing and Running GATE
 2.3 Using System Properties with GATE
 2.4 Configuring GATE
 2.5 Building GATE
 2.6 Uninstalling GATE
 2.7 Troubleshooting
3 Using GATE Developer
 3.1 The GATE Developer Main Window
 3.2 Loading and Viewing Documents
 3.3 Creating and Viewing Corpora
 3.4 Working with Annotations
 3.5 Using CREOLE Plugins
 3.6 Loading and Using Processing Resources
 3.7 Creating and Running an Application
 3.8 Saving Applications and Language Resources
 3.9 Keyboard Shortcuts
 3.10 Miscellaneous
4 CREOLE: the GATE Component Model
 4.1 The Web and CREOLE
 4.2 The GATE Framework
 4.3 The Lifecycle of a CREOLE Resource
 4.4 Processing Resources and Applications
 4.5 Language Resources and Datastores
 4.6 Built-in CREOLE Resources
 4.7 CREOLE Resource Configuration
 4.8 Tools: How to Add Utilities to GATE Developer
5 Language Resources: Corpora, Documents and Annotations
 5.1 Features: Simple Attribute/Value Data
 5.2 Corpora: Sets of Documents plus Features
 5.3 Documents: Content plus Annotations plus Features
 5.4 Annotations: Directed Acyclic Graphs
 5.5 Document Formats
 5.6 XML Input/Output
6 ANNIE: a Nearly-New Information Extraction System
 6.1 Document Reset
 6.2 Tokeniser
 6.3 Gazetteer
 6.4 Sentence Splitter
 6.5 RegEx Sentence Splitter
 6.6 Part of Speech Tagger
 6.7 Semantic Tagger
 6.8 Orthographic Coreference (OrthoMatcher)
 6.9 Pronominal Coreference
 6.10 A Walk-Through Example
II  GATE for Advanced Users
7 GATE Embedded
 7.1 Quick Start with GATE Embedded
 7.2 Resource Management in GATE Embedded
 7.3 Using CREOLE Plugins
 7.4 Language Resources
 7.5 Processing Resources
 7.6 Controllers
 7.7 Duplicating a Resource
 7.8 Persistent Applications
 7.9 Ontologies
 7.10 Creating a New Annotation Schema
 7.11 Creating a New CREOLE Resource
 7.12 Adding Support for a New Document Format
 7.13 Using GATE Embedded in a Multithreaded Environment
 7.14 Using GATE Embedded within a Spring Application
 7.15 Using GATE Embedded within a Tomcat Web Application
 7.16 Groovy for GATE
 7.17 Saving Config Data to gate.xml
 7.18 Annotation merging through the API
8 JAPE: Regular Expressions over Annotations
 8.1 The Left-Hand Side
 8.2 LHS Operators in Detail
 8.3 The Right-Hand Side
 8.4 Use of Priority
 8.5 Using Phases Sequentially
 8.6 Using Java Code on the RHS
 8.7 Optimising for Speed
 8.8 Ontology Aware Grammar Transduction
 8.9 Serializing JAPE Transducer
 8.10 The JAPE Debugger
 8.11 Notes for Montreal Transducer Users
9 ANNIC: ANNotations-In-Context
 9.1 Instantiating SSD
 9.2 Search GUI
 9.3 Using SSD from GATE Embedded
10 Performance Evaluation of Language Analysers
 10.1 Metrics for Evaluation in Information Extraction
 10.2 The Annotation Diff Tool
 10.3 Corpus Quality Assurance
 10.4 Corpus Benchmark Tool
 10.5 A Plugin Computing Inter-Annotator Agreement (IAA)
 10.6 A Plugin Computing the BDM Scores for an Ontology
11 Profiling Processing Resources
 11.1 Overview
 11.2 Graphical User Interface
 11.3 Command Line Interface
 11.4 Application Programming Interface
12 Developing GATE
 12.1 Reporting Bugs and Requesting Features
 12.2 Contributing Patches
 12.3 Creating New Plugins
 12.4 Updating this User Guide
III  CREOLE Plugins
13 Gazetteers
 13.1 Introduction to Gazetteers
 13.2 ANNIE Gazetteer
 13.3 Gazetteer Visual Resource - GAZE
 13.4 OntoGazetteer
 13.5 Gaze Ontology Gazetteer Editor
 13.6 Hash Gazetteer
 13.7 Flexible Gazetteer
 13.8 Gazetteer List Collector
 13.9 OntoRoot Gazetteer
 13.10 Large KB Gazetteer
 13.11 The Shared Gazetteer for multithreaded processing
14 Working with Ontologies
 14.1 Data Model for Ontologies
 14.2 Ontology Event Model
 14.3 The Ontology Plugin: Current Implementation
 14.4 The Ontology_OWLIM2 plugin: backwards-compatible implementation
 14.5 GATE Ontology Editor
 14.6 Ontology Annotation Tool
 14.7 Relation Annotation Tool
 14.8 Using the ontology API
 14.9 Using the ontology API (old version)
 14.10 Ontology-Aware JAPE Transducer
 14.11 Annotating Text with Ontological Information
 14.12 Populating Ontologies
 14.13 Ontology API and Implementation Changes
15 Machine Learning
 15.1 ML Generalities
 15.2 Batch Learning PR
 15.3 Machine Learning PR
16 Tools for Alignment Tasks
 16.1 Introduction
 16.2 The Tools
17 Parsers and Taggers
 17.1 Verb Group Chunker
 17.2 Noun Phrase Chunker
 17.3 TaggerFramework
 17.4 Chemistry Tagger
 17.5 ABNER
 17.6 Stemmer
 17.7 GATE Morphological Analyzer
 17.8 MiniPar Parser
 17.9 RASP Parser
 17.10 SUPPLE Parser
 17.11 Stanford Parser
 17.12 OpenCalais, LingPipe and OpenNLP
18 Combining GATE and UIMA
 18.1 Embedding a UIMA AE in GATE
 18.2 Embedding a GATE CorpusController in UIMA
19 More (CREOLE) Plugins
 19.1 Language Plugins
 19.2 Flexible Exporter
 19.3 Annotation Set Transfer
 19.4 Information Retrieval in GATE
 19.5 Websphinx Web Crawler
 19.6 Google Plugin
 19.7 Yahoo Plugin
 19.8 Google Translator PR
 19.9 WordNet in GATE
 19.10 Kea - Automatic Keyphrase Detection
 19.11 Ontotext JapeC Compiler
 19.12 Annotation Merging Plugin
 19.13 Chinese Word Segmentation
 19.14 Copying Annotations between Documents
 19.15 OpenCalais Plugin
 19.16 LingPipe Plugin
 19.17 OpenNLP Plugin
 19.18 Tagger_MetaMap Plugin
 19.19 Inter Annotator Agreement
 19.20 Balanced Distance Metric Computation
 19.21 Schema Annotation Editor
Appendices
A Change Log
 A.1 Version 6.0 (November 2010)
 A.2 Version 5.2.1 (May 2010)
 A.3 Version 5.2 (April 2010)
 A.4 Version 5.1 (December 2009)
 A.5 Version 5.0 (May 2009)
 A.6 Version 4.0 (July 2007)
 A.7 Version 3.1 (April 2006)
 A.8 January 2005
 A.9 December 2004
 A.10 September 2004
 A.11 Version 3 Beta 1 (August 2004)
 A.12 July 2004
 A.13 June 2004
 A.14 April 2004
 A.15 March 2004
 A.16 Version 2.2 – August 2003
 A.17 Version 2.1 – February 2003
 A.18 June 2002
B Version 5.1 Plugins Name Map
C Design Notes
 C.1 Patterns
 C.2 Exception Handling
D JAPE: Implementation
 D.1 Formal Description of the JAPE Grammar
 D.2 Relation to CPSL
 D.3 Initialisation of a JAPE Grammar
 D.4 Execution of JAPE Grammars
 D.5 Using a Different Java Compiler
E Ant Tasks for GATE
 E.1 Declaring the Tasks
 E.2 The packagegapp task - bundling an application with its dependencies
 E.3 The expandcreoles Task - Merging Annotation-Driven Config into creole.xml
F Named-Entity State Machine Patterns
 F.1 Main.jape
 F.2 first.jape
 F.3 firstname.jape
 F.4 name.jape
 F.5 name_post.jape
 F.6 date_pre.jape
 F.7 date.jape
 F.8 reldate.jape
 F.9 number.jape
 F.10 address.jape
 F.11 url.jape
 F.12 identifier.jape
 F.13 jobtitle.jape
 F.14 final.jape
 F.15 unknown.jape
 F.16 name_context.jape
 F.17 org_context.jape
 F.18 loc_context.jape
 F.19 clean.jape
G Part-of-Speech Tags used in the Hepple Tagger
References
Colophon