
NLP GATE



Developing Language Processing Components with GATE
Version 5 (a User Guide)
  For GATE version 5.0-beta1
  (built October 31, 2008)


  Hamish Cunningham
  Diana Maynard
  Kalina Bontcheva
  Valentin Tablan
  Cristian Ursu
  Marin Dimitrov
  Mike Dowman
  Niraj Aswani
  Ian Roberts
  Yaoyong Li
  Andrey Shafirin
  Adam Funk

  © The University of Sheffield 2001-2008

  http://gate.ac.uk/



Work on GATE has been partly supported by EPSRC grants GR/K25267 (Large-Scale Information Extraction), GR/M31699 (GATE 2), RA007940 (EMILLE), GR/N15764/01 (AKT) and GR/R85150/01 (MIAKT), AHRB grant APN16396 (ETCSL/GATE), and several EU-funded projects (SEKT, TAO, NeOn, MediaCampaign, MUSING, KnowledgeWeb, PrestoSpace, h-TechSight, enIRaF).

Contents
1 Introduction
 1.1 How to Use This Text
 1.2 Context
 1.3 Overview
 1.4 Structure of the Book
 1.5 Further Reading
2 Change Log
 2.1 Version 5.0-beta1 (October 2008)
 2.2 Version 4.0 (July 2007)
 2.3 Version 3.1 (April 2006)
 2.4 January 2005
 2.5 December 2004
 2.6 September 2004
 2.7 Version 3 Beta 1 (August 2004)
 2.8 July 2004
 2.9 June 2004
 2.10 April 2004
 2.11 March 2004
 2.12 Version 2.2 – August 2003
 2.13 Version 2.1 – February 2003
 2.14 June 2002
3 How To…
 3.1 Download GATE
 3.2 Install and Run GATE
 3.3 [D,F] Use System Properties with GATE
 3.4 [D,F] Use (CREOLE) Plug-ins
 3.5 Troubleshooting
 3.6 [D] Get Started with the GUI
 3.7 [D,F] Configure GATE
 3.8 Build GATE
 3.9 [D] Use GATE with Maven or JPF
 3.10 [D,F] Create a New CREOLE Resource
 3.11 [F] Instantiate CREOLE Resources
 3.12 [D] Load CREOLE Resources
 3.13 [D,F] Configure CREOLE Resources
 3.14 [D] Create and Run an Application
 3.15 [D] Run PRs Conditionally on Document Features
 3.16 [D] View Annotations
 3.17 [D] Do Information Extraction with ANNIE
 3.18 [D] Modify ANNIE
 3.19 [D] Create and Edit Test Data
 3.20 [D,F] Create a New Annotation Schema
 3.21 [D] Save and Restore LRs in Data Stores
 3.22 [D] Save Resource Parameter State to File
 3.23 [D,F] Perform Evaluation with the AnnotationDiff tool
 3.24 [D] Use the Corpus Benchmark Evaluation tool
 3.25 [D] Write JAPE Grammars
 3.26 [F] Embed NLE in other Applications
 3.27 [F] Use GATE within a Spring application
 3.28 [F] Use GATE within a Tomcat Web Application
 3.29 [F] Use GATE in a Multithreaded Environment
 3.30 [D,F] Add support for a new document format
 3.31 [D] Dump Results to File
 3.32 [D] Stop GUI ‘Freezing’ on Linux
 3.33 [D] Stop GUI Crashing on Linux
 3.34 [D] Stop GATE Restoring GUI Sessions/Options
 3.35 Work with Unicode
 3.36 Work with Oracle and PostgreSQL
4 CREOLE: the GATE Component Model
 4.1 The Web and CREOLE
 4.2 Java Beans: a Simple Component Architecture
 4.3 The GATE Framework
 4.4 Language Resources and Processing Resources
 4.5 The Lifecycle of a CREOLE Resource
 4.6 Processing Resources and Applications
 4.7 Language Resources and Datastores
 4.8 Built-in CREOLE Resources
 4.9 CREOLE Resource Configuration
5 Visual CREOLE
 5.1 Gazetteer Visual Resource - GAZE
 5.2 Ontogazetteer
 5.3 The Co-reference Editor
6 Language Resources: Corpora, Documents and Annotations
 6.1 Features: Simple Attribute/Value Data
 6.2 Corpora: Sets of Documents plus Features
 6.3 Documents: Content plus Annotations plus Features
 6.4 Annotations: Directed Acyclic Graphs
 6.5 Document Formats
 6.6 XML Input/Output
7 JAPE: Regular Expressions Over Annotations
 7.1 Matching operators in detail
 7.2 Use of Context
 7.3 Use of Priority
 7.4 Use of negation
 7.5 Useful tricks
 7.6 Ontology aware grammar transduction
 7.7 Using Java code in JAPE rules
 7.8 Optimising for speed
 7.9 Serializing JAPE Transducer
 7.10 The JAPE Debugger
 7.11 Notes for Montreal Transducer users
8 ANNIE: a Nearly-New Information Extraction System
 8.1 Tokeniser
 8.2 Gazetteer
 8.3 Sentence Splitter
 8.4 RegEx Sentence Splitter
 8.5 Part of Speech Tagger
 8.6 Semantic Tagger
 8.7 Orthographic Coreference (OrthoMatcher)
 8.8 Pronominal Coreference
 8.9 A Walk-Through Example
9 (More CREOLE) Plugins
 9.1 Document Reset
 9.2 Verb Group Chunker
 9.3 Noun Phrase Chunker
 9.4 OntoText Gazetteer
 9.5 Flexible Gazetteer
 9.6 Gazetteer List Collector
 9.7 Tree Tagger
 9.8 Stemmer
 9.9 GATE Morphological Analyzer
 9.10 MiniPar Parser
 9.11 RASP Parser
 9.12 SUPPLE Parser (formerly BuChart)
 9.13 Stanford Parser
 9.14 Montreal Transducer
 9.15 Language Plugins
 9.16 Chemistry Tagger
 9.17 Flexible Exporter
 9.18 Annotation Set Transfer
 9.19 Information Retrieval in GATE
 9.20 Crawler
 9.21 Google Plugin
 9.22 Yahoo Plugin
 9.23 WordNet in GATE
 9.24 Machine Learning in GATE
 9.25 MinorThird
 9.26 MIAKT NLG Lexicon
 9.27 Kea - Automatic Keyphrase Detection
 9.28 Ontotext JapeC Compiler
 9.29 ANNIC
 9.30 Annotation Merging
 9.31 OntoRoot Gazetteer
10 Working with Ontologies
 10.1 Data Model for Ontologies
 10.2 Ontology Event Model (new in GATE 4)
 10.3 OWLIM Ontology LR
 10.4 GATE’s Ontology Editor
 10.5 Instantiating OWLIM Ontology using GATE API
 10.6 Ontology-Aware JAPE Transducer
 10.7 Annotating text with Ontological Information
 10.8 Populating Ontologies
 10.9 Ontology Annotation Tool
11 Machine Learning API
 11.1 ML Generalities
 11.2 The Batch Learning PR in GATE
 11.3 Examples of configuration file for the three learning types
 11.4 How to use the ML API
 11.5 The outputs of the ML API
12 Tools for Alignment Tasks
 12.1 Introduction
 12.2 Tools for Alignment Tasks
13 Performance Evaluation of Language Analysers
 13.1 The AnnotationDiff Tool
 13.2 The six annotation relations explained
 13.3 Benchmarking tool
 13.4 Metrics for Evaluation in Information Extraction
 13.5 Metrics for Evaluation of Inter-Annotator Agreement
 13.6 A Plugin for Computing Inter-Annotator Agreement
14 Users, Groups, and LR Access Rights
 14.1 Java serialisation and LR access rights
 14.2 Oracle Datastore and LR access rights
15 Developing GATE
 15.1 Creating new plugins
 15.2 Updating this User Guide
16 Combining GATE and UIMA
 16.1 Embedding a UIMA TAE in GATE
 16.2 Embedding a GATE CorpusController in UIMA
Appendices
A Design Notes
 A.1 Patterns
 A.2 Exception Handling
B JAPE: Implementation
 B.1 Formal Description of the JAPE Grammar
 B.2 Relation to CPSL
 B.3 Algorithms for JAPE Rule Application
 B.4 Label Binding Scheme
 B.5 Classes
 B.6 Implementation
 B.7 Compilation
 B.8 Using a Different Java Compiler
C Named-Entity State Machine Patterns
 C.1 Main.jape
 C.2 first.jape
 C.3 firstname.jape
 C.4 name.jape
 C.5 name_post.jape
 C.6 date_pre.jape
 C.7 date.jape
 C.8 reldate.jape
 C.9 number.jape
 C.10 address.jape
 C.11 url.jape
 C.12 identifier.jape
 C.13 jobtitle.jape
 C.14 final.jape
 C.15 unknown.jape
 C.16 name_context.jape
 C.17 org_context.jape
 C.18 loc_context.jape
 C.19 clean.jape
D Part-of-Speech Tags used in the Hepple Tagger
E Sample ML Configuration File
F IAA Measures for Classification Tasks
References
Colophon