Contents
I GATE Basics
1 Introduction
1.1 How to Use this Text
1.2 Context
1.3 Overview
1.3.1 Developing and Deploying Language Processing Facilities
1.3.2 Built-In Components
1.3.3 Additional Facilities
1.3.4 An Example
1.4 Some Evaluations
1.5 Changes in this Version
1.5.1 Version 6.0-beta1 (August 2010)
1.6 Further Reading
2 Installing and Running GATE
2.1 Downloading GATE
2.2 Installing and Running GATE
2.2.1 The Easy Way
2.2.2 The Hard Way (1)
2.2.3 The Hard Way (2): Subversion
2.3 Using System Properties with GATE
2.4 Configuring GATE
2.5 Building GATE
2.5.1 Using GATE with Maven
2.6 Uninstalling GATE
2.7 Troubleshooting
2.7.1 I don’t see the Java console messages under Windows
2.7.2 When I execute GATE, nothing happens
2.7.3 On Ubuntu, GATE is very slow or doesn’t start
2.7.4 How to use GATE on a 64 bit system?
2.7.5 I got the error: Could not reserve enough space for object heap
2.7.6 From Eclipse, I got the error: java.lang.OutOfMemoryError: Java heap space
2.7.7 On MacOS, I got the error: java.lang.OutOfMemoryError: Java heap space
2.7.8 I got the error: log4j:WARN No appenders could be found for logger...
2.7.9 Text is incorrectly refreshed after scrolling and becomes unreadable
2.7.10 An error occurred when running the TreeTagger plugin
2.7.11 I got the error: HighlightData cannot be cast to ...HighlightInfo
3 Using GATE Developer
3.1 The GATE Developer Main Window
3.2 Loading and Viewing Documents
3.3 Creating and Viewing Corpora
3.4 Working with Annotations
3.4.1 The Annotation Sets View
3.4.2 The Annotations List View
3.4.3 The Annotations Stack View
3.4.4 The Co-reference Editor
3.4.5 Creating and Editing Annotations
3.4.6 Schema-Driven Editing
3.4.7 Printing Text with Annotations
3.5 Using CREOLE Plugins
3.6 Loading and Using Processing Resources
3.7 Creating and Running an Application
3.7.1 Running an Application on a Datastore
3.7.2 Running PRs Conditionally on Document Features
3.7.3 Doing Information Extraction with ANNIE
3.7.4 Modifying ANNIE
3.8 Saving Applications and Language Resources
3.8.1 Saving Documents to File
3.8.2 Saving and Restoring LRs in Datastores
3.8.3 Saving Application States to a File
3.8.4 Saving an Application with its Resources (e.g. GATE Teamware)
3.9 Keyboard Shortcuts
3.10 Miscellaneous
3.10.1 Stopping GATE from Restoring Developer Sessions/Options
3.10.2 Working with Unicode
4 CREOLE: the GATE Component Model
4.1 The Web and CREOLE
4.2 The GATE Framework
4.3 The Lifecycle of a CREOLE Resource
4.4 Processing Resources and Applications
4.5 Language Resources and Datastores
4.6 Built-in CREOLE Resources
4.7 CREOLE Resource Configuration
4.7.1 Configuration with XML
4.7.2 Configuring Resources using Annotations
4.7.3 Mixing the Configuration Styles
4.8 Tools: How to Add Utilities to GATE Developer
4.8.1 Putting your tools in a sub-menu
5 Language Resources: Corpora, Documents and Annotations
5.1 Features: Simple Attribute/Value Data
5.2 Corpora: Sets of Documents plus Features
5.3 Documents: Content plus Annotations plus Features
5.4 Annotations: Directed Acyclic Graphs
5.4.1 Annotation Schemas
5.4.2 Examples of Annotated Documents
5.4.3 Creating, Viewing and Editing Diverse Annotation Types
5.5 Document Formats
5.5.1 Detecting the Right Reader
5.5.2 XML
5.5.3 HTML
5.5.4 SGML
5.5.5 Plain text
5.5.6 RTF
5.5.7 Email
5.6 XML Input/Output
6 ANNIE: a Nearly-New Information Extraction System
6.1 Document Reset
6.2 Tokeniser
6.2.1 Tokeniser Rules
6.2.2 Token Types
6.2.3 English Tokeniser
6.3 Gazetteer
6.4 Sentence Splitter
6.5 RegEx Sentence Splitter
6.6 Part of Speech Tagger
6.7 Semantic Tagger
6.8 Orthographic Coreference (OrthoMatcher)
6.8.1 GATE Interface
6.8.2 Resources
6.8.3 Processing
6.9 Pronominal Coreference
6.9.1 Quoted Speech Submodule
6.9.2 Pleonastic It Submodule
6.9.3 Pronominal Resolution Submodule
6.9.4 Detailed Description of the Algorithm
6.10 A Walk-Through Example
6.10.1 Step 1 - Tokenisation
6.10.2 Step 2 - List Lookup
6.10.3 Step 3 - Grammar Rules
II GATE for Advanced Users
7 GATE Embedded
7.1 Quick Start with GATE Embedded
7.2 Resource Management in GATE Embedded
7.3 Using CREOLE Plugins
7.4 Language Resources
7.4.1 GATE Documents
7.4.2 Feature Maps
7.4.3 Annotation Sets
7.4.4 Annotations
7.4.5 GATE Corpora
7.5 Processing Resources
7.6 Controllers
7.7 Duplicating a Resource
7.8 Persistent Applications
7.9 Ontologies
7.10 Creating a New Annotation Schema
7.11 Creating a New CREOLE Resource
7.12 Adding Support for a New Document Format
7.13 Using GATE Embedded in a Multithreaded Environment
7.14 Using GATE Embedded within a Spring Application
7.14.1 Duplication in Spring
7.14.2 Spring pooling
7.14.3 Further reading
7.15 Using GATE Embedded within a Tomcat Web Application
7.15.1 Recommended Directory Structure
7.15.2 Configuration Files
7.15.3 Initialization Code
7.16 Groovy for GATE
7.16.1 Groovy Scripting Console for GATE
7.16.2 Groovy scripting PR
7.16.3 The Scriptable Controller
7.16.4 Utility methods
7.17 Saving Config Data to gate.xml
7.18 Annotation merging through the API
8 JAPE: Regular Expressions over Annotations
8.1 The Left-Hand Side
8.1.1 Matching a Simple Text String
8.1.2 Matching Entire Annotation Types
8.1.3 Using Attributes and Values
8.1.4 Using Meta-Properties
8.1.5 Using Templates
8.1.6 Multiple Pattern/Action Pairs
8.1.7 LHS Macros
8.1.8 Using Context
8.1.9 Multi-Constraint Statements
8.1.10 Negation
8.1.11 Escaping Special Characters
8.2 LHS Operators in Detail
8.2.1 Compositional Operators
8.2.2 Matching Operators
8.3 The Right-Hand Side
8.3.1 A Simple Example
8.3.2 Copying Feature Values from the LHS to the RHS
8.3.3 RHS Macros
8.4 Use of Priority
8.5 Using Phases Sequentially
8.6 Using Java Code on the RHS
8.6.1 A More Complex Example
8.6.2 Adding a Feature to the Document
8.6.3 Finding the Tokens of a Matched Annotation
8.6.4 Using Named Blocks
8.6.5 Java RHS Overview
8.7 Optimising for Speed
8.8 Ontology Aware Grammar Transduction
8.9 Serializing JAPE Transducer
8.9.1 How to Serialize?
8.9.2 How to Use the Serialized Grammar File?
8.10 The JAPE Debugger
8.11 Notes for Montreal Transducer Users
9 ANNIC: ANNotations-In-Context
9.1 Instantiating SSD
9.2 Search GUI
9.2.1 Overview
9.2.2 Syntax of Queries
9.2.3 Top Section
9.2.4 Central Section
9.2.5 Bottom Section
9.3 Using SSD from GATE Embedded
9.3.1 How to instantiate a searchabledatastore
9.3.2 How to search in this datastore
10 Performance Evaluation of Language Analysers
10.1 Metrics for Evaluation in Information Extraction
10.1.1 Annotation Relations
10.1.2 Cohen’s Kappa
10.1.3 Precision, Recall, F-Measure
10.1.4 Macro and Micro Averaging
10.2 The Annotation Diff Tool
10.2.1 Performing Evaluation with the Annotation Diff Tool
10.3 Corpus Quality Assurance
10.3.1 Description of the interface
10.3.2 Step by step usage
10.3.3 Details of the Corpus statistics table
10.3.4 Details of the Document statistics table
10.3.5 GATE Embedded API for the measures
10.3.6 Quality Assurance PR
10.4 Corpus Benchmark Tool
10.4.1 Preparing the Corpora for Use
10.4.2 Defining Properties
10.4.3 Running the Tool
10.4.4 The Results
10.5 A Plugin Computing Inter-Annotator Agreement (IAA)
10.5.1 IAA for Classification
10.5.2 IAA For Named Entity Annotation
10.5.3 The BDM-Based IAA Scores
10.6 A Plugin Computing the BDM Scores for an Ontology
11 Profiling Processing Resources
11.1 Overview
11.1.1 Features
11.1.2 Limitations
11.2 Graphical User Interface
11.3 Command Line Interface
11.4 Application Programming Interface
11.4.1 Log4j.properties
11.4.2 Benchmark log format
11.4.3 Enabling profiling
11.4.4 Reporting tool
12 Developing GATE
12.1 Reporting Bugs and Requesting Features
12.2 Contributing Patches
12.3 Creating New Plugins
12.3.1 Where to Keep Plugins in the GATE Hierarchy
12.3.2 What to Call your Plugin
12.3.3 Writing a New PR
12.3.4 Writing a New VR
12.3.5 Adding Plugins to the Nightly Build
12.4 Updating this User Guide
12.4.1 Building the User Guide
12.4.2 Making Changes to the User Guide
III CREOLE Plugins
13 Gazetteers
13.1 Introduction to Gazetteers
13.2 ANNIE Gazetteer
13.2.1 Creating and Modifying Gazetteer Lists
13.2.2 ANNIE Gazetteer Editor
13.3 Gazetteer Visual Resource - GAZE
13.3.1 Display Modes
13.3.2 Linear Definition Pane
13.3.3 Linear Definition Toolbar
13.3.4 Operations on Linear Definition Nodes
13.3.5 Gazetteer List Pane
13.3.6 Mapping Definition Pane
13.4 OntoGazetteer
13.5 Gaze Ontology Gazetteer Editor
13.5.1 The Gaze Gazetteer List and Mapping Editor
13.5.2 The Gaze Ontology Editor
13.6 Hash Gazetteer
13.6.1 Prerequisites
13.6.2 Parameters
13.7 Flexible Gazetteer
13.8 Gazetteer List Collector
13.9 OntoRoot Gazetteer
13.9.1 How Does it Work?
13.9.2 Initialisation of OntoRoot Gazetteer
13.9.3 Simple steps to run OntoRoot Gazetteer
13.10 Large KB Gazetteer
13.10.1 Quick usage overview
13.10.2 Dictionary setup
13.10.3 Additional dictionary configuration
13.10.4 Processing Resource Configuration
13.10.5 Runtime configuration
13.10.6 Semantic Enrichment PR
13.11 The Shared Gazetteer for multithreaded processing
14 Working with Ontologies
14.1 Data Model for Ontologies
14.1.1 Hierarchies of Classes and Restrictions
14.1.2 Instances
14.1.3 Hierarchies of Properties
14.1.4 URIs
14.2 Ontology Event Model
14.2.1 What Happens when a Resource is Deleted?
14.3 The Ontology Plugin: Current Implementation
14.3.1 The OWLIMOntology Language Resource
14.3.2 The ConnectSesameOntology Language Resource
14.3.3 The CreateSesameOntology Language Resource
14.3.4 The OWLIM2 Backwards-Compatible Language Resource
14.3.5 Using Ontology Import Mappings
14.3.6 Using BigOWLIM
14.4 The Ontology_OWLIM2 plugin: backwards-compatible implementation
14.4.1 The OWLIMOntologyLR Language Resource
14.5 GATE Ontology Editor
14.6 Ontology Annotation Tool
14.6.1 Viewing Annotated Text
14.6.2 Editing Existing Annotations
14.6.3 Adding New Annotations
14.6.4 Options
14.7 Relation Annotation Tool
14.7.1 Description of the two views
14.7.2 New annotation and instance from text selection
14.7.3 New annotation and add label to existing instance from text selection
14.7.4 Create and set properties for annotation relation
14.7.5 Delete instance, label or property
14.7.6 Differences with OAT and Ontology Editor
14.8 Using the ontology API
14.9 Using the ontology API (old version)
14.10 Ontology-Aware JAPE Transducer
14.11 Annotating Text with Ontological Information
14.12 Populating Ontologies
14.13 Ontology API and Implementation Changes
14.13.1 Differences between the implementation plugins
14.13.2 Changes in the Ontology API
15 Machine Learning
15.1 ML Generalities
15.1.1 Some Definitions
15.1.2 GATE-Specific Interpretation of the Above Definitions
15.2 Batch Learning PR
15.2.1 Batch Learning PR Configuration File Settings
15.2.2 Case Studies for the Three Learning Types
15.2.3 How to Use the Batch Learning PR in GATE Developer
15.2.4 Output of the Batch Learning PR
15.2.5 Using the Batch Learning PR from the API
15.3 Machine Learning PR
15.3.1 The DATASET Element
15.3.2 The ENGINE Element
15.3.3 The WEKA Wrapper
15.3.4 The MAXENT Wrapper
15.3.5 The SVM Light Wrapper
15.3.6 Example Configuration File
16 Tools for Alignment Tasks
16.1 Introduction
16.2 The Tools
16.2.1 Compound Document
16.2.2 CompoundDocumentFromXml
16.2.3 Compound Document Editor
16.2.4 Composite Document
16.2.5 DeleteMembersPR
16.2.6 SwitchMembersPR
16.2.7 Saving as XML
16.2.8 Alignment Editor
16.2.9 Saving Files and Alignments
16.2.10 Section-by-Section Processing
17 Parsers and Taggers
17.1 Verb Group Chunker
17.2 Noun Phrase Chunker
17.2.1 Differences from the Original
17.2.2 Using the Chunker
17.3 Tree Tagger
17.3.1 POS Tags
17.4 TaggerFramework
17.5 Chemistry Tagger
17.5.1 Using the Tagger
17.6 ABNER
17.7 Stemmer
17.7.1 Algorithms
17.8 GATE Morphological Analyzer
17.8.1 Rule File
17.9 MiniPar Parser
17.9.1 Platform Supported
17.9.2 Resources
17.9.3 Parameters
17.9.4 Prerequisites
17.9.5 Grammatical Relationships
17.10 RASP Parser
17.11 SUPPLE Parser
17.11.1 Requirements
17.11.2 Building SUPPLE
17.11.3 Running the Parser in GATE
17.11.4 Viewing the Parse Tree
17.11.5 System Properties
17.11.6 Configuration Files
17.11.7 Parser and Grammar
17.11.8 Mapping Named Entities
17.11.9 Upgrading from BuChart to SUPPLE
17.12 Stanford Parser
17.12.1 Input Requirements
17.12.2 Initialization Parameters
17.12.3 Runtime Parameters
17.13 OpenCalais, LingPipe and OpenNLP
18 Combining GATE and UIMA
18.1 Embedding a UIMA AE in GATE
18.1.1 Mapping File Format
18.1.2 The UIMA Component Descriptor
18.1.3 Using the AnalysisEnginePR
18.2 Embedding a GATE CorpusController in UIMA
18.2.1 Mapping File Format
18.2.2 The GATE Application Definition
18.2.3 Configuring the GATEApplicationAnnotator
19 More (CREOLE) Plugins
19.1 Language Plugins
19.1.1 French Plugin
19.1.2 German Plugin
19.1.3 Romanian Plugin
19.1.4 Arabic Plugin
19.1.5 Chinese Plugin
19.1.6 Hindi Plugin
19.2 Flexible Exporter
19.3 Annotation Set Transfer
19.4 Information Retrieval in GATE
19.4.1 Using the IR Functionality in GATE
19.4.2 Using the IR API
19.5 Websphinx Web Crawler
19.5.1 Using the Crawler PR
19.6 Google Plugin
19.7 Yahoo Plugin
19.7.1 Using the YahooPR
19.8 Google Translator PR
19.9 WordNet in GATE
19.9.1 The WordNet API
19.10 Kea - Automatic Keyphrase Detection
19.10.1 Using the ‘KEA Keyphrase Extractor’ PR
19.10.2 Using Kea Corpora
19.11 Ontotext JapeC Compiler
19.12 Annotation Merging Plugin
19.13 Chinese Word Segmentation
19.14 Copying Annotations between Documents
19.15 OpenCalais Plugin
19.16 LingPipe Plugin
19.16.1 LingPipe Tokenizer PR
19.16.2 LingPipe Sentence Splitter PR
19.16.3 LingPipe POS Tagger PR
19.16.4 LingPipe NER PR
19.16.5 LingPipe Language Identifier PR
19.17 OpenNLP Plugin
19.17.1 Parameters common to all PRs
19.17.2 OpenNLP PRs
19.17.3 Training new models
19.18 Tagger_MetaMap Plugin
19.18.1 Parameters
19.19 Inter Annotator Agreement
19.20 Balanced Distance Metric Computation
19.21 Schema Annotation Editor
Appendices
A Change Log
A.1 Version 6.0-beta1 (August 2010)
A.1.1 Major new features
A.1.2 Breaking changes
A.1.3 Other new features and bugfixes
A.2 Version 5.2.1 (May 2010)
A.3 Version 5.2 (April 2010)
A.3.1 JAPE and JAPE-related
A.3.2 Other Changes
A.4 Version 5.1 (December 2009)
A.4.1 New Features
A.4.2 JAPE improvements
A.4.3 Other improvements and bug fixes
A.5 Version 5.0 (May 2009)
A.5.1 Major New Features
A.5.2 Other New Features and Improvements
A.5.3 Specific Bug Fixes
A.6 Version 4.0 (July 2007)
A.6.1 Major New Features
A.6.2 Other New Features and Improvements
A.6.3 Bug Fixes and Optimizations
A.7 Version 3.1 (April 2006)
A.7.1 Major New Features
A.7.2 Other New Features and Improvements
A.7.3 Bug Fixes
A.8 January 2005
A.9 December 2004
A.10 September 2004
A.11 Version 3 Beta 1 (August 2004)
A.12 July 2004
A.13 June 2004
A.14 April 2004
A.15 March 2004
A.16 Version 2.2 – August 2003
A.17 Version 2.1 – February 2003
A.18 June 2002
B Version 5.1 Plugins Name Map
C Design Notes
C.1 Patterns
C.1.1 Components
C.1.2 Model, view, controller
C.1.3 Interfaces
C.2 Exception Handling
D JAPE: Implementation
D.1 Formal Description of the JAPE Grammar
D.2 Relation to CPSL
D.3 Initialisation of a JAPE Grammar
D.4 Execution of JAPE Grammars
D.5 Using a Different Java Compiler
E Ant Tasks for GATE
E.1 Declaring the Tasks
E.2 The packagegapp task - bundling an application with its dependencies
E.2.1 Introduction
E.2.2 Basic Usage
E.2.3 Handling Non-Plugin Resources
E.2.4 Streamlining your Plugins
E.2.5 Bundling Extra Resources
E.3 The expandcreoles Task - Merging Annotation-Driven Config into creole.xml
F Named-Entity State Machine Patterns
F.1 Main.jape
F.2 first.jape
F.3 firstname.jape
F.4 name.jape
F.4.1 Person
F.4.2 Location
F.4.3 Organization
F.4.4 Ambiguities
F.4.5 Contextual information
F.5 name_post.jape
F.6 date_pre.jape
F.7 date.jape
F.8 reldate.jape
F.9 number.jape
F.10 address.jape
F.11 url.jape
F.12 identifier.jape
F.13 jobtitle.jape
F.14 final.jape
F.15 unknown.jape
F.16 name_context.jape
F.17 org_context.jape
F.18 loc_context.jape
F.19 clean.jape
G Part-of-Speech Tags used in the Hepple Tagger
References