Contents

I  GATE Basics
1 Introduction
 1.1 How to Use this Text
 1.2 Context
 1.3 Overview
  1.3.1 Developing and Deploying Language Processing Facilities
  1.3.2 Built-In Components
  1.3.3 Additional Facilities in GATE Developer/Embedded
  1.3.4 An Example
 1.4 Some Evaluations
 1.5 Recent Changes
  1.5.1 Version 6.1 (April 2011)
 1.6 Further Reading
2 Installing and Running GATE
 2.1 Downloading GATE
 2.2 Installing and Running GATE
  2.2.1 The Easy Way
  2.2.2 The Hard Way (1)
  2.2.3 The Hard Way (2): Subversion
  2.2.4 Running GATE Developer on Unix/Linux
 2.3 Using System Properties with GATE
 2.4 Configuring GATE
 2.5 Building GATE
  2.5.1 Using GATE with Maven/Ivy
 2.6 Uninstalling GATE
 2.7 Troubleshooting
  2.7.1 I don’t see the Java console messages under Windows
  2.7.2 When I execute GATE, nothing happens
  2.7.3 On Ubuntu, GATE is very slow or doesn’t start
  2.7.4 How to use GATE on a 64 bit system?
  2.7.5 I got the error: Could not reserve enough space for object heap
  2.7.6 From Eclipse, I got the error: java.lang.OutOfMemoryError: Java heap space
  2.7.7 On MacOS, I got the error: java.lang.OutOfMemoryError: Java heap space
  2.7.8 I got the error: log4j:WARN No appenders could be found for logger...
  2.7.9 Text is incorrectly refreshed after scrolling and becomes unreadable
  2.7.10 An error occurred when running the TreeTagger plugin
  2.7.11 I got the error: HighlightData cannot be cast to ...HighlightInfo
3 Using GATE Developer
 3.1 The GATE Developer Main Window
 3.2 Loading and Viewing Documents
 3.3 Creating and Viewing Corpora
 3.4 Working with Annotations
  3.4.1 The Annotation Sets View
  3.4.2 The Annotations List View
  3.4.3 The Annotations Stack View
  3.4.4 The Co-reference Editor
  3.4.5 Creating and Editing Annotations
  3.4.6 Schema-Driven Editing
  3.4.7 Printing Text with Annotations
 3.5 Using CREOLE Plugins
 3.6 Loading and Using Processing Resources
 3.7 Creating and Running an Application
  3.7.1 Running an Application on a Datastore
  3.7.2 Running PRs Conditionally on Document Features
  3.7.3 Doing Information Extraction with ANNIE
  3.7.4 Modifying ANNIE
 3.8 Saving Applications and Language Resources
  3.8.1 Saving Documents to File
  3.8.2 Saving and Restoring LRs in Datastores
  3.8.3 Saving Application States to a File
  3.8.4 Saving an Application with its Resources (e.g. GATECloud.net)
 3.9 Keyboard Shortcuts
 3.10 Miscellaneous
  3.10.1 Stopping GATE from Restoring Developer Sessions/Options
  3.10.2 Working with Unicode
4 CREOLE: the GATE Component Model
 4.1 The Web and CREOLE
 4.2 The GATE Framework
 4.3 The Lifecycle of a CREOLE Resource
 4.4 Processing Resources and Applications
 4.5 Language Resources and Datastores
 4.6 Built-in CREOLE Resources
 4.7 CREOLE Resource Configuration
  4.7.1 Configuration with XML
  4.7.2 Configuring Resources using Annotations
  4.7.3 Mixing the Configuration Styles
 4.8 Tools: How to Add Utilities to GATE Developer
  4.8.1 Putting your tools in a sub-menu
5 Language Resources: Corpora, Documents and Annotations
 5.1 Features: Simple Attribute/Value Data
 5.2 Corpora: Sets of Documents plus Features
 5.3 Documents: Content plus Annotations plus Features
 5.4 Annotations: Directed Acyclic Graphs
  5.4.1 Annotation Schemas
  5.4.2 Examples of Annotated Documents
  5.4.3 Creating, Viewing and Editing Diverse Annotation Types
 5.5 Document Formats
  5.5.1 Detecting the Right Reader
  5.5.2 XML
  5.5.3 HTML
  5.5.4 SGML
  5.5.5 Plain text
  5.5.6 RTF
  5.5.7 Email
  5.5.8 PDF Files and Office Documents
 5.6 XML Input/Output
6 ANNIE: a Nearly-New Information Extraction System
 6.1 Document Reset
 6.2 Tokeniser
  6.2.1 Tokeniser Rules
  6.2.2 Token Types
  6.2.3 English Tokeniser
 6.3 Gazetteer
 6.4 Sentence Splitter
 6.5 RegEx Sentence Splitter
 6.6 Part of Speech Tagger
 6.7 Semantic Tagger
 6.8 Orthographic Coreference (OrthoMatcher)
  6.8.1 GATE Interface
  6.8.2 Resources
  6.8.3 Processing
 6.9 Pronominal Coreference
  6.9.1 Quoted Speech Submodule
  6.9.2 Pleonastic It Submodule
  6.9.3 Pronominal Resolution Submodule
  6.9.4 Detailed Description of the Algorithm
 6.10 A Walk-Through Example
  6.10.1 Step 1 - Tokenisation
  6.10.2 Step 2 - List Lookup
  6.10.3 Step 3 - Grammar Rules
II  GATE for Advanced Users
7 GATE Embedded
 7.1 Quick Start with GATE Embedded
 7.2 Resource Management in GATE Embedded
 7.3 Using CREOLE Plugins
 7.4 Language Resources
  7.4.1 GATE Documents
  7.4.2 Feature Maps
  7.4.3 Annotation Sets
  7.4.4 Annotations
  7.4.5 GATE Corpora
 7.5 Processing Resources
 7.6 Controllers
 7.7 Duplicating a Resource
 7.8 Persistent Applications
 7.9 Ontologies
 7.10 Creating a New Annotation Schema
 7.11 Creating a New CREOLE Resource
 7.12 Adding Support for a New Document Format
 7.13 Using GATE Embedded in a Multithreaded Environment
 7.14 Using GATE Embedded within a Spring Application
  7.14.1 Duplication in Spring
  7.14.2 Spring pooling
  7.14.3 Further reading
 7.15 Using GATE Embedded within a Tomcat Web Application
  7.15.1 Recommended Directory Structure
  7.15.2 Configuration Files
  7.15.3 Initialization Code
 7.16 Groovy for GATE
  7.16.1 Groovy Scripting Console for GATE
  7.16.2 Groovy scripting PR
  7.16.3 The Scriptable Controller
  7.16.4 Utility methods
 7.17 Saving Config Data to gate.xml
 7.18 Annotation merging through the API
8 JAPE: Regular Expressions over Annotations
 8.1 The Left-Hand Side
  8.1.1 Matching a Simple Text String
  8.1.2 Matching Entire Annotation Types
  8.1.3 Using Attributes and Values
  8.1.4 Using Meta-Properties
  8.1.5 Using Templates
  8.1.6 Multiple Pattern/Action Pairs
  8.1.7 LHS Macros
  8.1.8 Using Context
  8.1.9 Multi-Constraint Statements
  8.1.10 Negation
  8.1.11 Escaping Special Characters
 8.2 LHS Operators in Detail
  8.2.1 Compositional Operators
  8.2.2 Matching Operators
 8.3 The Right-Hand Side
  8.3.1 A Simple Example
  8.3.2 Copying Feature Values from the LHS to the RHS
  8.3.3 RHS Macros
 8.4 Use of Priority
 8.5 Using Phases Sequentially
 8.6 Using Java Code on the RHS
  8.6.1 A More Complex Example
  8.6.2 Adding a Feature to the Document
  8.6.3 Finding the Tokens of a Matched Annotation
  8.6.4 Using Named Blocks
  8.6.5 Java RHS Overview
 8.7 Optimising for Speed
 8.8 Ontology Aware Grammar Transduction
 8.9 Serializing JAPE Transducer
  8.9.1 How to Serialize?
  8.9.2 How to Use the Serialized Grammar File?
 8.10 Notes for Montreal Transducer Users
9 ANNIC: ANNotations-In-Context
 9.1 Instantiating SSD
 9.2 Search GUI
  9.2.1 Overview
  9.2.2 Syntax of Queries
  9.2.3 Top Section
  9.2.4 Central Section
  9.2.5 Bottom Section
 9.3 Using SSD from GATE Embedded
  9.3.1 How to instantiate a searchable datastore
  9.3.2 How to search in this datastore
10 Performance Evaluation of Language Analysers
 10.1 Metrics for Evaluation in Information Extraction
  10.1.1 Annotation Relations
  10.1.2 Cohen’s Kappa
  10.1.3 Precision, Recall, F-Measure
  10.1.4 Macro and Micro Averaging
 10.2 The Annotation Diff Tool
  10.2.1 Performing Evaluation with the Annotation Diff Tool
  10.2.2 Creating a Gold Standard with the Annotation Diff Tool
 10.3 Corpus Quality Assurance
  10.3.1 Description of the interface
  10.3.2 Step by step usage
  10.3.3 Details of the Corpus statistics table
  10.3.4 Details of the Document statistics table
  10.3.5 GATE Embedded API for the measures
  10.3.6 Quality Assurance PR
 10.4 Corpus Benchmark Tool
  10.4.1 Preparing the Corpora for Use
  10.4.2 Defining Properties
  10.4.3 Running the Tool
  10.4.4 The Results
 10.5 A Plugin Computing Inter-Annotator Agreement (IAA)
  10.5.1 IAA for Classification
  10.5.2 IAA For Named Entity Annotation
  10.5.3 The BDM-Based IAA Scores
 10.6 A Plugin Computing the BDM Scores for an Ontology
 10.7 Quality Assurance Summariser for Teamware
11 Profiling Processing Resources
 11.1 Overview
  11.1.1 Features
  11.1.2 Limitations
 11.2 Graphical User Interface
 11.3 Command Line Interface
 11.4 Application Programming Interface
  11.4.1 Log4j.properties
  11.4.2 Benchmark log format
  11.4.3 Enabling profiling
  11.4.4 Reporting tool
12 Developing GATE
 12.1 Reporting Bugs and Requesting Features
 12.2 Contributing Patches
 12.3 Creating New Plugins
  12.3.1 Where to Keep Plugins in the GATE Hierarchy
  12.3.2 What to Call your Plugin
  12.3.3 Writing a New PR
  12.3.4 Writing a New VR
  12.3.5 Adding Plugins to the Nightly Build
 12.4 Updating this User Guide
  12.4.1 Building the User Guide
  12.4.2 Making Changes to the User Guide
III  CREOLE Plugins
13 Gazetteers
 13.1 Introduction to Gazetteers
 13.2 ANNIE Gazetteer
  13.2.1 Creating and Modifying Gazetteer Lists
  13.2.2 ANNIE Gazetteer Editor
 13.3 Gazetteer Visual Resource - GAZE
  13.3.1 Display Modes
  13.3.2 Linear Definition Pane
  13.3.3 Linear Definition Toolbar
  13.3.4 Operations on Linear Definition Nodes
  13.3.5 Gazetteer List Pane
  13.3.6 Mapping Definition Pane
 13.4 OntoGazetteer
 13.5 Gaze Ontology Gazetteer Editor
  13.5.1 The Gaze Gazetteer List and Mapping Editor
  13.5.2 The Gaze Ontology Editor
 13.6 Hash Gazetteer
  13.6.1 Prerequisites
  13.6.2 Parameters
 13.7 Flexible Gazetteer
 13.8 Gazetteer List Collector
 13.9 OntoRoot Gazetteer
  13.9.1 How Does it Work?
  13.9.2 Initialisation of OntoRoot Gazetteer
  13.9.3 Simple steps to run OntoRoot Gazetteer
 13.10 Large KB Gazetteer
  13.10.1 Quick usage overview
  13.10.2 Dictionary setup
  13.10.3 Additional dictionary configuration
  13.10.4 Processing Resource Configuration
  13.10.5 Runtime configuration
  13.10.6 Semantic Enrichment PR
 13.11 The Shared Gazetteer for multithreaded processing
14 Working with Ontologies
 14.1 Data Model for Ontologies
  14.1.1 Hierarchies of Classes and Restrictions
  14.1.2 Instances
  14.1.3 Hierarchies of Properties
  14.1.4 URIs
 14.2 Ontology Event Model
  14.2.1 What Happens when a Resource is Deleted?
 14.3 The Ontology Plugin: Current Implementation
  14.3.1 The OWLIMOntology Language Resource
  14.3.2 The ConnectSesameOntology Language Resource
  14.3.3 The CreateSesameOntology Language Resource
  14.3.4 The OWLIM2 Backwards-Compatible Language Resource
  14.3.5 Using Ontology Import Mappings
  14.3.6 Using BigOWLIM
 14.4 The Ontology_OWLIM2 plugin: backwards-compatible implementation
  14.4.1 The OWLIMOntologyLR Language Resource
 14.5 GATE Ontology Editor
 14.6 Ontology Annotation Tool
  14.6.1 Viewing Annotated Text
  14.6.2 Editing Existing Annotations
  14.6.3 Adding New Annotations
  14.6.4 Options
 14.7 Relation Annotation Tool
  14.7.1 Description of the two views
  14.7.2 Create new annotation and instance from text selection
  14.7.3 Create new annotation and add label to existing instance from text selection
  14.7.4 Create and set properties for annotation relation
  14.7.5 Delete instance, label or property
  14.7.6 Differences with OAT and Ontology Editor
 14.8 Using the ontology API
 14.9 Using the ontology API (old version)
 14.10 Ontology-Aware JAPE Transducer
 14.11 Annotating Text with Ontological Information
 14.12 Populating Ontologies
 14.13 Ontology API and Implementation Changes
  14.13.1 Differences between the implementation plugins
  14.13.2 Changes in the Ontology API
15 Non-English Language Support
 15.1 French Plugin
 15.2 German Plugin
 15.3 Romanian Plugin
 15.4 Arabic Plugin
 15.5 Chinese Plugin
  15.5.1 Chinese Word Segmentation
 15.6 Hindi Plugin
16 Parsers
 16.1 MiniPar Parser
  16.1.1 Platform Supported
  16.1.2 Resources
  16.1.3 Parameters
  16.1.4 Prerequisites
  16.1.5 Grammatical Relationships
 16.2 RASP Parser
 16.3 SUPPLE Parser
  16.3.1 Requirements
  16.3.2 Building SUPPLE
  16.3.3 Running the Parser in GATE
  16.3.4 Viewing the Parse Tree
  16.3.5 System Properties
  16.3.6 Configuration Files
  16.3.7 Parser and Grammar
  16.3.8 Mapping Named Entities
  16.3.9 Upgrading from BuChart to SUPPLE
 16.4 Stanford Parser
  16.4.1 Input Requirements
  16.4.2 Initialization Parameters
  16.4.3 Runtime Parameters
17 Machine Learning
 17.1 ML Generalities
  17.1.1 Some Definitions
  17.1.2 GATE-Specific Interpretation of the Above Definitions
 17.2 Batch Learning PR
  17.2.1 Batch Learning PR Configuration File Settings
  17.2.2 Case Studies for the Three Learning Types
  17.2.3 How to Use the Batch Learning PR in GATE Developer
  17.2.4 Output of the Batch Learning PR
  17.2.5 Using the Batch Learning PR from the API
 17.3 Machine Learning PR
  17.3.1 The DATASET Element
  17.3.2 The ENGINE Element
  17.3.3 The WEKA Wrapper
  17.3.4 The MAXENT Wrapper
  17.3.5 The SVM Light Wrapper
  17.3.6 Example Configuration File
18 Tools for Alignment Tasks
 18.1 Introduction
 18.2 The Tools
  18.2.1 Compound Document
  18.2.2 CompoundDocumentFromXml
  18.2.3 Compound Document Editor
  18.2.4 Composite Document
  18.2.5 DeleteMembersPR
  18.2.6 SwitchMembersPR
  18.2.7 Saving as XML
  18.2.8 Alignment Editor
  18.2.9 Saving Files and Alignments
  18.2.10 Section-by-Section Processing
19 Combining GATE and UIMA
 19.1 Embedding a UIMA AE in GATE
  19.1.1 Mapping File Format
  19.1.2 The UIMA Component Descriptor
  19.1.3 Using the AnalysisEnginePR
 19.2 Embedding a GATE CorpusController in UIMA
  19.2.1 Mapping File Format
  19.2.2 The GATE Application Definition
  19.2.3 Configuring the GATEApplicationAnnotator
20 More (CREOLE) Plugins
 20.1 Verb Group Chunker
 20.2 Noun Phrase Chunker
  20.2.1 Differences from the Original
  20.2.2 Using the Chunker
 20.3 TaggerFramework
  20.3.1 TreeTagger - Multilingual POS Tagger
 20.4 Chemistry Tagger
  20.4.1 Using the Tagger
 20.5 Annotating Numbers
  20.5.1 Numbers in Words and Numbers
  20.5.2 Roman Numerals
 20.6 Annotating Measurements
 20.7 Annotating and Normalizing Dates
 20.8 ABNER - A Biomedical Named Entity Recogniser
 20.9 Snowball Based Stemmers
  20.9.1 Algorithms
 20.10 GATE Morphological Analyzer
  20.10.1 Rule File
 20.11 Flexible Exporter
 20.12 Annotation Set Transfer
 20.13 Schema Enforcer
 20.14 Information Retrieval in GATE
  20.14.1 Using the IR Functionality in GATE
  20.14.2 Using the IR API
 20.15 Websphinx Web Crawler
  20.15.1 Using the Crawler PR
 20.16 Google Plugin
 20.17 Yahoo Plugin
  20.17.1 Using the YahooPR
 20.18 Google Translator PR
 20.19 WordNet in GATE
  20.19.1 The WordNet API
 20.20 Kea - Automatic Keyphrase Detection
  20.20.1 Using the ‘KEA Keyphrase Extractor’ PR
  20.20.2 Using Kea Corpora
 20.21 Ontotext JapeC Compiler
 20.22 Annotation Merging Plugin
 20.23 Copying Annotations between Documents
 20.24 OpenCalais Plugin
 20.25 LingPipe Plugin
  20.25.1 LingPipe Tokenizer PR
  20.25.2 LingPipe Sentence Splitter PR
  20.25.3 LingPipe POS Tagger PR
  20.25.4 LingPipe NER PR
  20.25.5 LingPipe Language Identifier PR
 20.26 OpenNLP Plugin
  20.26.1 Parameters common to all PRs
  20.26.2 OpenNLP PRs
  20.26.3 Training new models
 20.27 Tagger_MetaMap Plugin
  20.27.1 Run-time parameters
  20.27.2 Upgrading from an earlier version of the plugin
 20.28 Content Detection Using Boilerpipe
 20.29 Inter Annotator Agreement
 20.30 Schema Annotation Editor
IV  The GATE Family: Cloud, MIMIR, Teamware
21 GATE Cloud
 21.1 GATE Cloud services: an overview
 21.2 Comparison with other systems
 21.3 How to buy services
 21.4 Pricing and discounts
 21.5 Annotation Jobs on GATECloud.net
  21.5.1 The Annotation Service Charges Explained
  21.5.2 Annotation Job Execution in Detail
 21.6 Running Custom Annotation Jobs on GATECloud.net
  21.6.1 Preparing Your Application: The Basics
  21.6.2 The GATECloud.net environment
22 GATE Teamware: A Web-based Collaborative Corpus Annotation Tool
 22.1 Introduction
 22.2 Requirements for Multi-Role Collaborative Annotation Environments
  22.2.1 Typical Division of Labour
  22.2.2 Remote, Scalable Data Storage
  22.2.3 Automatic annotation services
  22.2.4 Workflow Support
 22.3 Teamware: Architecture, Implementation, and Examples
  22.3.1 Data Storage Service
  22.3.2 Annotation Services
  22.3.3 The Executive Layer
  22.3.4 The User Interfaces
 22.4 Practical Applications
23 GATE Mímir
Appendices
A Change Log
 A.1 Version 6.1 (April 2011)
  A.1.1 New CREOLE Plugins
  A.1.2 Other new features and improvements
 A.2 Version 6.0 (November 2010)
  A.2.1 Major new features
  A.2.2 Breaking changes
  A.2.3 Other new features and bugfixes
 A.3 Version 5.2.1 (May 2010)
 A.4 Version 5.2 (April 2010)
  A.4.1 JAPE and JAPE-related
  A.4.2 Other Changes
 A.5 Version 5.1 (December 2009)
  A.5.1 New Features
  A.5.2 JAPE improvements
  A.5.3 Other improvements and bug fixes
 A.6 Version 5.0 (May 2009)
  A.6.1 Major New Features
  A.6.2 Other New Features and Improvements
  A.6.3 Specific Bug Fixes
 A.7 Version 4.0 (July 2007)
  A.7.1 Major New Features
  A.7.2 Other New Features and Improvements
  A.7.3 Bug Fixes and Optimizations
 A.8 Version 3.1 (April 2006)
  A.8.1 Major New Features
  A.8.2 Other New Features and Improvements
  A.8.3 Bug Fixes
 A.9 January 2005
 A.10 December 2004
 A.11 September 2004
 A.12 Version 3 Beta 1 (August 2004)
 A.13 July 2004
 A.14 June 2004
 A.15 April 2004
 A.16 March 2004
 A.17 Version 2.2 – August 2003
 A.18 Version 2.1 – February 2003
 A.19 June 2002
B Version 5.1 Plugins Name Map
C Design Notes
 C.1 Patterns
  C.1.1 Components
  C.1.2 Model, view, controller
  C.1.3 Interfaces
 C.2 Exception Handling
D JAPE: Implementation
 D.1 Formal Description of the JAPE Grammar
 D.2 Relation to CPSL
 D.3 Initialisation of a JAPE Grammar
 D.4 Execution of JAPE Grammars
 D.5 Using a Different Java Compiler
E Ant Tasks for GATE
 E.1 Declaring the Tasks
 E.2 The packagegapp task - bundling an application with its dependencies
  E.2.1 Introduction
  E.2.2 Basic Usage
  E.2.3 Handling Non-Plugin Resources
  E.2.4 Streamlining your Plugins
  E.2.5 Bundling Extra Resources
 E.3 The expandcreoles Task - Merging Annotation-Driven Config into creole.xml
F Named-Entity State Machine Patterns
 F.1 Main.jape
 F.2 first.jape
 F.3 firstname.jape
 F.4 name.jape
  F.4.1 Person
  F.4.2 Location
  F.4.3 Organization
  F.4.4 Ambiguities
  F.4.5 Contextual information
 F.5 name_post.jape
 F.6 date_pre.jape
 F.7 date.jape
 F.8 reldate.jape
 F.9 number.jape
 F.10 address.jape
 F.11 url.jape
 F.12 identifier.jape
 F.13 jobtitle.jape
 F.14 final.jape
 F.15 unknown.jape
 F.16 name_context.jape
 F.17 org_context.jape
 F.18 loc_context.jape
 F.19 clean.jape
G Part-of-Speech Tags used in the Hepple Tagger