
Contents

1 Introduction
 1.1 How to Use This Text
 1.2 Context
 1.3 Overview
  1.3.1 Developing and Deploying Language Processing Facilities
  1.3.2 Built-in Components
  1.3.3 Additional Facilities
  1.3.4 An Example
 1.4 Structure of the Book
 1.5 Further Reading
2 Change Log
 2.1 Version 5.0-beta1 (October 2008)
  2.1.1 Major new features
  2.1.2 Other new features and improvements
  2.1.3 Specific bug fixes
 2.2 Version 4.0 (July 2007)
  2.2.1 Major new features
  2.2.2 Other new features and improvements
  2.2.3 Bug fixes and optimizations
 2.3 Version 3.1 (April 2006)
  2.3.1 Major new features
  2.3.2 Other new features and improvements
  2.3.3 Bug fixes
 2.4 January 2005
 2.5 December 2004
 2.6 September 2004
 2.7 Version 3 Beta 1 (August 2004)
 2.8 July 2004
 2.9 June 2004
 2.10 April 2004
 2.11 March 2004
 2.12 Version 2.2 – August 2003
 2.13 Version 2.1 – February 2003
 2.14 June 2002
3 How To…
 3.1 Download GATE
 3.2 Install and Run GATE
  3.2.1 The Easy Way
  3.2.2 The Hard Way (1)
  3.2.3 The Hard Way (2): Subversion
 3.3 [D,F] Use System Properties with GATE
 3.4 [D,F] Use (CREOLE) Plug-ins
 3.5 Troubleshooting
 3.6 [D] Get Started with the GUI
 3.7 [D,F] Configure GATE
  3.7.1 [F] Save Config Data to gate.xml
 3.8 Build GATE
 3.9 [D] Use GATE with Maven or JPF
 3.10 [D,F] Create a New CREOLE Resource
 3.11 [F] Instantiate CREOLE Resources
 3.12 [D] Load CREOLE Resources
  3.12.1 Loading Language Resources
  3.12.2 Loading Processing Resources
  3.12.3 Loading and Processing Large Corpora
 3.13 [D,F] Configure CREOLE Resources
 3.14 [D] Create and Run an Application
 3.15 [D] Run PRs Conditionally on Document Features
 3.16 [D] View Annotations
 3.17 [D] Do Information Extraction with ANNIE
 3.18 [D] Modify ANNIE
 3.19 [D] Create and Edit Test Data
  3.19.1 Schema-driven editing
  3.19.2 Saving the test data
 3.20 [D,F] Create a New Annotation Schema
 3.21 [D] Save and Restore LRs in Data Stores
 3.22 [D] Save Resource Parameter State to File
 3.23 [D,F] Perform Evaluation with the AnnotationDiff tool
  3.23.1 GUI
 3.24 [D] Use the Corpus Benchmark Evaluation tool
  3.24.1 GUI mode
  3.24.2 How to define the properties of the benchmark tool
 3.25 [D] Write JAPE Grammars
 3.26 [F] Embed NLE in other Applications
 3.27 [F] Use GATE within a Spring application
 3.28 [F] Use GATE within a Tomcat Web Application
  3.28.1 Recommended Directory Structure
  3.28.2 Configuration files
  3.28.3 Initialization code
 3.29 [F] Use GATE in a Multithreaded Environment
 3.30 [D,F] Add support for a new document format
 3.31 [D] Dump Results to File
 3.32 [D] Stop GUI ‘Freezing’ on Linux
 3.33 [D] Stop GUI Crashing on Linux
 3.34 [D] Stop GATE Restoring GUI Sessions/Options
 3.35 Work with Unicode
 3.36 Work with Oracle and PostgreSQL
4 CREOLE: the GATE Component Model
 4.1 The Web and CREOLE
 4.2 Java Beans: a Simple Component Architecture
 4.3 The GATE Framework
 4.4 Language Resources and Processing Resources
 4.5 The Lifecycle of a CREOLE Resource
 4.6 Processing Resources and Applications
 4.7 Language Resources and Datastores
 4.8 Built-in CREOLE Resources
 4.9 CREOLE Resource Configuration
  4.9.1 Configuration with XML
  4.9.2 Configuring resources using annotations
  4.9.3 Mixing the configuration styles
5 Visual CREOLE
 5.1 Gazetteer Visual Resource - GAZE
  5.1.1 Running Modes
  5.1.2 Loading a Gazetteer
  5.1.3 Linear Definition Pane
  5.1.4 Linear Definition Toolbar
  5.1.5 Operations on Linear Definition Nodes
  5.1.6 Gazetteer List Pane
  5.1.7 Mapping Definition Pane
 5.2 Ontogazetteer
  5.2.1 Gazetteer Lists Editor and Mapper
  5.2.2 Ontogazetteer Editor
 5.3 The Co-reference Editor
6 Language Resources: Corpora, Documents and Annotations
 6.1 Features: Simple Attribute/Value Data
 6.2 Corpora: Sets of Documents plus Features
 6.3 Documents: Content plus Annotations plus Features
 6.4 Annotations: Directed Acyclic Graphs
  6.4.1 Annotation Schemas
  6.4.2 Examples of Annotated Documents
  6.4.3 Creating, Viewing and Editing Diverse Annotation Types
 6.5 Document Formats
  6.5.1 Detecting the right reader
  6.5.2 XML
  6.5.3 HTML
  6.5.4 SGML
  6.5.5 Plain text
  6.5.6 RTF
  6.5.7 Email
 6.6 XML Input/Output
7 JAPE: Regular Expressions Over Annotations
 7.1 Matching operators in detail
  7.1.1 Equality operators (“==” and “!=”)
  7.1.2 Comparison operators (“<”, “<=”, “>=” and “>”)
  7.1.3 Regular expression operators (“=~”, “==~”, “!~” and “!=~”)
  7.1.4 Contextual operators (“contains” and “within”)
 7.2 Use of Context
 7.3 Use of Priority
 7.4 Use of negation
 7.5 Useful tricks
 7.6 Ontology aware grammar transduction
 7.7 Using Java code in JAPE rules
  7.7.1 Adding a feature to the document
  7.7.2 Using named blocks
  7.7.3 Java RHS overview
 7.8 Optimising for speed
 7.9 Serializing JAPE Transducer
  7.9.1 How to serialize?
  7.9.2 How to use the serialized grammar file?
 7.10 The JAPE Debugger
  7.10.1 Debugger GUI
  7.10.2 Using the Debugger
  7.10.3 Known Bugs
 7.11 Notes for Montreal Transducer users
8 ANNIE: a Nearly-New Information Extraction System
 8.1 Tokeniser
  8.1.1 Tokeniser Rules
  8.1.2 Token Types
  8.1.3 English Tokeniser
 8.2 Gazetteer
 8.3 Sentence Splitter
 8.4 RegEx Sentence Splitter
 8.5 Part of Speech Tagger
 8.6 Semantic Tagger
 8.7 Orthographic Coreference (OrthoMatcher)
  8.7.1 GATE Interface
  8.7.2 Resources
  8.7.3 Processing
 8.8 Pronominal Coreference
  8.8.1 Quoted Speech Submodule
  8.8.2 Pleonastic It submodule
  8.8.3 Pronominal Resolution Submodule
  8.8.4 Detailed description of the algorithm
 8.9 A Walk-Through Example
  8.9.1 Step 1 - Tokenisation
  8.9.2 Step 2 - List Lookup
  8.9.3 Step 3 - Grammar Rules
9 (More CREOLE) Plugins
 9.1 Document Reset
 9.2 Verb Group Chunker
 9.3 Noun Phrase Chunker
  9.3.1 Differences from the Original
  9.3.2 Using the Chunker
 9.4 OntoText Gazetteer
  9.4.1 Prerequisites
  9.4.2 Setup
 9.5 Flexible Gazetteer
 9.6 Gazetteer List Collector
 9.7 Tree Tagger
  9.7.1 POS tags
 9.8 Stemmer
  9.8.1 Algorithms
 9.9 GATE Morphological Analyzer
  9.9.1 Rule File
 9.10 MiniPar Parser
  9.10.1 Platform Supported
  9.10.2 Resources
  9.10.3 Parameters
  9.10.4 Prerequisites
  9.10.5 Grammatical Relationships
 9.11 RASP Parser
 9.12 SUPPLE Parser (formerly BuChart)
  9.12.1 Requirements
  9.12.2 Building SUPPLE
  9.12.3 Running the parser in GATE
  9.12.4 Viewing the parse tree
  9.12.5 System properties
  9.12.6 Configuration files
  9.12.7 Parser and Grammar
  9.12.8 Mapping Named Entities
  9.12.9 Upgrading from BuChart to SUPPLE
 9.13 Stanford Parser
  9.13.1 Input requirements
  9.13.2 Initialization parameters
  9.13.3 Runtime parameters
 9.14 Montreal Transducer
  9.14.1 Main Improvements
  9.14.2 Main Bug fixes
 9.15 Language Plugins
  9.15.1 French Plugin
  9.15.2 German Plugin
  9.15.3 Romanian Plugin
  9.15.4 Arabic Plugin
  9.15.5 Chinese Plugin
 9.16 Chemistry Tagger
  9.16.1 Using the tagger
 9.17 Flexible Exporter
 9.18 Annotation Set Transfer
 9.19 Information Retrieval in GATE
  9.19.1 Using the IR functionality in GATE
  9.19.2 Using the IR API
 9.20 Crawler
  9.20.1 Using the Crawler PR
 9.21 Google Plugin
  9.21.1 Using the GooglePR
 9.22 Yahoo Plugin
  9.22.1 Using the YahooPR
 9.23 WordNet in GATE
  9.23.1 The WordNet API
 9.24 Machine Learning in GATE
  9.24.1 ML Generalities
  9.24.2 The Machine Learning PR in GATE
  9.24.3 The WEKA Wrapper
  9.24.4 Training an ML model with the ML PR and WEKA wrapper
  9.24.5 Applying a learnt model
  9.24.6 The MAXENT Wrapper
  9.24.7 The SVM Light Wrapper
 9.25 MinorThird
 9.26 MIAKT NLG Lexicon
  9.26.1 Complexity and Generality
 9.27 Kea - Automatic Keyphrase Detection
  9.27.1 Using the “KEA Keyphrase Extractor” PR
  9.27.2 Using Kea corpora
 9.28 Ontotext JapeC Compiler
 9.29 ANNIC
  9.29.1 Instantiating SSD
  9.29.2 Search GUI
  9.29.3 Using SSD from your code
 9.30 Annotation Merging
  9.30.1 Two implemented methods
  9.30.2 Annotation Merging Plugin
 9.31 OntoRoot Gazetteer
  9.31.1 How does it work?
  9.31.2 Initialisation of OntoRoot Gazetteer
10 Working with Ontologies
 10.1 Data Model for Ontologies
  10.1.1 Hierarchies of classes and restrictions
  10.1.2 Instances
  10.1.3 Hierarchies of properties
 10.2 Ontology Event Model (new in GATE 4)
  10.2.1 What happens when a resource is deleted?
 10.3 OWLIM Ontology LR
 10.4 GATE’s Ontology Editor
 10.5 Instantiating OWLIM Ontology using GATE API
 10.6 Ontology-Aware JAPE Transducer
 10.7 Annotating text with Ontological Information
 10.8 Populating Ontologies
 10.9 Ontology Annotation Tool
  10.9.1 Viewing Annotated Texts
  10.9.2 Editing Existing Annotations
  10.9.3 Adding New Annotations
  10.9.4 Options
11 Machine Learning API
 11.1 ML Generalities
  11.1.1 Some definitions
  11.1.2 GATE-specific interpretation of the above definitions
 11.2 The Batch Learning PR in GATE
  11.2.1 The settings not specified in the configuration file
  11.2.2 All the settings in the XML configuration file
 11.3 Examples of configuration file for the three learning types
 11.4 How to use the ML API
 11.5 The outputs of the ML API
  11.5.1 Training results
  11.5.2 Application results
  11.5.3 Evaluation results
  11.5.4 Feature files
12 Tools for Alignment Tasks
 12.1 Introduction
 12.2 Tools for Alignment Tasks
  12.2.1 Compound Document
  12.2.2 Compound Document Editor
  12.2.3 Composite Document
  12.2.4 DeleteMembersPR
  12.2.5 SwitchMembersPR
  12.2.6 Saving as XML
  12.2.7 Alignment Editor
13 Performance Evaluation of Language Analysers
 13.1 The AnnotationDiff Tool
 13.2 The six annotation relations explained
 13.3 Benchmarking tool
 13.4 Metrics for Evaluation in Information Extraction
 13.5 Metrics for Evaluation of Inter-Annotator Agreement
 13.6 A Plugin for Computing Inter-Annotator Agreement
  13.6.1 IAA for Classification Task
  13.6.2 IAA For Named Entity Annotation
14 Users, Groups, and LR Access Rights
 14.1 Java serialisation and LR access rights
 14.2 Oracle Datastore and LR access rights
  14.2.1 Users, Groups, Sessions and Access Modes
  14.2.2 User/Group Administration
  14.2.3 The API
15 Developing GATE
 15.1 Creating new plugins
  15.1.1 Where to keep plugins in the GATE hierarchy
  15.1.2 Writing a new PR
  15.1.3 Writing a new VR
  15.1.4 Adding plugins to the nightly build
 15.2 Updating this User Guide
  15.2.1 Building the User Guide
  15.2.2 Making changes to the User Guide
16 Combining GATE and UIMA
 16.1 Embedding a UIMA TAE in GATE
  16.1.1 Mapping File Format
  16.1.2 The UIMA component descriptor
  16.1.3 Using the AnalysisEnginePR
  16.1.4 Current limitations
 16.2 Embedding a GATE CorpusController in UIMA
  16.2.1 Mapping file format
  16.2.2 The GATE application definition
  16.2.3 Configuring the GATEApplicationAnnotator
Appendices
A Design Notes
 A.1 Patterns
  A.1.1 Components
  A.1.2 Model, view, controller
  A.1.3 Interfaces
 A.2 Exception Handling
B JAPE: Implementation
 B.1 Formal Description of the JAPE Grammar
 B.2 Relation to CPSL
 B.3 Algorithms for JAPE Rule Application
  B.3.1 The first algorithm
  B.3.2 Algorithm 2
 B.4 Label Binding Scheme
 B.5 Classes
 B.6 Implementation
  B.6.1 A Walk-Through
  B.6.2 Example RHS code
 B.7 Compilation
 B.8 Using a Different Java Compiler
C Named-Entity State Machine Patterns
 C.1 Main.jape
 C.2 first.jape
 C.3 firstname.jape
 C.4 name.jape
  C.4.1 Person
  C.4.2 Location
  C.4.3 Organization
  C.4.4 Ambiguities
  C.4.5 Contextual information
 C.5 name_post.jape
 C.6 date_pre.jape
 C.7 date.jape
 C.8 reldate.jape
 C.9 number.jape
 C.10 address.jape
 C.11 url.jape
 C.12 identifier.jape
 C.13 jobtitle.jape
 C.14 final.jape
 C.15 unknown.jape
 C.16 name_context.jape
 C.17 org_context.jape
 C.18 loc_context.jape
 C.19 clean.jape
D Part-of-Speech Tags used in the Hepple Tagger
E Sample ML Configuration File
F IAA Measures for Classification Tasks
References