This documentation is for the latest snapshot of GATE Developer/Embedded.
If you are using the latest release, version 7.1, you should refer to the 7.1 User Guide instead.


Developing Language Processing Components with GATE
Version 8 (a User Guide)
  For GATE version 8.0-snapshot (development builds)
  (built April 24, 2014)

  Hamish Cunningham, Diana Maynard, Kalina Bontcheva, Valentin Tablan, Niraj Aswani, Ian Roberts, Genevieve Gorrell, Adam Funk, Angus Roberts, Danica Damljanovic, Thomas Heitz, Mark A. Greenwood, Horacio Saggion, Johann Petrak, Yaoyong Li, Wim Peters, et al.
  
  The University of Sheffield, Department of Computer Science 2001-2014

  http://gate.ac.uk/


Work on GATE has been partly supported by EPSRC grants GR/K25267 (Large-Scale Information Extraction), GR/M31699 (GATE 2), RA007940 (EMILLE), GR/N15764/01 (AKT) and GR/R85150/01 (MIAKT), AHRB grant APN16396 (ETCSL/GATE), Ontotext, Matrixware, the Information Retrieval Facility and several EU-funded projects (TrendMiner, uComp, Arcomem, SEKT, TAO, NeOn, MediaCampaign, Musing, KnowledgeWeb, PrestoSpace, h-TechSight, and enIRaF).

Contents
I  GATE Basics
1 Introduction
 1.1 How to Use this Text
 1.2 Context
 1.3 Overview
 1.4 Some Evaluations
 1.5 Recent Changes
 1.6 Further Reading
2 Installing and Running GATE
 2.1 Downloading GATE
 2.2 Installing and Running GATE
 2.3 Using System Properties with GATE
 2.4 Changing GATE’s launch configuration
 2.5 Configuring GATE
 2.6 Building GATE
 2.7 Uninstalling GATE
 2.8 Troubleshooting
3 Using GATE Developer
 3.1 The GATE Developer Main Window
 3.2 Loading and Viewing Documents
 3.3 Creating and Viewing Corpora
 3.4 Working with Annotations
 3.5 Using CREOLE Plugins
 3.6 Installing and updating CREOLE Plugins
 3.7 Loading and Using Processing Resources
 3.8 Creating and Running an Application
 3.9 Saving Applications and Language Resources
 3.10 Keyboard Shortcuts
 3.11 Miscellaneous
4 CREOLE: the GATE Component Model
 4.1 The Web and CREOLE
 4.2 The GATE Framework
 4.3 The Lifecycle of a CREOLE Resource
 4.4 Processing Resources and Applications
 4.5 Language Resources and Datastores
 4.6 Built-in CREOLE Resources
 4.7 CREOLE Resource Configuration
 4.8 Tools: How to Add Utilities to GATE Developer
5 Language Resources: Corpora, Documents and Annotations
 5.1 Features: Simple Attribute/Value Data
 5.2 Corpora: Sets of Documents plus Features
 5.3 Documents: Content plus Annotations plus Features
 5.4 Annotations: Directed Acyclic Graphs
 5.5 Document Formats
 5.6 XML Input/Output
6 ANNIE: a Nearly-New Information Extraction System
 6.1 Document Reset
 6.2 Tokeniser
 6.3 Gazetteer
 6.4 Sentence Splitter
 6.5 RegEx Sentence Splitter
 6.6 Part of Speech Tagger
 6.7 Semantic Tagger
 6.8 Orthographic Coreference (OrthoMatcher)
 6.9 Pronominal Coreference
 6.10 A Walk-Through Example
II  GATE for Advanced Users
7 GATE Embedded
 7.1 Quick Start with GATE Embedded
 7.2 Resource Management in GATE Embedded
 7.3 Using CREOLE Plugins
 7.4 Language Resources
 7.5 Processing Resources
 7.6 Controllers
 7.7 Modelling Relations between Annotations
 7.8 Duplicating a Resource
 7.9 Persistent Applications
 7.10 Ontologies
 7.11 Creating a New Annotation Schema
 7.12 Creating a New CREOLE Resource
 7.13 Adding Support for a New Document Format
 7.14 Using GATE Embedded in a Multithreaded Environment
 7.15 Using GATE Embedded within a Spring Application
 7.16 Using GATE Embedded within a Tomcat Web Application
 7.17 Groovy for GATE
 7.18 Saving Config Data to gate.xml
 7.19 Annotation merging through the API
 7.20 Using Resource Helpers to Extend the API
8 JAPE: Regular Expressions over Annotations
 8.1 The Left-Hand Side
 8.2 LHS Operators in Detail
 8.3 The Right-Hand Side
 8.4 Use of Priority
 8.5 Using Phases Sequentially
 8.6 Using Java Code on the RHS
 8.7 Optimising for Speed
 8.8 Ontology Aware Grammar Transduction
 8.9 Serializing JAPE Transducer
 8.10 Notes for Montreal Transducer Users
 8.11 JAPE Plus
9 ANNIC: ANNotations-In-Context
 9.1 Instantiating SSD
 9.2 Search GUI
 9.3 Using SSD from GATE Embedded
10 Performance Evaluation of Language Analysers
 10.1 Metrics for Evaluation in Information Extraction
 10.2 The Annotation Diff Tool
 10.3 Corpus Quality Assurance
 10.4 Corpus Benchmark Tool
 10.5 A Plugin Computing Inter-Annotator Agreement (IAA)
 10.6 A Plugin Computing the BDM Scores for an Ontology
 10.7 Quality Assurance Summariser for Teamware
11 Profiling Processing Resources
 11.1 Overview
 11.2 Graphical User Interface
 11.3 Command Line Interface
 11.4 Application Programming Interface
12 Developing GATE
 12.1 Reporting Bugs and Requesting Features
 12.2 Contributing Patches
 12.3 Creating New Plugins
 12.4 Updating this User Guide
III  CREOLE Plugins
13 Gazetteers
 13.1 Introduction to Gazetteers
 13.2 ANNIE Gazetteer
 13.3 OntoGazetteer
 13.4 Gaze Ontology Gazetteer Editor
 13.5 Hash Gazetteer
 13.6 Flexible Gazetteer
 13.7 Gazetteer List Collector
 13.8 OntoRoot Gazetteer
 13.9 Large KB Gazetteer
 13.10 The Shared Gazetteer for multithreaded processing
14 Working with Ontologies
 14.1 Data Model for Ontologies
 14.2 Ontology Event Model
 14.3 The Ontology Plugin: Current Implementation
 14.4 The Ontology_OWLIM2 plugin: backwards-compatible implementation
 14.5 GATE Ontology Editor
 14.6 Ontology Annotation Tool
 14.7 Relation Annotation Tool
 14.8 Using the ontology API
 14.9 Using the ontology API (old version)
 14.10 Ontology-Aware JAPE Transducer
 14.11 Annotating Text with Ontological Information
 14.12 Populating Ontologies
 14.13 Ontology API and Implementation Changes
15 Non-English Language Support
 15.1 Language Identification
 15.2 French Plugin
 15.3 German Plugin
 15.4 Romanian Plugin
 15.5 Arabic Plugin
 15.6 Chinese Plugin
 15.7 Hindi Plugin
 15.8 Russian Plugin
 15.9 Bulgarian Plugin
16 Domain Specific Resources
 16.1 Biomedical Support
17 Parsers
 17.1 MiniPar Parser
 17.2 RASP Parser
 17.3 SUPPLE Parser
 17.4 Stanford Parser
18 Machine Learning
 18.1 ML Generalities
 18.2 Batch Learning PR
 18.3 Machine Learning PR
19 Tools for Alignment Tasks
 19.1 Introduction
 19.2 The Tools
20 Crowdsourcing Data with GATE
 20.1 The Basics
 20.2 Entity classification
 20.3 Entity annotation
21 Combining GATE and UIMA
 21.1 Embedding a UIMA AE in GATE
 21.2 Embedding a GATE CorpusController in UIMA
22 More (CREOLE) Plugins
 22.1 Verb Group Chunker
 22.2 Noun Phrase Chunker
 22.3 TaggerFramework
 22.4 Chemistry Tagger
 22.5 Zemanta Semantic Annotation Service
 22.6 Lupedia Semantic Annotation Service
 22.7 TextRazor Annotation Service
 22.8 Annotating Numbers
 22.9 Annotating Measurements
 22.10 Annotating and Normalizing Dates
 22.11 Snowball Based Stemmers
 22.12 GATE Morphological Analyzer
 22.13 Flexible Exporter
 22.14 Configurable Exporter
 22.15 Annotation Set Transfer
 22.16 Schema Enforcer
 22.17 Information Retrieval in GATE
 22.18 Websphinx Web Crawler
 22.19 WordNet in GATE
 22.20 Kea - Automatic Keyphrase Detection
 22.21 Annotation Merging Plugin
 22.22 Copying Annotations between Documents
 22.23 OpenCalais Plugin
 22.24 LingPipe Plugin
 22.25 OpenNLP Plugin
 22.26 Stanford Part-of-Speech Tagger
 22.27 Content Detection Using Boilerpipe
 22.28 Inter Annotator Agreement
 22.29 Schema Annotation Editor
 22.30 Coref Tools Plugin
 22.31 Pubmed Format
 22.32 MediaWiki Format
 22.33 Fast Infoset Document Format
 22.34 CSV Document Support
 22.35 Twitter processing
 22.36 TermRaider term extraction tools
 22.37 Document Normalizer
 22.38 Developer Tools
IV  The GATE Family: Cloud, MIMIR, Teamware
23 GATE Cloud
 23.1 GATE Cloud services: an overview
 23.2 Comparison with other systems
 23.3 How to buy services
 23.4 Pricing and discounts
 23.5 Annotation Jobs on GATECloud.net
 23.6 Running Custom Annotation Jobs on GATECloud.net
24 GATE Teamware: A Web-based Collaborative Corpus Annotation Tool
 24.1 Introduction
 24.2 Requirements for Multi-Role Collaborative Annotation Environments
 24.3 Teamware: Architecture, Implementation, and Examples
 24.4 Practical Applications
25 GATE Mímir
Appendices
A Change Log
 A.1 Next Release
 A.2 Version 7.1 (November 2012)
 A.3 Version 7.0 (February 2012)
 A.4 Version 6.1 (April 2011)
 A.5 Version 6.0 (November 2010)
 A.6 Version 5.2.1 (May 2010)
 A.7 Version 5.2 (April 2010)
 A.8 Version 5.1 (December 2009)
 A.9 Version 5.0 (May 2009)
 A.10 Version 4.0 (July 2007)
 A.11 Version 3.1 (April 2006)
 A.12 January 2005
 A.13 December 2004
 A.14 September 2004
 A.15 Version 3 Beta 1 (August 2004)
 A.16 July 2004
 A.17 June 2004
 A.18 April 2004
 A.19 March 2004
 A.20 Version 2.2 – August 2003
 A.21 Version 2.1 – February 2003
 A.22 June 2002
B Version 5.1 Plugins Name Map
C Obsolete CREOLE Plugins
 C.1 Ontotext JapeC Compiler
 C.2 Google Plugin
 C.3 Yahoo Plugin
 C.4 Gazetteer Visual Resource - GAZE
 C.5 Google Translator PR
D Design Notes
 D.1 Patterns
 D.2 Exception Handling
E Ant Tasks for GATE
 E.1 Declaring the Tasks
 E.2 The packagegapp task - bundling an application with its dependencies
 E.3 The expandcreoles Task - Merging Annotation-Driven Config into creole.xml
F Named-Entity State Machine Patterns
 F.1 Main.jape
 F.2 first.jape
 F.3 firstname.jape
 F.4 name.jape
 F.5 name_post.jape
 F.6 date_pre.jape
 F.7 date.jape
 F.8 reldate.jape
 F.9 number.jape
 F.10 address.jape
 F.11 url.jape
 F.12 identifier.jape
 F.13 jobtitle.jape
 F.14 final.jape
 F.15 unknown.jape
 F.16 name_context.jape
 F.17 org_context.jape
 F.18 loc_context.jape
 F.19 clean.jape
G Part-of-Speech Tags used in the Hepple Tagger
H Copyright and Licence
References
Colophon