
Contents

I  GATE Basics
1 Introduction
 1.1 How to Use this Text
 1.2 Context
 1.3 Overview
  1.3.1 Developing and Deploying Language Processing Facilities
  1.3.2 Built-In Components
  1.3.3 Additional Facilities in GATE Developer/Embedded
  1.3.4 An Example
 1.4 Some Evaluations
 1.5 Recent Changes
  1.5.1 Version 8.6.1 (January 2020)
  1.5.2 Version 8.6 (June 2019)
  1.5.3 Version 8.5.1 (June 2018)
  1.5.4 Version 8.5 (May 2018)
 1.6 Further Reading
2 Installing and Running GATE
 2.1 Downloading GATE
 2.2 Installing and Running GATE
  2.2.1 The Easy Way
  2.2.2 The Hard Way (1)
  2.2.3 The Hard Way (2): Git
  2.2.4 Running GATE Developer on Unix/Linux
 2.3 Using System Properties with GATE
 2.4 Changing GATE’s launch configuration
 2.5 Configuring GATE
 2.6 Building GATE
  2.6.1 Using GATE with Maven/Ivy
 2.7 Uninstalling GATE
 2.8 Troubleshooting
3 Using GATE Developer
 3.1 The GATE Developer Main Window
 3.2 Loading and Viewing Documents
 3.3 Creating and Viewing Corpora
 3.4 Working with Annotations
  3.4.1 The Annotation Sets View
  3.4.2 The Annotations List View
  3.4.3 The Annotations Stack View
  3.4.4 The Co-reference Editor
  3.4.5 Creating and Editing Annotations
  3.4.6 Schema-Driven Editing
  3.4.7 Printing Text with Annotations
 3.5 Using CREOLE Plugins
 3.6 Installing and updating CREOLE Plugins
 3.7 Loading and Using Processing Resources
 3.8 Creating and Running an Application
  3.8.1 Running an Application on a Datastore
  3.8.2 Running PRs Conditionally on Document Features
  3.8.3 Doing Information Extraction with ANNIE
  3.8.4 Modifying ANNIE
 3.9 Saving Applications and Language Resources
  3.9.1 Saving Documents to File
  3.9.2 Saving and Restoring LRs in Datastores
  3.9.3 Saving Application States to a File
  3.9.4 Saving an Application with its Resources (e.g. GATE Cloud)
  3.9.5 Upgrade An Application to use Newer Versions of Plugins
 3.10 Keyboard Shortcuts
 3.11 Miscellaneous
  3.11.1 Stopping GATE from Restoring Developer Sessions/Options
  3.11.2 Working with Unicode
4 CREOLE: the GATE Component Model
 4.1 The Web and CREOLE
 4.2 The GATE Framework
 4.3 The Lifecycle of a CREOLE Resource
 4.4 Processing Resources and Applications
 4.5 Language Resources and Datastores
 4.6 Built-in CREOLE Resources
 4.7 CREOLE Resource Configuration
  4.7.1 Configuring Resources using Annotations
  4.7.2 Loading Third-Party Libraries in a Maven plugin
 4.8 Tools: How to Add Utilities to GATE Developer
  4.8.1 Putting Your Tools in a Sub-Menu
  4.8.2 Adding Tools To Existing Resource Types
5 Language Resources: Corpora, Documents and Annotations
 5.1 Features: Simple Attribute/Value Data
 5.2 Corpora: Sets of Documents plus Features
 5.3 Documents: Content plus Annotations plus Features
 5.4 Annotations: Directed Acyclic Graphs
  5.4.1 Annotation Schemas
  5.4.2 Examples of Annotated Documents
  5.4.3 Creating, Viewing and Editing Diverse Annotation Types
 5.5 Document Formats
  5.5.1 Detecting the Right Reader
  5.5.2 XML
  5.5.3 HTML
  5.5.4 SGML
  5.5.5 Plain text
  5.5.6 RTF
  5.5.7 Email
  5.5.8 PDF Files and Office Documents
  5.5.9 UIMA CAS Documents
  5.5.10 CoNLL/IOB Documents
 5.6 XML Input/Output
6 ANNIE: a Nearly-New Information Extraction System
 6.1 Document Reset
 6.2 Tokeniser
  6.2.1 Tokeniser Rules
  6.2.2 Token Types
  6.2.3 English Tokeniser
 6.3 Gazetteer
 6.4 Sentence Splitter
 6.5 RegEx Sentence Splitter
 6.6 Part of Speech Tagger
 6.7 Semantic Tagger
 6.8 Orthographic Coreference (OrthoMatcher)
  6.8.1 GATE Interface
  6.8.2 Resources
  6.8.3 Processing
 6.9 Pronominal Coreference
  6.9.1 Quoted Speech Submodule
  6.9.2 Pleonastic It Submodule
  6.9.3 Pronominal Resolution Submodule
  6.9.4 Detailed Description of the Algorithm
 6.10 A Walk-Through Example
  6.10.1 Step 1 - Tokenisation
  6.10.2 Step 2 - List Lookup
  6.10.3 Step 3 - Grammar Rules
II  GATE for Advanced Users
7 GATE Embedded
 7.1 Quick Start with GATE Embedded
 7.2 Resource Management in GATE Embedded
 7.3 Using CREOLE Plugins
 7.4 Language Resources
  7.4.1 GATE Documents
  7.4.2 Feature Maps
  7.4.3 Annotation Sets
  7.4.4 Annotations
  7.4.5 GATE Corpora
 7.5 Processing Resources
 7.6 Controllers
 7.7 Modelling Relations between Annotations
 7.8 Duplicating a Resource
  7.8.1 Sharable properties
 7.9 Persistent Applications
 7.10 Ontologies
 7.11 Loading Annotation Schemas
 7.12 Creating a New CREOLE Resource
  7.12.1 Dependencies
 7.13 Adding Support for a New Document Format
 7.14 Using GATE Embedded in a Multithreaded Environment
 7.15 Using GATE Embedded within a Spring Application
  7.15.1 Duplication in Spring
  7.15.2 Spring pooling
  7.15.3 Further reading
 7.16 Groovy for GATE
  7.16.1 Groovy Scripting Console for GATE
  7.16.2 Groovy scripting PR
  7.16.3 The Scriptable Controller
  7.16.4 Utility methods
 7.17 Saving Config Data to gate.xml
 7.18 Annotation merging through the API
 7.19 Using Resource Helpers to Extend the API
 7.20 Converting a Directory Plugin to a Maven Plugin
8 JAPE: Regular Expressions over Annotations
 8.1 The Left-Hand Side
  8.1.1 Matching Entire Annotation Types
  8.1.2 Using Features and Values
  8.1.3 Using Meta-Properties
  8.1.4 Building complex patterns from simple patterns
  8.1.5 Matching a Simple Text String
  8.1.6 Using Templates
  8.1.7 Multiple Pattern/Action Pairs
  8.1.8 LHS Macros
  8.1.9 Multi-Constraint Statements
  8.1.10 Using Context
  8.1.11 Negation
  8.1.12 Escaping Special Characters
 8.2 LHS Operators in Detail
  8.2.1 Equality Operators
  8.2.2 Comparison Operators
  8.2.3 Regular Expression Operators
  8.2.4 Contextual Operators
  8.2.5 Custom Operators
 8.3 The Right-Hand Side
  8.3.1 A Simple Example
  8.3.2 Copying Feature Values from the LHS to the RHS
  8.3.3 Optional or Empty Labels
  8.3.4 RHS Macros
 8.4 Use of Priority
 8.5 Using Phases Sequentially
 8.6 Using Java Code on the RHS
  8.6.1 A More Complex Example
  8.6.2 Adding a Feature to the Document
  8.6.3 Finding the Tokens of a Matched Annotation
  8.6.4 Using Named Blocks
  8.6.5 Java RHS Overview
 8.7 Optimising for Speed
 8.8 Ontology Aware Grammar Transduction
 8.9 Serializing JAPE Transducer
  8.9.1 How to Serialize?
  8.9.2 How to Use the Serialized Grammar File?
 8.10 Notes for Montreal Transducer Users
 8.11 JAPE Plus
9 ANNIC: ANNotations-In-Context
 9.1 Instantiating SSD
 9.2 Search GUI
  9.2.1 Overview
  9.2.2 Syntax of Queries
  9.2.3 Top Section
  9.2.4 Central Section
  9.2.5 Bottom Section
 9.3 Using SSD from GATE Embedded
  9.3.1 How to instantiate a searchable datastore
  9.3.2 How to search in this datastore
10 Performance Evaluation of Language Analysers
 10.1 Metrics for Evaluation in Information Extraction
  10.1.1 Annotation Relations
  10.1.2 Cohen’s Kappa
  10.1.3 Precision, Recall, F-Measure
  10.1.4 Macro and Micro Averaging
 10.2 The Annotation Diff Tool
  10.2.1 Performing Evaluation with the Annotation Diff Tool
  10.2.2 Creating a Gold Standard with the Annotation Diff Tool
  10.2.3 A warning about feature values
 10.3 Corpus Quality Assurance
  10.3.1 Description of the interface
  10.3.2 Step by step usage
  10.3.3 Details of the Corpus statistics table
  10.3.4 Details of the Document statistics table
  10.3.5 GATE Embedded API for the measures
  10.3.6 A warning about feature values
  10.3.7 Quality Assurance PR
 10.4 Corpus Benchmark Tool
  10.4.1 Preparing the Corpora for Use
  10.4.2 Defining Properties
  10.4.3 Running the Tool
  10.4.4 The Results
 10.5 A Plugin Computing Inter-Annotator Agreement (IAA)
  10.5.1 IAA for Classification
  10.5.2 IAA For Named Entity Annotation
  10.5.3 The BDM-Based IAA Scores
 10.6 A Plugin Computing the BDM Scores for an Ontology
  10.6.1 Computing BDM from embedded code
 10.7 Quality Assurance Summariser for Teamware
11 Profiling Processing Resources
 11.1 Overview
  11.1.1 Features
  11.1.2 Limitations
 11.2 Graphical User Interface
 11.3 Command Line Interface
 11.4 Application Programming Interface
  11.4.1 Log4j.properties
  11.4.2 Benchmark log format
  11.4.3 Enabling profiling
  11.4.4 Reporting tool
12 Developing GATE
 12.1 Reporting Bugs and Requesting Features
 12.2 Contributing Patches
 12.3 Creating New Plugins
  12.3.1 What to Call your Plugin
  12.3.2 Writing a New PR
  12.3.3 Writing a New VR
  12.3.4 Writing a ‘Ready Made’ Application
  12.3.5 Distributing Your New Plugins
 12.4 Adding your plugin to the default list
 12.5 Updating this User Guide
  12.5.1 Building the User Guide
  12.5.2 Making Changes to the User Guide
III  CREOLE Plugins
13 Gazetteers
 13.1 Introduction to Gazetteers
 13.2 ANNIE Gazetteer
  13.2.1 Creating and Modifying Gazetteer Lists
  13.2.2 ANNIE Gazetteer Editor
 13.3 OntoGazetteer
 13.4 Gaze Ontology Gazetteer Editor
  13.4.1 The Gaze Gazetteer List and Mapping Editor
  13.4.2 The Gaze Ontology Editor
 13.5 Hash Gazetteer
  13.5.1 Prerequisites
  13.5.2 Parameters
 13.6 Flexible Gazetteer
 13.7 Gazetteer List Collector
 13.8 OntoRoot Gazetteer
  13.8.1 How Does it Work?
  13.8.2 Initialisation of OntoRoot Gazetteer
  13.8.3 Simple steps to run OntoRoot Gazetteer
 13.9 Large KB Gazetteer
  13.9.1 Quick usage overview
  13.9.2 Dictionary setup
  13.9.3 Additional dictionary configuration
  13.9.4 Dictionary for Gazetteer List Files
  13.9.5 Processing Resource Configuration
  13.9.6 Runtime configuration
  13.9.7 Semantic Enrichment PR
 13.10 The Shared Gazetteer for multithreaded processing
14 Working with Ontologies
 14.1 Data Model for Ontologies
  14.1.1 Hierarchies of Classes and Restrictions
  14.1.2 Instances
  14.1.3 Hierarchies of Properties
  14.1.4 URIs
 14.2 Ontology Event Model
  14.2.1 What Happens when a Resource is Deleted?
 14.3 The Ontology Plugin
  14.3.1 Upgrading from previous versions of GATE
  14.3.2 The OWLIMOntology Language Resource
  14.3.3 The ConnectSesameOntology Language Resource
  14.3.4 The CreateSesameOntology Language Resource
  14.3.5 The OWLIM2 Backwards-Compatible Language Resource
  14.3.6 Using Ontology Import Mappings
  14.3.7 Using BigOWLIM
  14.3.8 The sesameCLI command line interface
 14.4 GATE Ontology Editor
 14.5 Ontology Annotation Tool
  14.5.1 Viewing Annotated Text
  14.5.2 Editing Existing Annotations
  14.5.3 Adding New Annotations
  14.5.4 Options
 14.6 Relation Annotation Tool
  14.6.1 Description of the two views
  14.6.2 Create new annotation and instance from text selection
  14.6.3 Create new annotation and add label to existing instance from text selection
  14.6.4 Create and set properties for annotation relation
  14.6.5 Delete instance, label or property
  14.6.6 Differences with OAT and Ontology Editor
 14.7 Using the ontology API
 14.8 Ontology-Aware JAPE Transducer
 14.9 Annotating Text with Ontological Information
 14.10 Populating Ontologies
15 Non-English Language Support
 15.1 Language Identification
  15.1.1 Fingerprint Generation
 15.2 French Plugin
 15.3 German Plugin
 15.4 Romanian Plugin
 15.5 Arabic Plugin
 15.6 Chinese Plugin
  15.6.1 Chinese Word Segmentation
 15.7 Hindi Plugin
 15.8 Russian Plugin
 15.9 Bulgarian Plugin
 15.10 Danish Plugin
 15.11 Welsh Plugin
16 Domain Specific Resources
 16.1 Biomedical Support
  16.1.1 ABNER
  16.1.2 MetaMap
  16.1.3 GSpell biomedical spelling suggestion and correction
  16.1.4 BADREX
  16.1.5 MiniChem/Drug Tagger
  16.1.6 AbGene
  16.1.7 GENIA
  16.1.8 Penn BioTagger
  16.1.9 MutationFinder
17 Tools for Social Media Data
 17.1 Tools for Twitter
 17.2 Twitter JSON format
  17.2.1 Entity annotations in JSON
 17.3 Exporting GATE documents as JSON
 17.4 Low-level PRs for Tweets
 17.5 Handling multi-word hashtags
 17.6 The TwitIE Pipeline
18 Parsers
 18.1 SUPPLE Parser
  18.1.1 Requirements
  18.1.2 Building SUPPLE
  18.1.3 Running the Parser in GATE
  18.1.4 Viewing the Parse Tree
  18.1.5 System Properties
  18.1.6 Configuration Files
  18.1.7 Parser and Grammar
  18.1.8 Mapping Named Entities
 18.2 Stanford Parser
  18.2.1 Input Requirements
  18.2.2 Initialization Parameters
  18.2.3 Runtime Parameters
19 Machine Learning
 19.1 Brief introduction to machine learning in GATE
20 Tools for Alignment Tasks
 20.1 Introduction
 20.2 The Tools
  20.2.1 Compound Document
  20.2.2 CompoundDocumentFromXml
  20.2.3 Compound Document Editor
  20.2.4 Composite Document
  20.2.5 DeleteMembersPR
  20.2.6 SwitchMembersPR
  20.2.7 Saving as XML
  20.2.8 Alignment Editor
  20.2.9 Saving Files and Alignments
  20.2.10 Section-by-Section Processing
21 Crowdsourcing Data with GATE
 21.1 The Basics
 21.2 Entity classification
  21.2.1 Creating a classification job
  21.2.2 Loading data into a job
  21.2.3 Importing the results
  21.2.4 Automatic adjudication
 21.3 Entity annotation
  21.3.1 Creating an annotation job
  21.3.2 Loading data into a job
  21.3.3 Importing the results
  21.3.4 Automatic adjudication
22 Combining GATE and UIMA
 22.1 Embedding a UIMA AE in GATE
  22.1.1 Mapping File Format
  22.1.2 The UIMA Component Descriptor
  22.1.3 Using the AnalysisEnginePR
 22.2 Embedding a GATE CorpusController in UIMA
  22.2.1 Mapping File Format
  22.2.2 The GATE Application Definition
  22.2.3 Configuring the GATEApplicationAnnotator
23 More (CREOLE) Plugins
 23.1 Verb Group Chunker
 23.2 Noun Phrase Chunker
  23.2.1 Differences from the Original
  23.2.2 Using the Chunker
 23.3 TaggerFramework
  23.3.1 TreeTagger—Multilingual POS Tagger
  23.3.2 GENIA and Double Quotes
 23.4 Chemistry Tagger
  23.4.1 Using the Tagger
 23.5 TextRazor Annotation Service
 23.6 Annotating Numbers
  23.6.1 Numbers in Words and Numbers
  23.6.2 Roman Numerals
 23.7 Annotating Measurements
 23.8 Annotating and Normalizing Dates
 23.9 Snowball Based Stemmers
  23.9.1 Algorithms
 23.10 GATE Morphological Analyzer
  23.10.1 Rule File
 23.11 Flexible Exporter
 23.12 Configurable Exporter
 23.13 Annotation Set Transfer
 23.14 Schema Enforcer
 23.15 Information Retrieval in GATE
  23.15.1 Using the IR Functionality in GATE
  23.15.2 Using the IR API
 23.16 WordNet in GATE
  23.16.1 The WordNet API
 23.17 Kea - Automatic Keyphrase Detection
  23.17.1 Using the ‘KEA Keyphrase Extractor’ PR
  23.17.2 Using Kea Corpora
 23.18 Annotation Merging Plugin
 23.19 Copying Annotations between Documents
 23.20 LingPipe Plugin
  23.20.1 LingPipe Tokenizer PR
  23.20.2 LingPipe Sentence Splitter PR
  23.20.3 LingPipe POS Tagger PR
  23.20.4 LingPipe NER PR
  23.20.5 LingPipe Language Identifier PR
 23.21 OpenNLP Plugin
  23.21.1 Init parameters and models
  23.21.2 OpenNLP PRs
  23.21.3 Obtaining and generating models
 23.22 Stanford CoreNLP
  23.22.1 Stanford Tagger
  23.22.2 Stanford Parser
  23.22.3 Stanford Named Entity Recognition
 23.23 Content Detection Using Boilerpipe
 23.24 Inter Annotator Agreement
 23.25 Schema Annotation Editor
 23.26 Coref Tools Plugin
 23.27 Pubmed Format
 23.28 MediaWiki Format
 23.29 Fast Infoset Document Format
 23.30 GATE JSON Document Format
 23.31 DataSift Document Format
 23.32 CSV Document Support
 23.33 TermRaider term extraction tools
  23.33.1 Termbank language resources
  23.33.2 Termbank Score Copier
  23.33.3 The PMI bank language resource
 23.34 Document Normalizer
 23.35 Developer Tools
 23.36 Linguistic Simplifier
 23.37 GATE-Time
  23.37.1 DCTParser
  23.37.2 HeidelTime
  23.37.3 TimeML Event Detection
IV  The GATE Family: Cloud, MIMIR, Teamware
24 GATE Cloud
 24.1 GATE Cloud services: an overview
 24.2 Using GATE Cloud services
 24.3 Annotation Jobs on GATE Cloud
  24.3.1 The Annotation Service Charges Explained
  24.3.2 Where to find more details
25 GATE Teamware: A Web-based Collaborative Corpus Annotation Tool
 25.1 Introduction
 25.2 Requirements for Multi-Role Collaborative Annotation Environments
  25.2.1 Typical Division of Labour
  25.2.2 Remote, Scalable Data Storage
  25.2.3 Automatic annotation services
  25.2.4 Workflow Support
 25.3 Teamware: Architecture, Implementation, and Examples
  25.3.1 Data Storage Service
  25.3.2 Annotation Services
  25.3.3 The Executive Layer
  25.3.4 The User Interfaces
 25.4 Practical Applications
26 GATE Mímir
Appendices
A Change Log
 A.1 Version 8.6.1 (January 2020)
 A.2 Version 8.6 (June 2019)
 A.3 Version 8.5.1 (June 2018)
 A.4 Version 8.5 (May 2018)
  A.4.1 For developers
 A.5 Version 8.4.1 (June 2017)
 A.6 Version 8.4 (February 2017)
  A.6.1 Java compatibility
 A.7 Version 8.3 (January 2017)
  A.7.1 Java compatibility
 A.8 Version 8.2 (May 2016)
  A.8.1 Java compatibility
 A.9 Version 8.1 (June 2015)
  A.9.1 New plugins and significant new features
  A.9.2 Library updates and bugfixes
  A.9.3 Tools for developers
 A.10 Version 8.0 (May 2014)
  A.10.1 Major changes
  A.10.2 Other new and improved plugins
  A.10.3 Bug fixes and other improvements
  A.10.4 For developers
 A.11 Version 7.1 (November 2012)
  A.11.1 New plugins
  A.11.2 Library updates
  A.11.3 GATE Embedded API changes
 A.12 Version 7.0 (February 2012)
  A.12.1 Major new features
  A.12.2 Removal of deprecated functionality
  A.12.3 Other enhancements and bug fixes
 A.13 Version 6.1 (April 2011)
  A.13.1 New CREOLE Plugins
  A.13.2 Other new features and improvements
 A.14 Version 6.0 (November 2010)
  A.14.1 Major new features
  A.14.2 Breaking changes
  A.14.3 Other new features and bugfixes
 A.15 Version 5.2.1 (May 2010)
 A.16 Version 5.2 (April 2010)
  A.16.1 JAPE and JAPE-related
  A.16.2 Other Changes
 A.17 Version 5.1 (December 2009)
  A.17.1 New Features
  A.17.2 JAPE improvements
  A.17.3 Other improvements and bug fixes
 A.18 Version 5.0 (May 2009)
  A.18.1 Major New Features
  A.18.2 Other New Features and Improvements
  A.18.3 Specific Bug Fixes
 A.19 Version 4.0 (July 2007)
  A.19.1 Major New Features
  A.19.2 Other New Features and Improvements
  A.19.3 Bug Fixes and Optimizations
 A.20 Version 3.1 (April 2006)
  A.20.1 Major New Features
  A.20.2 Other New Features and Improvements
  A.20.3 Bug Fixes
 A.21 January 2005
 A.22 December 2004
 A.23 September 2004
 A.24 Version 3 Beta 1 (August 2004)
 A.25 July 2004
 A.26 June 2004
 A.27 April 2004
 A.28 March 2004
 A.29 Version 2.2 – August 2003
 A.30 Version 2.1 – February 2003
 A.31 June 2002
B Version 5.1 Plugins Name Map
C Obsolete CREOLE Plugins
 C.1 Ontotext JapeC Compiler
 C.2 Google Plugin
 C.3 Yahoo Plugin
  C.3.1 Using the YahooPR
 C.4 Gazetteer Visual Resource - GAZE
  C.4.1 Display Modes
  C.4.2 Linear Definition Pane
  C.4.3 Linear Definition Toolbar
  C.4.4 Operations on Linear Definition Nodes
  C.4.5 Gazetteer List Pane
  C.4.6 Mapping Definition Pane
 C.5 Google Translator PR
D Design Notes
 D.1 Patterns
  D.1.1 Components
  D.1.2 Model, view, controller
  D.1.3 Interfaces
 D.2 Exception Handling
E Ant Tasks for GATE
 E.1 Declaring the Tasks
 E.2 The packagegapp task - bundling an application with its dependencies
  E.2.1 Introduction
  E.2.2 Basic Usage
  E.2.3 Handling Non-Plugin Resources
  E.2.4 Streamlining your Plugins
  E.2.5 Bundling Extra Resources
 E.3 The expandcreoles Task - Merging Annotation-Driven Config into creole.xml
F Named-Entity State Machine Patterns
 F.1 Main.jape
 F.2 first.jape
 F.3 firstname.jape
 F.4 name.jape
  F.4.1 Person
  F.4.2 Location
  F.4.3 Organization
  F.4.4 Ambiguities
  F.4.5 Contextual information
 F.5 name_post.jape
 F.6 date_pre.jape
 F.7 date.jape
 F.8 reldate.jape
 F.9 number.jape
 F.10 address.jape
 F.11 url.jape
 F.12 identifier.jape
 F.13 jobtitle.jape
 F.14 final.jape
 F.15 unknown.jape
 F.16 name_context.jape
 F.17 org_context.jape
 F.18 loc_context.jape
 F.19 clean.jape
G Part-of-Speech Tags used in the Hepple Tagger
H Copyright and Licence