Log in Help
Print
Homereleasesgate-5.1-build3431-ALLdoctao 〉 splitap1.html
 

Appendix A
Change Log [#]

This chapter lists major changes to GATE in roughly chronological order by release. Changes in the documentation are also referenced here.

A.1 Version 5.1 (December 2009) [#]

Version 5.1 is a major increment with lots of new features and integration of a number of important systems from 3rd parties (e.g. LingPipe, OpenNLP, OpenCalais, a revised UIMA connector). We’ve stuck with the 5 series (instead of jumping to 6.0) because the core remains stable and backwards compatible.

Other highlights include:

A.1.1 New Features

LingPipe Support

LingPipe is a suite of Java libraries for the linguistic analysis of human language. We have provided a plugin called ‘LingPipe’ with wrappers for some of the resources available in the LingPipe library. For more details, see the section 19.15.

OpenNLP Support

OpenNLP provides tools for sentence detection, tokenization, pos-tagging, chunking and parsing, named-entity detection, and coreference. The tools use Maximum Entropy modelling. We have provided a plugin called ‘OpenNLP’ with wrappers for some of the resources available in the OpenNLP Tools library. For more details, see section 19.16.

OpenCalais Support

We added a new PR called ‘OpenCalais PR’. This will process a document through the OpenCalais service, and add OpenCalais entity annotations to the document. For more details, see Section 19.14.

Ontology API

The ontology API (package gate.creole.ontology has been changed, the existing ontology implementation based on Sesame1 and OWLIM2 (package gate.creole.ontology.owlim) has been moved into the plugin Ontology_OWLIM2. An upgraded implementation based on Sesame2 and OWLIM3 that also provides a number of new features has been added as plugin Ontology. See Section 14.12 for a detailed description of all changes.

Benchmarking Improvements

A number of improvements to the benchmarking support in GATE. JAPE transducers now log the time spent in individual phases of a multi-phase grammar and by individual rules within each phase. Other PRs that use JAPE grammars internally (the pronominal coreferencer, English tokeniser) log the time taken by their internal transducers. A reporting tool, called ‘Profiling Reports’ under the ‘Tools’ menu makes summary information easily available. For more details, see chapter 11.

GUI improvements

To deal with quality assurance of annotations, one component has been updated and two new components have been added. The annotation diff tool has a new mode to copy annotations to a consensus set, see section 10.2.1. An annotation stack view has been added in the document editor and it allows to copy annotations to a consensus set, see section 3.4.3. A corpus view has been added for all corpus to get statistics like precision, recall and F-measure, see section 10.3.

An annotation stack view has been added in the document editor to make easier to see overlapping annotations, see section 3.4.3.

ABNER Support

ABNER is A Biomedical Named Entity Recogniser, for finding entities such as genes in text. We have provided a plugin called ‘AbnerTagger’ with a wrapper for ABNER. For more details, see section 17.6.

Generic Tagger Support

A new plugin has been added to provide an easy route to integrate taggers with GATE. The Tagger_Framework plugin provides examples of incorporating a number of external taggers which should serve as a starting point for using other taggers. See Section 17.4 for more details.

Section-by-Section Processing

We have added a new PR called ‘Segment Processing PR’. As the name suggests this PR allows processing individual segments of a document independently of one other. For more details, please look at the section 16.2.8.

Application Composition

The gate.Controller implementations provided with the main GATE distribution now also implement the gate.ProcessingResource interface. This means that an application can now contain another application as one of its components.

Groovy Support

Groovy is a dynamic programming language based on Java. You can now use it as a scripting language for GATE, via the Groovy Console. For more details, see Section 7.15.

A.1.2 JAPE improvements

GATE now produces a warning when any Java right-hand-sides in JAPE rules make use of the deprecated annotations parameter. All bundled JAPE grammars have been updated to use the replacement inputAS and outputAS parameters as appropriate.

The new Imports: statement at the beginning of a JAPE grammar file can now be used to make additional Java import statements available to the Java RHS code, see 8.6.5.

The JAPE debugger has been removed. Debugging of JAPE has been made easier as stack traces now refer to the JAPE source file and line numbers instead of the generated Java source code.

The Montreal Transducer has been made obsolete.

A.1.3 Other improvements and bug fixes

Plugin names have been rationalised. Mappings exist so that existing applications will continue to work, but the new names should be used in the future. Plugin name mappings are given in Appendix B. Also, the Segmenter_Chinese plugin (used to be known as chineseSegmenter plugin) is now part of the Lang_Chinese plugin.

The User Guide has been amalgamated with the Programmer’s Guide; all material can now be found in the User Guide. The ‘How-To’ chapter has been converted into separate chapters for installation, GATE Developer and GATE Embedded. Other material has been relocated to the appropriate specialist chapter.

Made Mac OS launcher 64-bit compatible. See section 2.2.1 for details.

The UIMA integration layer (Chapter 18) has been upgraded to work with Apache UIMA 2.2.2.

Oracle and PostGreSQL are no longer supported.

The MIAKT Natural Language Generation plugin has been removed.

The Minorthird plugin has been removed. Minorthird has changed significantly since this plugin was written. We will consider writing an up-to-date Minorthird plugin in the future.

A new gazetteer, Large KB Gazetteer (in the plugin ‘Gazetteer_LKB’) has been added, see Section 13.9 for details.

gate.creole.tokeniser.chinesetokeniser.ChineseTokeniser and related resources under the plugins/ANNIE/tokeniser/chinesetokeniser folder have been removed. Please refer to the Lang_Chinese plugin for resources related to the Chinese language in GATE.

Added an isInitialised() method to gate.Gate().

Added a parameter to the chemistry tagger PR (section 17.5) to allow it to operate on annotation sets other than the default one.

Plus many more smaller bugfixes...

A.2 Version 5.0 (May 2009) [#]

Note: existing users – if you delete your user configuration file for any reason you will find that GATE Developer no longer loads the ANNIE plugin by default. You will need to manually select ‘load always’ in the plugin manager to get the old behaviour.

A.2.1 Major New Features

JAPE Language Improvements

Several new extensions to the JAPE language to support more flexible pattern matching. Full details are in Chapter 8 but briefly:

Some of these extensions are similar to, but not the same as, those provided by the Montreal Transducer plugin. If you are already familiar with the Montreal Transducer, you should first look at Section 8.11 which summarises the differences.

Resource Configuration via Java 5 Annotations

Introduced an alternative style for supplying resource configuration information via Java 5 annotations rather than in creole.xml. The previous approach is still fully supported as well, and the two styles can be freely mixed. See Section 4.7 for full details.

Ontology-Based Gazetteer

Added a new plugin ‘Gazetteer_Ontology_Based’, which contains OntoRoot Gazetteer – a dynamically created gazetteer which is, in combination with few other generic resources, capable of producing ontology-aware annotations over the given content with regards to the given ontology. For more details see Section 13.8.

Inter-Annotator Agreement and Merging

New plugins to support tasks involving several annotators working on the same annotation task on the same documents. The plugin ‘Inter_Annotator_Agreement’ (Section 10.5) computes inter-annotator agreement scores between the annotators, the ‘Copy_Annots_Between_Docs’ plugin (Section 19.13) copies annotations from several parallel documents into a single master document, and the ‘Annotation_Merging’ plugin (Section 19.11) merges annotations from multiple annotators into a single ‘consensus’ annotation set.

Packaging Self-Contained Applications for GATE Teamware

Added a mechanism to assemble a saved GATE application along with all the resource files it uses into a single self-contained package to run on another machine (e.g. as a service in GATE Teamware). This is available as a menu option (Section 3.8.4) which will work for most common cases, but for complex cases you can use the underlying Ant task described in Section E.2.

GUI Improvements

A.2.2 Other New Features and Improvements

A.2.3 Specific Bug Fixes

Plus many more minor bug fixes

A.3 Version 4.0 (July 2007) [#]

A.3.1 Major New Features

ANNIC

ANNotations In Context: a full-featured annotation indexing and retrieval system designed to support corpus querying and JAPE rule authoring. It is provided as part of an extension of the Serial Datastores, called Searchable Serial Datastore (SSD). See Section 9 for more details.

New Machine Learning API

A brand new machine learning layer specifically targetted at NLP tasks including text classification, chunk learning (e.g. for named entity recognition) and relation learning. See Chapter 15 for more details.

Ontology API

A new ontology API, based on OWL In Memory (OWLIM), which offers a better API, revised ontology event model and an improved ontology editor to name but few. See Chapter 14 for more details.

OCAT

Ontology-based Corpus Annotation Tool to help annotators to manually annotate documents using ontologies. For more details please see Section 14.6.

Alignment Tools

A new set of components (e.g. CompoundDocument, AlignmentEditor etc.) that help in building alignment tools and in carrying out cross-document processing. See Chapter 16 for more details.

New HTML Parser

A new HTML document format parser, based on Andy Clark’s NekoHTML. This parser is much better than the old one at handling modern HTML and XHTML constructs, JavaScript blocks, etc., though the old parser is still available for existing applications that depend on its behaviour.

Java 5.0 Support

GATE now requires Java 5.0 or later to compile and run. This brings a number of benefits:

A.3.2 Other New Features and Improvements

A.3.3 Bug Fixes and Optimizations

And as always there are many smaller bugfixes too numerous to list here...

A.4 Version 3.1 (April 2006)

A.4.1 Major New Features

Support for UIMA

UIMA (http://www.research.ibm.com/UIMA/) is a language processing framework developed by IBM. UIMA and GATE share some functionality but are complementary in most respects. GATE now provides an interoperability layer to allow UIMA applications to include GATE components in their processing and vice-versa. For full information, see Chapter18.

New Ontology API

The ontology layer has been rewritten in order to provide an abstraction layer between the model representation and the tools used for input and output of the various representation formats. An implementation that uses Jena 2 (http://jena.sourceforge.net/ontology) for reading and writing OWL and RDF(S) is provided.

Ontotext Japec Compiler

Japec is a compiler for JAPE grammars developed by Ontotext Lab. It has some limitations compared to the standard JAPE transducer implementation, but can run JAPE grammars up to five times as fast. By default, GATE still uses the stable JAPE implementation, but if you want to experiment with Japec, see Section 19.10.

A.4.2 Other New Features and Improvements

A.4.3 Bug Fixes

A.5 January 2005

Release of version 3.

New plugins for processing in various languages (see 19.1). These are not full IE systems but are designed as starting points for further development (French, German, Spanish, etc.), or as sample or toy applications (Cebuano, Hindi, etc.).

Other new plugins:

Support for SVM Light, a support vector machine implementation, has been added to the machine learning plugin ‘Learning’ (see section 15.3.5).

A.6 December 2004

GATE no longer depends on the Sun Java compiler to run, which means it will now work on any Java runtime environment of at least version 1.4. JAPE grammars are now compiled using the Eclipse JDT Java compiler by default.

A welcome side-effect of this change is that it is now much easier to integrate GATE-based processing into web applications in Tomcat. See Section 7.14 for details.

A.7 September 2004

GATE applications are now saved in XML format using the XStream library, rather than by using native java serialization. On loading an application, GATE will automatically detect whether it is in the old or the new format, and so applications in both formats can be loaded. However, older versions of GATE will be unable to load applications saved in the XML format. (A java.io.StreamCorruptedException: invalid stream header exception will occcur.) It is possible to get new versions of GATE to use the old format by setting a flag in the source code. (See the Gate.java file for details.) This change has been made because it allows the details of an application to be viewed and edited in a text editor, which is sometimes easier than loading the application into GATE.

A.8 Version 3 Beta 1 (August 2004)

Version 3 incorporates a lot of new functionality and some reorganisation of existing components.

Note that Beta 1 is feature-complete but needs further debugging (please send us bug reports!).

Highlights include: completely rewritten document viewer/editor; extensive ontology support; a new plugin management system; separate .jar files and a Tomcat classloading fix; lots more CREOLE components (and some more to come soon).

Almost all the changes are backwards-compatible; some recent classes have been renamed (particularly the ontologies support classes) and a few events added (see below); datastores created by version 3 will probably not read properly in version 2. If you have problems use the mailing list and we’ll help you fix your code!

The gorey details:

A.9 July 2004

GATE documents now fire events when the document content is edited. This was added in order to support the new facility of editing documents from the GUI. This change will break backwards compatibility by requiring all DocumentListener implementations to implement a new method:
public void contentEdited(DocumentEvent e);

A.10 June 2004

A new algorithm has been implemented for the AnnotationDiff function. A new, more usable, GUI is included, and an ‘Export to HTML’ option added. More details about the AnnotationDiff tool are in Section 10.2.1.

A new build process, based on ANT (http://ant.apache.org/) is now available. The old build process, based on make, is now unsupported. See Section 2.5 for details of the new build process.

A Jape Debugger from Ontos AG has been integrated. You can turn integration ON with command line option ‘-j’. If you run GATE Developer with this option, the new menu item for Jape Debugger GUI will appear in the Tools menu. The default value of integration is OFF. We are currently awaiting documentation for this.

NOTE! Keep in mind there is ClassCastException if you try to debug ConditionalCorpusPipeline. Jape Debugger is designed for Corpus Pipeline only. The Ontos code needs to be changed to allow debugging of ConditionalCorpusPipeline.

A.11 April 2004

There are now two alternative strategies for ontology-aware grammar transduction:

The changes are in:

More information about the ontology-aware transducer can be found in Section 14.9.

A morphological analyser PR has been added. This finds the root and affix values of a token and adds them as features to that token.

A flexible gazetteer PR has been added. This performs lookup over a document based on the values of an arbitrary feature of an arbitrary annotation type, by using an externally provided gazetteer. See 13.6 for details.

A.12 March 2004

Support was added for the MAXENT machine learning library. (See 15.3.4 for details.)

A.13 Version 2.2 – August 2003

Note that GATE 2.2 works with JDK 1.4.0 or above. Version 1.4.2 is recommended, and is the one included with the latest installers.

GATE has been adapted to work with Postgres 7.3. The compatibility with PostgreSQL 7.2 has been preserved.

Note that as of Version 5.1 PostgreSQL is no longer supported.

New library version – Lucene 1.3 (rc1)

A bug in gate.util.Javac has been fixed in order to account for situations when String literals require an encoding different from the platform default.

Temporary .java files used to compile JAPE RHS actions are now saved using UTF-8 and the ‘-encoding UTF-8’ option is passed to the javac compiler.

A custom tools.jar is no longer necessary

Minor changes have been made to the look and feel of GATE Developer to improve its appearance with JDK 1.4.2

Some bug fixes (087, 088, 089, 090, 091, 092, 093, 095, 096 – see http://gate.ac.uk/gate/doc/bugs.html for more details).

A.14 Version 2.1 – February 2003

Integration of Machine Learning PR and WEKA wrapper (see Section 15.3).

Addition of DAML+OIL exporter.

Integration of WordNet (see Section 19.8).

The syntax tree viewer has been updated to fix some bugs.

A.15 June 2002

Conditional versions of the controllers are now available (see Section 3.7.1). These allow processing resources to be run conditionally on document features.

PostgreSQL Datastores are now supported.

These store data into a PostgreSQL RDBMS.

(As of Version 5.1 PostgreSQL is no longer supported.)

Addition of OntoGazetteer (see Section 13.3), an interface which makes ontologies visible within GATE Developer, and supports basic methods for hierarchy management and traversal.

Integration of Protégé, so that people with developed Protégé ontologies can use them within GATE.

Addition of IR facilities in GATE (see Section 19.4).

Modification of the corpus benchmark tool (see Section 10.4.3), which now takes an application as a parameter.

See also for details of other recent bug fixes.