Appendix A
Change Log
This chapter lists major changes to GATE in roughly chronological order by release. Changes in the documentation are also referenced here.
A.1 Version 5.1 beta 1 (Autumn 2009)
A new ‘Profiling reports’ item in the ‘Tools’ menu generates HTML reports from profiled processing resources, see chapter 11.
To support quality assurance of annotations, one component has been updated and two new components have been added. The annotation diff tool has a new mode for copying annotations to a consensus set, see section 10.2.1. An annotation stack view has been added to the document editor; it allows annotations to be copied to a consensus set, see section 3.4.3. A corpus view has been added for every corpus to report statistics such as precision, recall and F-measure, see section 10.3.
The annotation stack view in the document editor also makes it easier to see overlapping annotations, see section 3.4.3.
Added an isInitialised() method to the gate.Gate class.
The ontology API (package gate.creole.ontology) has been changed. The existing ontology implementation based on Sesame1 and OWLIM2 (package gate.creole.ontology.owlim) has been moved into the plugin Ontology_OWLIM2. An upgraded implementation based on Sesame2 and OWLIM3, which also provides a number of new features, has been added as the plugin Ontology. See Section 14.12 for a detailed description of all changes.
The new Imports: statement at the beginning of a JAPE grammar file can now be used to make additional Java import statements available to the Java RHS code, see Section 8.6.5.
The User Guide has been amalgamated with the Programmer’s Guide; all material can now be found in the User Guide. The ‘How-To’ chapter has been converted into separate chapters for installation, GATE Developer and GATE Embedded. Other material has been relocated to the appropriate specialist chapter.
Plugin names have been rationalised. Mappings exist so that existing applications will continue to work, but the new names should be used in the future. Plugin name mappings are given in Appendix B.
The Montreal Transducer has been made obsolete.
The UIMA integration layer (Chapter 18) has been upgraded to work with Apache UIMA 2.2.2.
The JAPE debugger has been removed. Debugging of JAPE has been made easier as stack traces now refer to the JAPE source file and line numbers instead of the generated Java source code.
Oracle and PostgreSQL are no longer supported.
The MIAKT Natural Language Generation plugin has been removed.
The Minorthird plugin has been removed. Minorthird has changed significantly since this plugin was written. We will consider writing an up-to-date Minorthird plugin in the future.
A new gazetteer, Large KB Gazetteer (in the plugin ‘Gazetteer_LKB’) has been added, see Section 13.9 for details.
gate.creole.tokeniser.chinesetokeniser.ChineseTokeniser and related resources under the plugins/ANNIE/tokeniser/chinesetokeniser folder have been removed. Please refer to the Lang_Chinese plugin for resources related to the Chinese language in GATE.
A.2 July 2009 (FIG’09 Summer School)
A number of projects took place as part of the FIG’09 summer school:
A.2.1 Benchmarking Improvements
A number of improvements have been made to the benchmarking support in GATE. JAPE transducers now log the time spent in individual phases of a multi-phase grammar and by individual rules within each phase. Other PRs that use JAPE grammars internally (the pronominal coreferencer, English tokeniser) log the time taken by their internal transducers. A reporting tool, called ‘Profiling reports’, under the ‘Tools’ menu makes summary information easily available. For more details, see chapter 11.
A.2.2 Section-by-Section Processing
We have added a new PR called ‘Segment Processing PR’. As the name suggests, this PR allows individual segments of a document to be processed independently of one another. For more details, see Section 16.2.8.
A.2.3 Application Compositing
The gate.Controller implementations provided with the main GATE distribution now also implement the gate.ProcessingResource interface. This means that an application can now contain another application as one of its components.
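The nesting described above is an instance of the composite pattern. The following is a minimal sketch of the idea only: the Resource, NamedStep and Pipeline types are hypothetical stand-ins written for this example and are not the real gate.Controller or gate.ProcessingResource interfaces.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a processing resource; NOT the real GATE interface.
interface Resource {
    void execute(List<String> log);
}

// A leaf processing step that records its name when it runs.
class NamedStep implements Resource {
    private final String name;

    NamedStep(String name) {
        this.name = name;
    }

    @Override
    public void execute(List<String> log) {
        log.add(name);
    }
}

// A pipeline is itself a Resource, so it can be nested inside another pipeline.
class Pipeline implements Resource {
    private final List<Resource> steps = new ArrayList<>();

    Pipeline add(Resource step) {
        steps.add(step);
        return this;
    }

    @Override
    public void execute(List<String> log) {
        for (Resource step : steps) {
            step.execute(log);
        }
    }
}

class CompositingSketch {
    // Builds an outer pipeline that contains an inner pipeline as one component.
    static List<String> run() {
        Pipeline inner = new Pipeline()
                .add(new NamedStep("tokeniser"))
                .add(new NamedStep("splitter"));
        Pipeline outer = new Pipeline()
                .add(inner)
                .add(new NamedStep("tagger"));
        List<String> log = new ArrayList<>();
        outer.execute(log);
        return log;
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

Because the container and its components share one interface, the outer pipeline does not need to know whether a component is a single step or a whole application.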
A.2.4 OpenCalais Support
We added a new PR called ‘OpenCalais PR’. This will process a document through the OpenCalais service, and add OpenCalais entity annotations to the document. For more details, see Section 19.14.
A.2.5 LingPipe Support
LingPipe is a suite of Java libraries for the linguistic analysis of human language. We have provided a plugin called ‘LingPipe’ with wrappers for some of the resources available in the LingPipe library. For more details, see Section 19.15.
A.2.6 OpenNLP Support
OpenNLP provides tools for sentence detection, tokenization, pos-tagging, chunking and parsing, named-entity detection, and coreference. The tools use Maximum Entropy modelling. We have provided a plugin called ‘OpenNLP’ with wrappers for some of the resources available in the OpenNLP Tools library. For more details, see section 19.16.
A.2.7 ABNER Support
ABNER is A Biomedical Named Entity Recogniser, for finding entities such as genes in text. We have provided a plugin called ‘AbnerTagger’ with a wrapper for ABNER. For more details, see section 17.6.
A.2.8 Groovy Support
Groovy is a dynamic programming language based on Java. You can now use it as a scripting language for GATE, via the Groovy Console. For more details, see Section 7.15.
A.2.9 Generic Tagger Support
A new plugin has been added to provide an easy route to integrate taggers with GATE. The Tagger_Framework plugin provides examples of incorporating a number of external taggers which should serve as a starting point for using other taggers. See Section 17.4 for more details.
A.3 Version 5.0 (May 2009)
Note: existing users – if you delete your user configuration file for any reason you will find that GATE Developer no longer loads the ANNIE plugin by default. You will need to manually select ‘load always’ in the plugin manager to get the old behaviour.
A.3.1 Major New Features
JAPE Language Improvements
Several new extensions to the JAPE language to support more flexible pattern matching. Full details are in Chapter 8 but briefly:
- Negative constraints, that prevent a rule from matching if certain other annotations are present (Section 8.1.9).
- Additional matching operators for feature values, so you can now look for {Token.length < 5}, {Lookup.minorType != "ignore"}, etc. as well as simple equality (Section 8.2.2).
- ‘Meta-property’ accessors (see Section 8.1.4) that permit access to the string covered by an annotation, the length of the annotation, etc., e.g. {Lookup@length > 4}.
- Contextual operators, allowing you to search for one annotation contained within (or containing) another, e.g. {Sentence contains {Lookup.majorType == "location"}} (see Section 8.2.2).
- Additional Kleene operator for ranges, e.g. ({Token})[2,5] matches between 2 and 5 consecutive tokens, see Section 8.2.1.
- Additional operators can be added via runtime configuration (see Section 8.2.2).
Some of these extensions are similar to, but not the same as, those provided by the Montreal Transducer plugin. If you are already familiar with the Montreal Transducer, you should first look at Section 8.11 which summarises the differences.
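The bounds of the range operator behave much like a bounded quantifier in a regular expression. The sketch below is an analogy only and does not run JAPE: it assumes, purely for illustration, that each annotation is encoded as one character ('T' for a Token, anything else for a non-Token) and uses java.util.regex to find runs of between min and max consecutive tokens. The class and method names are invented for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RangeQuantifierSketch {
    // Returns the length of every run of 'T' characters that the bounded
    // quantifier T{min,max} matches, scanning left to right.
    static List<Integer> matchLengths(String encoded, int min, int max) {
        Pattern pattern = Pattern.compile("T{" + min + "," + max + "}");
        Matcher matcher = pattern.matcher(encoded);
        List<Integer> lengths = new ArrayList<>();
        while (matcher.find()) {
            lengths.add(matcher.end() - matcher.start());
        }
        return lengths;
    }

    public static void main(String[] args) {
        // One isolated token, a separator, then seven consecutive tokens.
        System.out.println(matchLengths("TxTTTTTTT", 2, 5));
    }
}
```

The isolated token is skipped because it falls below the lower bound, and the run of seven is split because no single match may exceed the upper bound. How JAPE itself chooses among candidate matches depends on the matching style of the grammar, which this analogy does not model.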
Resource Configuration via Java 5 Annotations
Introduced an alternative style for supplying resource configuration information via Java 5 annotations rather than in creole.xml. The previous approach is still fully supported as well, and the two styles can be freely mixed. See Section 4.7 for full details.
Ontology-Based Gazetteer
Added a new plugin ‘Gazetteer_Ontology_Based’, which contains OntoRoot Gazetteer – a dynamically created gazetteer which, in combination with a few other generic resources, is capable of producing ontology-aware annotations over the given content with regard to the given ontology. For more details see Section 13.8.
Inter-Annotator Agreement and Merging
New plugins to support tasks involving several annotators working on the same annotation task on the same documents. The plugin ‘Inter_Annotator_Agreement’ (Section 10.5) computes inter-annotator agreement scores between the annotators, the ‘Copy_Annots_Between_Docs’ plugin (Section 19.13) copies annotations from several parallel documents into a single master document, and the ‘Annotation_Merging’ plugin (Section 19.11) merges annotations from multiple annotators into a single ‘consensus’ annotation set.
Packaging Self-Contained Applications for GATE Teamware
Added a mechanism to assemble a saved GATE application along with all the resource files it uses into a single self-contained package to run on another machine (e.g. as a service in GATE Teamware). This is available as a menu option (Section 3.8.4) which will work for most common cases, but for complex cases you can use the underlying Ant task described in Section E.2.
GUI Improvements
- A new schema-driven tool to streamline manual annotation tasks (see Section 3.4.6).
- Context-sensitive help on elements in the resource tree and when pressing F1 key. Search in mailing list from the Help menu. Help is displayed in your browser or in a Java browser if you don’t have one.
- Improved search function inside documents with a regular expression builder. Search and replace annotation function in all annotation editors.
- Remember for each resource type the last path used when loading/saving a resource.
- Remember the last annotations selected in the annotation set view when you shift click on the annotation set view button.
- Improved context menu and, where possible, added drag and drop in: resource tree, annotation set view, annotation list view, corpus view, controller view. The context menu key can now be used if you have Java 1.6.
- New dialog box for error messages, with user-oriented messages, optional display of the configuration, and suggestions for useful actions. This will progressively replace the old stack trace dump in the message panel, which is still present for the moment but should be hidden by default in the future.
- Add read-only document mode that can be enabled from the Options menu.
- Add a selection filter in the status bar of the annotations list table to easily select rows based on the text you enter.
- Add the last five applications loaded/saved in the context menu of the language resources in the resources tree.
- Display more information on what is going on in the waiting dialog box when running an application. The goal is to improve it to show a global progress bar and estimated time.
A.3.2 Other New Features and Improvements
- New parser plugins: A new plugin for the Stanford Parser (see Section 17.12) and a rewritten plugin for the RASP NLP tools (Section 17.10).
- A new sentence splitter, based on regular expressions, has been added to the ANNIE plugin. More details in Section 6.5.
- ‘Real-time’ corpus controller (Section 4.4), which terminates processing of a document if it takes longer than a configurable timeout.
- Major update to Annie OrthoMatcher coreference engine. Now correctly matches the sequence ‘David Jones ... David ... David Smith ... David’ as referring to two people. Also handles nicknames (David = Dave) via a new nickname list. Added optional parameter ‘highPrecisionOrgs’, which if set to true turns off riskier org matching rules. Many misc. bug fixes.
- Improved alignment editor (Chapter 16) with several advanced features and an API for adding your own actions to the editor.
- A new plugin for Chinese word segmentation, which is based on our work using machine learning algorithms for the Sighan-05 Chinese word segmentation task. It can learn a model from manually segmented text, and apply a learned model to segment Chinese text. In addition several learned models are available with the plugin, which can be used to segment text. For details about the plugin and those learned models see Section 19.12.
- New features in the ML API to produce an n-gram based language model from a corpus and a so-called ‘document-term matrix’ (see Section 19.4). Also introduced features to support active learning, a new learning algorithm (PAUM) and various optimisations including the ability to use an external executable for SVM training. Full details in Chapter 15.
- A new plugin to compute BDM scores for an ontology. The BDM score can be used to evaluate ontology based information extraction and classification. For details about the plugin see Section 10.6.
- Added new ‘getCovering’ method to AnnotationSet. This method returns annotations that completely span the provided range. An optional annotation type parameter can be provided to further limit the returned set.
- Complete redesign of ANNIC GUI. More details in Section 9.
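The behaviour described for the new ‘getCovering’ method can be sketched in plain Java. This is an illustration under stated assumptions, not the GATE implementation: the Span class and the covering method are hypothetical, and ‘completely span’ is assumed to mean that the annotation starts at or before the start of the range and ends at or after its end.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class CoveringSketch {
    // Hypothetical minimal annotation: a type plus start and end offsets.
    static final class Span {
        final String type;
        final long start;
        final long end;

        Span(String type, long start, long end) {
            this.type = type;
            this.start = start;
            this.end = end;
        }
    }

    // Returns the spans that completely cover [start, end].
    // A null type means "any type"; otherwise only that type is returned.
    static List<Span> covering(List<Span> all, String type, long start, long end) {
        List<Span> result = new ArrayList<>();
        for (Span span : all) {
            boolean typeMatches = (type == null) || type.equals(span.type);
            if (typeMatches && span.start <= start && span.end >= end) {
                result.add(span);
            }
        }
        return result;
    }

    // Sample data used by the usage example.
    static List<Span> sample() {
        return Arrays.asList(
                new Span("Sentence", 0, 50),
                new Span("Person", 8, 20),
                new Span("Token", 10, 15));
    }

    public static void main(String[] args) {
        System.out.println(covering(sample(), null, 10, 15).size());
        System.out.println(covering(sample(), "Person", 10, 15).size());
    }
}
```

With the sample data, all three spans cover the range 10 to 15, while only the Sentence covers the wider range 5 to 15.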
A.3.3 Specific Bug Fixes
- HTML document format parser: several bugs fixed, including a null pointer exception if the document contained certain characters illegal in HTML (#1754749). Also, the HTML parser now respects the ‘Add space on markup unpack’ configuration option – previously it would always add space, even if the option was set to false.
- Fixed a severe performance bug in the Annie Pronominal Coreferencer resulting in a 50X speed improvement.
- JAPE did not always correctly handle the case when the input and output annotation sets for a transducer were different. This has now been fixed.
- ‘Save preserving document format’ was not correctly escaping ampersands and less than signs when two HTML entities are close together. Only the first one was replaced: A & B & C was output as A &amp; B & C instead of A &amp; B &amp; C. This has now been fixed, and the fix is also valid for the flexible exporter but only if the standoff annotations parameter is set to false.
Plus many more minor bug fixes
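The escaping problem in ‘Save preserving document format’ is easy to reproduce with any routine that stops after the first replacement. A minimal sketch of escaping every occurrence follows; the class and method names are invented for this example and this is not the code used in GATE.

```java
class EscapeSketch {
    // Escapes every ampersand and less than sign in the text,
    // not only the first one encountered.
    static String escape(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == '&') {
                out.append("&amp;");
            } else if (c == '<') {
                out.append("&lt;");
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(escape("A & B & C"));
    }
}
```

Working character by character avoids re-escaping the ampersand of an entity that was just written, which is a common pitfall when chaining string replacements.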
A.4 Version 4.0 (July 2007)
A.4.1 Major New Features
ANNIC
ANNotations In Context: a full-featured annotation indexing and retrieval system designed to support corpus querying and JAPE rule authoring. It is provided as part of an extension of the Serial Datastores, called Searchable Serial Datastore (SSD). See Section 9 for more details.
New Machine Learning API
A brand new machine learning layer specifically targeted at NLP tasks including text classification, chunk learning (e.g. for named entity recognition) and relation learning. See Chapter 15 for more details.
Ontology API
A new ontology API, based on OWL In Memory (OWLIM), which offers a better API, a revised ontology event model and an improved ontology editor, to name but a few. See Chapter 14 for more details.
OCAT
Ontology-based Corpus Annotation Tool to help annotators to manually annotate documents using ontologies. For more details please see Section 14.6.
Alignment Tools
A new set of components (e.g. CompoundDocument, AlignmentEditor etc.) that help in building alignment tools and in carrying out cross-document processing. See Chapter 16 for more details.
New HTML Parser
A new HTML document format parser, based on Andy Clark’s NekoHTML. This parser is much better than the old one at handling modern HTML and XHTML constructs, JavaScript blocks, etc., though the old parser is still available for existing applications that depend on its behaviour.
Java 5.0 Support
GATE now requires Java 5.0 or later to compile and run. This brings a number of benefits:
- Java 5.0 syntax is now available on the right hand side of JAPE rules with the default Eclipse compiler. See Section D.5 for details.
- enum types are now supported for resource parameters. See Section 7.10 for details on defining the parameters of a resource.
- AnnotationSet and the CreoleRegister take advantage of generic types. The AnnotationSet interface is now an extension of Set<Annotation> rather than just Set, which should make for cleaner and more type-safe code when programming to the API, and the CreoleRegister now uses parameterized types, which are backwards-compatible but provide better type-safety for new code.
A.4.2 Other New Features and Improvements
- Hiding the view for a particular resource (by right clicking on its tab and selecting ‘Hide this view’) will now completely close the associated viewers and dispose them. Re-selecting the same resource at a later time will lead to re-creating the necessary viewers and displaying them. This has two advantages: firstly it offers a mechanism for disposing views that are not needed any more without actually closing the resource and secondly it provides a way to refresh the view of a resource in the situations where it becomes corrupted.
- The DataStore viewer now allows multiple selections. This lets users load or delete an arbitrarily large number of resources in one operation.
- The Corpus editor has been completely overhauled. It now allows re-ordering of documents as well as sorting the document list by either index or document name.
- Support has been added for resource parameters of type gate.FeatureMap, and it is also possible to specify a default value for parameters whose type is Collection, List or Set. See Section 7.3 for details.
- (Feature Request #1446642) After several requests, a mechanism has been added to allow overriding of GATE’s document format detection routine. A new creation-time parameter mimeType has been added to the standard document implementation, which forces a document to be interpreted as a specific MIME type and prevents the usual detection based on file name extension and other information. See Section 5.5.1 for details.
- A capability has been added to specify arbitrary sets of additional features on individual gazetteer entries. These features are passed forward into the Lookup annotations generated by the gazetteer. See Section 6.3 for details.
- As an alternative to the Google plugin, a new plugin called yahoo has been added to allow users to submit their query to the Yahoo search engine and to load the found pages as GATE documents. See Section 19.7 for more details.
- It is now easier to run a corpus pipeline over a single document in the GATE Developer GUI – documents now provide a right-click menu item to create a singleton corpus containing just this document. See Section 3.3 for details.
- A new interface has been added that lets PRs receive notification at the start and end of execution of their containing controller. This is useful for PRs that need to do cleanup or other processing after a whole corpus has been processed. See Section 4.4 for details.
- The GATE Developer GUI does not call System.exit() any more when it is closed. Instead an effort is made to stop all active threads and to release all GUI resources, which leads to the JVM exiting gracefully. This is particularly useful when GATE is embedded in other systems as closing the main GATE window will not kill the JVM process any more.
- The set of AnnotationSchemas that used to be included in the core gate.jar and loaded as builtins has now been moved to the ANNIE plugin. When the plugin is loaded, the default annotation schemas are instantiated automatically and are available when doing manual annotation.
- There is now support in creole.xml files for automatically creating instances of a resource that are hidden (i.e. do not show in the GUI). One example of this can be seen in the creole.xml file of the ANNIE plugin where the default annotation schemas are defined.
- A couple of helper classes have been added to assist in using GATE within a Spring application. Section 7.13 explains the details.
- Improvements have been made to the thread-safety of some internal components, which mean that it is now safe to create resources in multiple threads (though it is not safe to use the same resource instance in more than one thread). This is a big advantage when using GATE in a multithreaded environment, such as a web application. See Section 7.12 for details.
- Plugins can now provide custom icons for their PRs and LRs in the plugin JAR file. See Section 7.10 for details.
- It is now possible to override the default location for the saved session file using a system property. See Section 2.3 for details.
- The TreeTagger plugin (‘Tagger_TreeTagger’) supports a system property to specify the location of the shell interpreter used for the tagger shell script. In combination with Cygwin this makes it much easier to use the tagger on Windows. See Section 17.3 for details.
- The Buchart plugin has been removed. It is superseded by SUPPLE, and instructions on how to upgrade your applications from Buchart to SUPPLE are given in Section 17.11. The probability finder plugin has also been removed, as it is no longer maintained.
- The bootstrap wizard now creates a basic plugin that builds with Ant. Since a Unix-style make command is no longer required this means that the generated plugin will build on Windows without needing Cygwin or MinGW.
- The GATE source code has moved from CVS into Subversion. See Section 2.2.3 for details of how to check out the code from the new repository.
- An optional parameter, keepOriginalMarkupsAS, has been added to the DocumentReset PR which allows users to decide whether to keep the Original Markups AS or not while resetting the document. See Section 6.1 for more details.
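The notification mechanism for PRs mentioned in this list (start and end of execution of the containing controller) can be pictured with a small sketch. Everything here is hypothetical and written for illustration: LifecycleAware, CountingStep and the method names are not taken from the GATE API, which is documented in Section 4.4.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical callback interface, defined here for illustration only.
interface LifecycleAware {
    void started();
    void finished();
}

// A step that counts documents and reports once the whole corpus is done.
class CountingStep implements LifecycleAware {
    final List<String> events = new ArrayList<>();
    private int processed = 0;

    @Override
    public void started() {
        processed = 0;
        events.add("started");
    }

    void process(String document) {
        processed++;
    }

    @Override
    public void finished() {
        events.add("finished after " + processed + " documents");
    }
}

class LifecycleSketch {
    // Plays the role of a controller: notify, process each document, notify.
    static List<String> runOver(List<String> corpus) {
        CountingStep step = new CountingStep();
        step.started();
        for (String document : corpus) {
            step.process(document);
        }
        step.finished();
        return step.events;
    }

    public static void main(String[] args) {
        System.out.println(runOver(Arrays.asList("d1", "d2", "d3")));
    }
}
```

The point of the callbacks is that per-corpus work, such as writing a summary or releasing a resource, happens once rather than once per document.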
A.4.3 Bug Fixes and Optimizations
- The Morphological Analyser has been optimized. A new FSM-based implementation, with minor alterations to the basic FSM algorithm, has been written to speed it up. Earlier profiling figures showed that the morpher, when integrated with the ANNIE application, used to take up to 60% of the overall processing time. The optimized version takes only 7.6% of the total processing time. See Section 17.8 for more details on the morpher.
- The ANNIE Sentence Splitter was optimised. The new version is about twice as fast as the previous one. The actual speed increase varies widely depending on the nature of the document.
- The implementation of the OrthoMatcher component has been improved. This resource now takes significantly less time on large documents.
- The implementation of AnnotationSets has been improved. GATE now requires up to 40% less memory to run and is also 20% faster on average. The get methods of AnnotationSet return instances of ImmutableAnnotationSet. Any attempt at modifying the content of these objects will trigger an Exception. An empty ImmutableAnnotationSet is returned instead of null.
- The Chemistry tagger (Section 17.5) has been updated with a number of bugfixes and improvements.
- The Document user interface has been optimised to deal better with large bursts of events, which tend to occur when the document that is currently displayed gets modified. The main advantages brought by this new implementation are:
- The document UI refreshes faster than before.
- The presence of the GUI for a document induces a smaller performance penalty than it used to. Due to a better threading implementation, machines benefiting from multiple CPUs (e.g. dual CPU, dual core or hyperthreading machines) should only see a negligible increase in processing time when a document is displayed compared to the situations where the document view is not shown. In the previous version, displaying a document while it was processed used to increase execution time by an order of magnitude.
- The GUI is more responsive now when a large number of annotations are displayed, hidden or deleted.
- The strange exceptions that used to occur occasionally while working with the document GUI should not happen any more.
And as always there are many smaller bugfixes too numerous to list here...
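The contract described above for the AnnotationSet get methods (an unmodifiable result, and an empty result rather than null) can be illustrated with the standard collections library. This is a sketch under assumptions: annotations are modelled as plain strings, the class is invented for this example, and the exception thrown here is the standard UnsupportedOperationException, which may differ from what ImmutableAnnotationSet throws.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

class ImmutableResultSketch {
    private final Set<String> annotations = new HashSet<>();

    void add(String annotation) {
        annotations.add(annotation);
    }

    // Returns an unmodifiable set of matches; empty, never null, if none match.
    Set<String> get(String prefix) {
        Set<String> matches = new HashSet<>();
        for (String annotation : annotations) {
            if (annotation.startsWith(prefix)) {
                matches.add(annotation);
            }
        }
        return Collections.unmodifiableSet(matches);
    }

    // Reports whether an attempt to modify the given set is rejected.
    static boolean rejectsModification(Set<String> set) {
        try {
            set.add("extra");
            return false;
        } catch (UnsupportedOperationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        ImmutableResultSketch store = new ImmutableResultSketch();
        store.add("Person:1");
        System.out.println(store.get("Location").isEmpty());
        System.out.println(rejectsModification(store.get("Person")));
    }
}
```

Callers that previously checked for null can iterate over the result directly, and callers that used to modify the returned set need to copy it first.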
A.5 Version 3.1 (April 2006)
A.5.1 Major New Features
Support for UIMA
UIMA (http://www.research.ibm.com/UIMA/) is a language processing framework developed by IBM. UIMA and GATE share some functionality but are complementary in most respects. GATE now provides an interoperability layer to allow UIMA applications to include GATE components in their processing and vice-versa. For full information, see Chapter 18.
New Ontology API
The ontology layer has been rewritten in order to provide an abstraction layer between the model representation and the tools used for input and output of the various representation formats. An implementation that uses Jena 2 (http://jena.sourceforge.net/ontology) for reading and writing OWL and RDF(S) is provided.
Ontotext Japec Compiler
Japec is a compiler for JAPE grammars developed by Ontotext Lab. It has some limitations compared to the standard JAPE transducer implementation, but can run JAPE grammars up to five times as fast. By default, GATE still uses the stable JAPE implementation, but if you want to experiment with Japec, see Section 19.10.
A.5.2 Other New Features and Improvements
- Addition of a new JAPE matching style ‘all’. This is similar to Brill, but once all rules from a given start point have matched, the matching will continue from the next offset to the current one, rather than from the position in the document where the longest match finishes. More details can be found in Section 8.4.
- Limited support for loading PDF and Microsoft Word document formats. Only the text is extracted from the documents, no formatting information is preserved.
- The Buchart parser has been deprecated and replaced by a new plugin called SUPPLE - the Sheffield University Prolog Parser for Language Engineering. Full details, including information on how to move your application from Buchart to SUPPLE, are in Section 17.11.
- The Hepple POS Tagger is now open-source. The source code has been included in the GATE Developer/Embedded distribution, under src/hepple/postag. More information about the POS Tagger can be found in Section 6.6.
- Minipar is now supported on Windows. minipar-windows.exe, a modified version of pdemo.cpp, has been added under the gate/plugins/Parser_Minipar directory to allow users to run Minipar on Windows. When using Minipar on Windows, this binary should be provided as the value of the miniparBinary parameter. For full information on Minipar in GATE, see Section 17.9.
- The XmlGateFormat writer (Save As Xml from the GATE Developer GUI, gate.Document.toXml() from the GATE Embedded API) and reader have been modified to write and read GATE annotation IDs. For backward compatibility reasons the old reader has been kept. This change fixes a bug which manifested in the following situation: if a GATE document had annotations carrying features whose values were numbers representing other GATE annotation IDs, then after saving the document to XML and reloading it, those feature values could have become invalid by pointing to other annotations. By saving and restoring the GATE annotation IDs, the former consistency of the GATE document is maintained. For more information, see Section 5.5.2.
- The NP chunker and chemistry tagger plugins have been updated. Mark Greenwood has relicenced them under the LGPL, so their source code has been moved into the GATE Developer/Embedded distribution. See Sections 17.2 and 17.5 for details.
- The Tree Tagger wrapper has been updated with an option to be less strict when characters that cannot be represented in the tagger’s encoding are encountered in the document. Details are in Section 17.3.
- JAPE Transducers can be serialized into binary files. The option to load serialized version of JAPE Transducer (an init-time parameter binaryGrammarURL) is also implemented which can be used as an alternative to the parameter grammarURL. More information can be found in Section 8.9.
- On Mac OS, GATE Developer now behaves more ‘naturally’. The application menu items and keyboard shortcuts for About and Preferences now do what you would expect, and exiting GATE Developer with command-Q or the Quit menu item properly saves your options and current session.
- Updated versions of Weka (3.4.6) and Maxent (2.4.0).
- Optimisation in gate.creole.ml: the conversion of AnnotationSet into ML examples is now faster.
- It is now possible to create your own implementation of Annotation, and have GATE use this instead of the default implementation. See AnnotationFactory and AnnotationSetImpl in the gate.annotation package for details.
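The annotation ID problem fixed in the XmlGateFormat reader and writer can be shown with a toy model. The sketch below is an assumption-laden illustration, not GATE code: a document is modelled as a map from annotation ID to covered text, and reloading without preserved IDs is assumed to renumber annotations from zero in file order.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class IdRoundTripSketch {
    // Simulates reloading a saved document. With keepIds the saved IDs are
    // restored; without it, annotations are renumbered from 0 in saved order.
    static Map<Integer, String> reload(Map<Integer, String> saved, boolean keepIds) {
        Map<Integer, String> loaded = new LinkedHashMap<>();
        int next = 0;
        for (Map.Entry<Integer, String> entry : saved.entrySet()) {
            int id = keepIds ? entry.getKey() : next++;
            loaded.put(id, entry.getValue());
        }
        return loaded;
    }

    // A saved document in which annotation 1 ("she") carries a feature whose
    // value is 0, the ID of its antecedent "Ms Smith".
    static Map<Integer, String> sample() {
        Map<Integer, String> saved = new LinkedHashMap<>();
        saved.put(5, "London");
        saved.put(0, "Ms Smith");
        saved.put(1, "she");
        return saved;
    }

    public static void main(String[] args) {
        int antecedentId = 0;
        System.out.println(reload(sample(), false).get(antecedentId));
        System.out.println(reload(sample(), true).get(antecedentId));
    }
}
```

After renumbering, the stored feature value 0 resolves to a different annotation; with the IDs preserved it still resolves to the intended antecedent.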
A.5.3 Bug Fixes
- The Tree Tagger wrapper has been updated in order to run under Windows. See 17.3.
- The SUPPLE parser has been made more user-friendly. It now produces more helpful error messages if things go wrong. Note that you will need to update any saved applications that include SUPPLE to work with this version - see Section 17.11 for details.
- Miscellaneous fixes in the Ontotext JapeC compiler.
- Optimization: the creation of a Document is much faster.
- Google plugin: The optional pagesToExclude parameter was causing a NullPointerException when left empty at run time. Full details about the plugin functionality can be found in Section 19.6.
- Minipar, SUPPLE, TreeTagger: These plugins that call external processes have been fixed to cope better with path names that contain spaces. Note that some of the external tools themselves still have problems handling spaces in file names, but these are beyond our control to fix. If you want to use any of these plugins, be sure to read the documentation to see if they have any such restrictions.
- When using a non-default location for GATE configuration files, the configuration data is saved back to the correct location when GATE exits. Previously the default locations were always used.
- JAPE Debugger: ConcurrentModificationException in the JAPE debugger. The JAPE debugger was generating a ConcurrentModificationException during an attempt to run ANNIE. There was no exception when running without the debugger enabled. As a result of the fix, one unnecessary and incorrect callback to the debugger was removed from the SinglePhaseTransducer class.
- Plus many other small bugfixes...
A.6 January 2005
Release of version 3.
New plugins for processing in various languages (see 19.1). These are not full IE systems but are designed as starting points for further development (French, German, Spanish, etc.), or as sample or toy applications (Cebuano, Hindi, etc.).
Other new plugins:
- Chemistry Tagger 17.5
- Montreal Transducer (since retired)
- RASP Parser 17.10
- MiniPar 17.9
- Buchart Parser 17.11
- MinorThird (Version 5.1: removed)
- NP Chunker 17.2
- Stemmer 17.7
- TreeTagger 17.3
- Probability Finder
- Crawler 19.5
- Google PR 19.6
Support for SVM Light, a support vector machine implementation, has been added to the machine learning plugin ‘Learning’ (see section 15.3.5).
A.7 December 2004
GATE no longer depends on the Sun Java compiler to run, which means it will now work on any Java runtime environment of at least version 1.4. JAPE grammars are now compiled using the Eclipse JDT Java compiler by default.
A welcome side-effect of this change is that it is now much easier to integrate GATE-based processing into web applications in Tomcat. See Section 7.14 for details.
A.8 September 2004
GATE applications are now saved in XML format using the XStream library, rather than by using native Java serialization. On loading an application, GATE will automatically detect whether it is in the old or the new format, and so applications in both formats can be loaded. However, older versions of GATE will be unable to load applications saved in the XML format. (A java.io.StreamCorruptedException: invalid stream header exception will occur.) It is possible to get new versions of GATE to use the old format by setting a flag in the source code. (See the Gate.java file for details.) This change has been made because it allows the details of an application to be viewed and edited in a text editor, which is sometimes easier than loading the application into GATE.
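How GATE distinguishes the two formats is not described here. One plausible approach, sketched below as an assumption rather than as GATE's actual logic, is to inspect the first two bytes of the file: every Java serialization stream begins with the magic number defined by java.io.ObjectStreamConstants.STREAM_MAGIC, which is also why an older reader rejects an XML file with an ‘invalid stream header’ error. The class name and the sample XML string are invented for this example.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.ObjectStreamConstants;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

class FormatSniffSketch {
    // True if the data starts with the Java serialization magic number.
    static boolean looksLikeJavaSerialization(byte[] data) {
        if (data.length < 2) {
            return false;
        }
        int magic = ((data[0] & 0xFF) << 8) | (data[1] & 0xFF);
        return magic == (ObjectStreamConstants.STREAM_MAGIC & 0xFFFF);
    }

    // Serializes an object with native Java serialization, for the demo.
    static byte[] serialize(Object value) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(value);
            }
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical XML content standing in for a saved application.
        byte[] xml = "<application/>".getBytes(StandardCharsets.UTF_8);
        System.out.println(looksLikeJavaSerialization(serialize("app")));
        System.out.println(looksLikeJavaSerialization(xml));
    }
}
```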
A.9 Version 3 Beta 1 (August 2004)
Version 3 incorporates a lot of new functionality and some reorganisation of existing components.
Note that Beta 1 is feature-complete but needs further debugging (please send us bug reports!).
Highlights include: completely rewritten document viewer/editor; extensive ontology support; a new plugin management system; separate .jar files and a Tomcat classloading fix; lots more CREOLE components (and some more to come soon).
Almost all the changes are backwards-compatible; some recent classes have been renamed (particularly the ontologies support classes) and a few events added (see below); datastores created by version 3 will probably not read properly in version 2. If you have problems, use the mailing list and we’ll help you fix your code!
The gory details:
- Anonymous CVS is now available. See Section 2.2.3 for details.
- CREOLE repositories and the components they contain are now managed as plugins. You can select the plugins the system knows about (and add new ones) by going to ‘Manage CREOLE Plugins’ on the File menu.
- The gate.jar file no longer contains all the subsidiary libraries and CREOLE component resources. This makes it easier to replace library versions and/or not load them when not required (libraries used by CREOLE builtins will now not be loaded unless you ask for them from the plugins manager console).
- ANNIE and other bundled components now have their resource files (e.g. pattern files, gazetteer lists) in a separate directory in the distribution – gate/plugins.
- Some testing with Sun’s JDK 1.5 pre-releases has been done and no problems reported.
- The gate:// URL system used to load CREOLE and ANNIE resources in past releases is no longer needed. This means that loading in systems like Tomcat is now much easier.
- Mac OS X is now properly supported by the installer and the runtime.
- An Ontology-based Corpus Annotation Tool (OCAT) has been implemented as a plugin. Documentation of its functionality is in Section 14.6.
- The NLG Lexical tools from the MIAKT project have now been released.
- The Features viewer/editor has been completely updated – see Section 3.4.5 for details.
- The Document editor has been completely rewritten – see Section 3.2 for more information.
- The datastore viewer is now a full-size VR – see Section 3.8.2 for more information.
A.10 July 2004
GATE documents now fire events when the document content is edited. This was added in order to support the new facility of editing documents from the GUI. This change will break backwards compatibility by requiring all DocumentListener implementations to implement a new method:
public void contentEdited(DocumentEvent e);
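For existing listeners, the smallest fix is usually to add an implementation of the new method, even an empty one. The sketch below shows the shape of such a listener. To keep it compilable without GATE on the classpath, DocumentEvent and DocumentListener are simplified stand-ins declared locally (real code would import gate.event.DocumentEvent and gate.event.DocumentListener instead), and LoggingDocumentListener is a hypothetical class invented for the example.

```java
import java.util.ArrayList;
import java.util.List;

// Stand-in for gate.event.DocumentEvent; the real class carries more information.
class DocumentEvent {
    final String description;
    DocumentEvent(String description) { this.description = description; }
}

// Stand-in for gate.event.DocumentListener, reduced to the three callbacks.
interface DocumentListener {
    void annotationSetAdded(DocumentEvent e);
    void annotationSetRemoved(DocumentEvent e);
    void contentEdited(DocumentEvent e); // the method introduced by this change
}

// Hypothetical listener that records every event it receives.
class LoggingDocumentListener implements DocumentListener {
    final List<String> log = new ArrayList<>();

    @Override
    public void annotationSetAdded(DocumentEvent e) {
        log.add("annotationSetAdded: " + e.description);
    }

    @Override
    public void annotationSetRemoved(DocumentEvent e) {
        log.add("annotationSetRemoved: " + e.description);
    }

    // Listeners written before this change lack this method and no longer compile.
    @Override
    public void contentEdited(DocumentEvent e) {
        log.add("contentEdited: " + e.description);
    }

    public static void main(String[] args) {
        LoggingDocumentListener listener = new LoggingDocumentListener();
        listener.contentEdited(new DocumentEvent("text inserted by the user"));
        System.out.println(listener.log);
    }
}
```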
A.11 June 2004
A new algorithm has been implemented for the AnnotationDiff function. A new, more usable, GUI is included, and an ‘Export to HTML’ option added. More details about the AnnotationDiff tool are in Section 10.2.1.
A new build process, based on ANT (http://ant.apache.org/) is now available. The old build process, based on make, is now unsupported. See Section 2.5 for details of the new build process.
A JAPE Debugger from Ontos AG has been integrated. Integration is OFF by default; you can turn it ON with the command line option ‘-j’. If you run GATE Developer with this option, a new menu item for the JAPE Debugger GUI will appear in the Tools menu. We are currently awaiting documentation for this.
NOTE! A ClassCastException is thrown if you try to debug a ConditionalCorpusPipeline. The JAPE Debugger is designed for Corpus Pipeline only. The Ontos code needs to be changed to allow debugging of ConditionalCorpusPipeline.
A.12 April 2004
There are now two alternative strategies for ontology-aware grammar transduction:
- using the [ontology] feature in both grammars and annotations, with the default Transducer.
- using the ontology-aware transducer, which passes an ontology LR to a new subsume method in the SimpleFeatureMapImpl. The latter strategy does not check for ontology features (this makes grammars easier to write, as there is no need to specify the ontology).
The changes are in:
- SinglePhaseTransducer (always call subsume with ontology – if null then the ordinary subsumption takes place)
- SimpleFeatureMapImpl (new subsume method using an ontology LR)
More information about the ontology-aware transducer can be found in Section 14.9.
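The idea behind ontology-aware subsumption can be sketched as follows. This is a toy illustration under stated assumptions, not the code in SimpleFeatureMapImpl. The ontology is reduced to a child-to-parent map, feature maps are plain string maps, and only a feature named "class" is compared through the class hierarchy. The name OntologySubsumptionSketch and both method signatures are hypothetical. As described above, passing a null ontology falls back to ordinary subsumption.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch; none of these names come from the GATE API.
class OntologySubsumptionSketch {

    // Toy ontology: maps each class name to its direct parent (assumed acyclic).
    static boolean isSameOrSubClass(Map<String, String> parentOf, String candidate, String ancestor) {
        for (String c = candidate; c != null; c = parentOf.get(c)) {
            if (c.equals(ancestor)) {
                return true;
            }
        }
        return false;
    }

    // True if every feature required by the constraint is satisfied by the annotation.
    // With a null ontology this is plain key/value subsumption.
    static boolean subsumes(Map<String, String> parentOf,
                            Map<String, String> annotationFeatures,
                            Map<String, String> constraint) {
        for (Map.Entry<String, String> wanted : constraint.entrySet()) {
            String actual = annotationFeatures.get(wanted.getKey());
            if (actual == null) {
                return false; // required feature is missing
            }
            if (actual.equals(wanted.getValue())) {
                continue; // exact match is always enough
            }
            // Assumption: only the "class" feature is matched via the hierarchy.
            boolean viaOntology = parentOf != null
                    && "class".equals(wanted.getKey())
                    && isSameOrSubClass(parentOf, actual, wanted.getValue());
            if (!viaOntology) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        Map<String, String> parentOf = new HashMap<>();
        parentOf.put("City", "Location");
        parentOf.put("Location", "Entity");

        Map<String, String> annotation = new HashMap<>();
        annotation.put("class", "City");
        Map<String, String> constraint = new HashMap<>();
        constraint.put("class", "Location");

        System.out.println(subsumes(parentOf, annotation, constraint)); // with the ontology
        System.out.println(subsumes(null, annotation, constraint));     // ordinary subsumption
    }
}
```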
A morphological analyser PR has been added. This finds the root and affix values of a token and adds them as features to that token.
A flexible gazetteer PR has been added. This performs lookup over a document based on the values of an arbitrary feature of an arbitrary annotation type, by using an externally provided gazetteer. See 13.6 for details.
A.13 March 2004
Support was added for the MAXENT machine learning library. (See 15.3.4 for details.)
A.14 Version 2.2 – August 2003
Note that GATE 2.2 works with JDK 1.4.0 or above. Version 1.4.2 is recommended, and is the one included with the latest installers.
GATE has been adapted to work with PostgreSQL 7.3. Compatibility with PostgreSQL 7.2 has been preserved.
Note that as of Version 5.1 PostgreSQL is no longer supported.
New library version – Lucene 1.3 (rc1).
A bug in gate.util.Javac has been fixed in order to account for situations when String literals require an encoding different from the platform default.
Temporary .java files used to compile JAPE RHS actions are now saved using UTF-8 and the ‘-encoding UTF-8’ option is passed to the javac compiler.
A custom tools.jar is no longer necessary.
Minor changes have been made to the look and feel of GATE Developer to improve its appearance with JDK 1.4.2.
Some bug fixes (087, 088, 089, 090, 091, 092, 093, 095, 096 – see http://gate.ac.uk/gate/doc/bugs.html for more details).
A.15 Version 2.1 – February 2003
Integration of Machine Learning PR and WEKA wrapper (see Section 15.3).
Addition of DAML+OIL exporter.
Integration of WordNet (see Section 19.8).
The syntax tree viewer has been updated to fix some bugs.
A.16 June 2002
Conditional versions of the controllers are now available (see Section 3.7.1). These allow processing resources to be run conditionally on document features.
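The behaviour can be pictured with a small self-contained sketch. It does not use the GATE controller classes: ConditionalPipelineSketch and Step are hypothetical names, processing resources are represented only by their names, and document features are a plain string map. Each step either always runs or runs only when a given document feature has a given value.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Objects;

// Hypothetical sketch; these classes are not part of GATE.
class ConditionalPipelineSketch {

    // One pipeline step. A null featureName means "always run".
    static final class Step {
        final String name;
        final String featureName;
        final String featureValue;

        Step(String name, String featureName, String featureValue) {
            this.name = name;
            this.featureName = featureName;
            this.featureValue = featureValue;
        }

        boolean shouldRun(Map<String, String> documentFeatures) {
            return featureName == null
                    || Objects.equals(featureValue, documentFeatures.get(featureName));
        }
    }

    // Returns the names of the steps that would run on a document with these features.
    static List<String> run(List<Step> steps, Map<String, String> documentFeatures) {
        List<String> executed = new ArrayList<>();
        for (Step step : steps) {
            if (step.shouldRun(documentFeatures)) {
                executed.add(step.name);
            }
        }
        return executed;
    }

    public static void main(String[] args) {
        List<Step> steps = Arrays.asList(
                new Step("tokeniser", null, null),      // unconditional
                new Step("frenchTagger", "lang", "fr")  // only for French documents
        );
        System.out.println(run(steps, Collections.singletonMap("lang", "en")));
        System.out.println(run(steps, Collections.singletonMap("lang", "fr")));
    }
}
```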
PostgreSQL Data Stores are now supported. These store data into a PostgreSQL RDBMS. (As of Version 5.1 PostgreSQL is no longer supported.)
Addition of OntoGazetteer (see Section 13.3), an interface which makes ontologies visible within GATE Developer, and supports basic methods for hierarchy management and traversal.
Integration of Protégé, so that people with developed Protégé ontologies can use them within GATE.
Addition of IR facilities in GATE (see Section 19.4).
Modification of the corpus benchmark tool (see Section 10.4.1), which now takes an application as a parameter.
See also for details of other recent bug fixes.