GATE Training Modules
Choosing the right track
Each track runs for the length of the course. Tracks 1, 2, and 3 run in parallel Monday to Thursday, one module per day. On Friday, two optional add-on courses run in parallel: Semantic Technology and Linked Open Data, and Social Media Mining with GATE.
You should normally choose just one track from 1 to 3 for the main course. In some cases, the modules within a track follow directly from previous modules in that track (especially in Track 1). In general, we advise against mixing and matching modules from the different main tracks. If you have specific requirements for this, or if you are unsure which track is best for you, please contact us at gate-fig@sheffield.ac.uk before registering to discuss your needs.
Track 1: Introduction to GATE and Text Mining
This is the best place to start if you are completely new to GATE. This track covers the following topics:
- Module 1: Introduction to GATE Developer
- GATE concepts
- Finding your way around the GUI
- Loading and using existing processing resources and plugins
- Loading, annotating and viewing existing language resources
- Creating and using applications
- Module 2: Information Extraction and ANNIE
- Basic introduction to Information Extraction
- Running and evaluating an information extraction project
- Using and customising ANNIE, GATE's IE tool
- Using the Corpus QA and other evaluation tools
- Introduction to semantic annotation with ontologies
- Module 3: Introduction to JAPE
- Using JAPE grammars for annotation manipulation
- Using JAPE for named entity recognition
- Module 4: Introduction to Teamware, GateCloud, and Mímir
- Teamware: A web-based collaborative corpus annotation tool
- GateCloud: Low-cost and low-effort scalability of information extraction to terabytes of text
- Mímir: Semantic Indexing and search over text, annotations, and ontologies
Prerequisites: There are no prerequisites, though it might be a good idea to run through some GATE Developer tutorials prior to attending, to make the most of the experience.
Track 2: Programming in GATE
This track is appropriate for those interested in using GATE mainly from the API (GATE Embedded). Topics include the following:
- Module 5: GATE Embedded API
- The GATE component model (LRs, PRs, VRs, GATE Factory)
- The GATE data model (annotations, documents, corpora)
- Execution control (controllers, application persistence, compositionality)
- Module 6: Main GATE APIs
- Advanced JAPE: using Java on the RHS
- The Ontology API, and Ontology Population
- Module 7: Creating new Resource Types
- Writing new Processing Resources
- Writing new Visual Resources
- Understanding CREOLE configuration
- Module 8: Advanced GATE Embedded
- GATE and UIMA
- GATE-based Web Applications
- Groovy in GATE
Prerequisites: Ability to program using Java. Although Track 1 is not a prerequisite for taking Track 2, it can be a useful progression. Alternatively, those who are new to GATE might want to spend some time looking at the learning materials on this site, for example the Matrixware tutorials, before attending Track 2.
Track 3: Advanced GATE
This track introduces more advanced functionality within GATE. Unlike Track 2, however, it does not depend on any Java programming skills.
- Module 9: Ontologies and Semantic Annotation
- Introduction to Ontologies
- GATE Ontology Editor
- GATE Ontology Annotation Tools for Entities and Relations
- Automatic Semantic Annotation in GATE
- Measuring Performance
- Using the Large Knowledge Base gazetteer (LKB)
- Using Mímir: the new semantic indexing and search platform
- Module 10: Advanced GATE Applications
- Customising ANNIE
- Working with different languages
- Complex applications
- Conditional Processing
- Section-by-section processing
- Module 11: Machine Learning
- Machine learning and evaluation concepts
- Using ML in GATE
- Engines and algorithms
- Entity learning hands-on session
- Relation extraction hands-on session
- Module 12: Opinion Mining
- Introduction to opinion mining and sentiment analysis
- Using GATE tools to perform sentiment analysis
- Machine learning for sentiment analysis hands-on session
- Future directions for opinion mining
Prerequisites: Familiarity with the topics covered in Track 1.
Add-on Course 1: Semantic Technology and Linked Data Annotation
This module is provided in collaboration with the project.
- Module 13: Semantic Technology and Linked Open Data: Basics, Tools, and Applications
- Introduction to RDF, OWL, and Linked Open Data
- Semantic Annotation with Linked Data
- Semantic Search
Prerequisites: No specific prerequisites
Add-on Course 2: Social Media Mining with GATE
- Module 14: Mining Twitter, Facebook, and other social media with GATE: Basics, Tools, and Applications
- Processing tweets: dealing with noise, hashtags, mentions, re-tweets, etc.
- Normalisation, language identification, tokenisation, POS tagging, and NER for social media
- It's all about the networks: exploiting the social connections and user interactions
- Delivered by Leon Derczynski and Kalina Bontcheva
Prerequisites: Basic familiarity with Twitter, Facebook, and other social media; ideally some knowledge of JSON
Research Experience Talks
The research talks are scheduled so that participants from all tracks can come together and learn about the kinds of projects being done with GATE. Talks vary between FIGs, but topics covered in the past have included:
- GATECloud: Text mining on the Cloud
- Text analysis for business intelligence
- Biomedical text mining
- Large-scale patent processing
- Ontologies and semantic annotation
- Summarisation of legal information
- Deriving personal profiles from biographical text