Monday: Phil Gooch
Annotating biomedical terms in GATE
Application of natural language processing (NLP) techniques for extracting named entities from biomedical literature typically involves the use of large-scale thesauri. However, relying on such data sources for entity extraction (as opposed to entity validation) can be problematic: 1) given the compositionality and complexity of biomedical terms, such thesauri cannot account for every possible term; 2) thesauri cannot account for context in a given text. This presentation outlines an approach to biomedical knowledge extraction using minimal, morpheme-based thesauri, lexical patterns and part-of-speech analysis within the open-source GATE framework. Some generic rules for biomedical named-entity recognition are presented, based on principles from cognitive science and linguistics, and utilising the compositional nature of medical terminology, often derived from Latin and Greek roots. 'Candidate' terms can then be classified and validated against existing open-source biomedical databases, such as the UMLS MetaMap.
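As a rough illustration of the morpheme-based idea described in this abstract, the sketch below flags candidate terms by matching words against a handful of Greek- and Latin-derived roots and suffixes. It is not the speaker's GATE pipeline: the morpheme lists and the function name `candidate_terms` are hypothetical and chosen only for this example, and a real system would add part-of-speech analysis, lexical patterns and validation against an external database.

```python
import re

# Hypothetical, deliberately tiny morpheme lists (assumptions for this sketch);
# a real system would rely on curated morpheme-based thesauri.
ROOTS = ("cardio", "gastro", "hepat", "neuro", "nephr", "osteo")
SUFFIXES = ("itis", "ectomy", "opathy", "osis", "algia", "oma")


def candidate_terms(text):
    """Return words that start with a known root or end with a known suffix.

    The result is a list of unvalidated candidates, in order of appearance.
    """
    candidates = []
    for word in re.findall(r"[A-Za-z]+", text):
        lower = word.lower()
        # str.startswith / str.endswith accept a tuple of alternatives.
        if lower.startswith(ROOTS) or lower.endswith(SUFFIXES):
            candidates.append(word)
    return candidates


if __name__ == "__main__":
    sample = "The patient had hepatitis and a mild neuropathy after surgery."
    print(candidate_terms(sample))
```

Terms found this way are only candidates; as the abstract notes, they would still need to be classified and validated against an existing biomedical resource.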
Tuesday: Brian Davis
Experiences of applying GATE to Semantic Web Technologies
Semantic Lifting is the process of capturing the semantics of various types of (semi-)structured data and/or non-semantic metadata and translating such data into relations, attributes and concepts within an ontology. While, strictly speaking, lifting applies only to (semi-)structured sources, in practice the term is often used to cover all sources of data, including unstructured ones. Human Language Technology (HLT) therefore acts as the missing link between such unstructured sources and Semantics. This presentation describes the language engineering experiences of the Semantic Information Systems and Language Engineering (SMILE) group at the Digital Enterprise Research Institute (DERI), Galway, Ireland, ranging from using GATE to embed language processing into semantic applications such as semantic email, to applying GATE as an interface technology for ontology creation and management.
Wednesday: Dr. Dhaval Thakker
Information Extraction and Linked Data Cloud
In the media industry, there is great emphasis on providing consumers with descriptive metadata as part of media assets. Information extraction (IE) is considered an important tool for the metadata generation process, and its performance largely depends on the knowledge base it utilizes. Advances in “Linked Data Cloud” research provide a great opportunity for generating such a knowledge base, one that benefits from the participation of a wider community. In this talk, I will discuss our experiences of utilizing the Linked Data Cloud in conjunction with a GATE-based IE system.
Thursday: Matthew Petrillo
A Year of Living Dangerously: Commercial Manual Annotation using Teamware
Commercial needs and expectations require annotation project management practices that differ from experimental or academic research needs. We have been working for a year on distributed commercial annotation projects using successive versions of GATE Teamware. Along the way, we have learned many lessons, refined our practices, and (hopefully) proven the value of professional annotation project management for commercial needs. I will relate our experiences working with annotators in Asia, note differences between commercial and academic annotation, and provide examples of work and unanticipated trouble spots we've overcome (or not).