Invited talks programme
All invited talks are in MB 3.210, from 16:30 to 17:30 (45 min talk, 15 min discussion).
- Monday 30 August: Prof. Christopher Baker, University of New Brunswick, Canada
Title: Augmentations of Text Mining for Semantic Knowledge Discovery
Abstract: Enterprise knowledge workers and scientists rely heavily on facts extracted from documents and seek to contextualize new insights with established knowledge. The development of pipelines for domain-specific natural language processing is a budding research area. Despite successes in information extraction, further augmentations to text-mined results are necessary. These include exposing extracted entities with semantic metadata from domain ontologies, the grounding of extracted entities to canonical forms in reference databases, and the scoring of candidate semantic assertions (relations between named entities) derived from extracted sentences. Faithful representation of extracted information is of central importance for downstream knowledge discovery, storage in semantic repositories, and brokering of knowledge as semantic services or semantic assistants. In this presentation I outline: (1) the use of the Lipid Ontology for representing normalized lipid names and semantic relations derived from a corpus of abstracts on ovarian cancer, further illustrating semantic query with a visual query paradigm; (2) the grounding of normalized mutation mentions to the correct residues on the corresponding protein sequence in the UniProt database; (3) mutation impact extraction and grounding to Gene Ontology terms, along with classification according to direction of impact; (4) the scoring algorithm and threshold-based assignment of object property assertions populated to an OWL-DL ontology in the Telecom Product use case, in support of Contact Centre workers solving customer queries; and (5) the deployment of a semantic assistant Firefox plugin to deliver grounded mutations and lipid functional group axioms to users browsing PubMed abstracts.
Slides: PDF
- Tuesday 31 August: Dr. Atefeh Farzindar, President at NLP Technologies Inc., Montreal.
Title: DECISIONEXPRESS™: a weekly bulletin of the latest case law built on GATE.
Abstract: DecisionExpress™, a weekly bulletin, is a current awareness tool developed for the distribution of recent decisions from Canadian courts and tribunals. We will present our work on a new methodology for the automatic summarization of court decisions. The prototype, built on GATE, determines the thematic structure of a decision and then identifies the relevant sentences for each theme. We will show how GATE helped us quickly implement a large volume of semantic rules and extract the legal information in a user-friendly environment.
- Wednesday 1 September: René Witte, Concordia University, Semantic Software Lab
Title: NLP for the Masses: Integrating GATE with Desktop Clients
Abstract: Thanks to GATE (and its many plugins), many natural language processing applications can now be easily assembled out of existing building blocks. This allows for a proliferation of text mining pipelines, helping users overloaded with ever more information, e.g., through information extraction, automatic summarization, or question-answering. However, end users must be able to access these pipelines in a way that seamlessly integrates with their tasks and workflows. It seems that not much progress has been achieved in this area in recent years: few NLP services are available in today's desktop environments and their applications. This talk investigates the reasons behind this lack of NLP adoption and presents a novel way of bringing GATE pipelines to end users via "Semantic Assistants". This project aims to provide effective means for the integration of natural language processing services into existing applications, using an open service-oriented architecture for NLP services. With these services integrated into desktop applications, such as word processors, email clients, and software development environments, end users can now receive context-specific support for any task involving human language. The (open source) Semantic Assistants architecture allows any existing GATE pipeline to be published as a W3C Web service with a WSDL description, and relies on OWL models for service description, composition, and execution.
Slides: PDF
- Thursday 2 September: Andrew Borthwick, Intelius
Title: Person Attribute Extraction using GATE at Intelius
Abstract: This talk will describe Intelius’ system for high-precision extraction of person attributes from web pages using the GATE (General Architecture for Text Engineering) toolkit. The system leverages GATE’s existing architecture for tokenization, sentence segmentation, and part-of-speech tagging. We then make significant modifications to named entity resolution, most notably by plugging in an external person name identifier. We use GATE’s default coreference engine, but add our own machine-learning-based pronominal coreferencer. Finally, we add a very extensive set of regular expressions for person attribute extraction. Other key technologies include snippet identification, ignoring of junk text, and person image identification.