GATE in SEKT

Semantic Knowledge Technologies and Language Computation

Summary

The vision of SEKT is to develop and exploit the knowledge technologies which underlie Next Generation Knowledge Management. The SEKT strategy is built around the synergy of complementary know-how in Ontology and Metadata Technology, Knowledge Discovery and Human Language Technology, along with major European ICT organisations. Specifically, SEKT will deliver software to semi-automatically learn ontologies and extract metadata, to maintain and evolve the ontologies and metadata over time, and to provide knowledge access, as well as middleware to integrate all the SEKT components.

SEKT is funded as an Integrated Project under the European Commission's 6th Framework Programme (FP6), with a budget of around 7.5M euros. It starts in January 2004 and runs for 3 years. SEKT is co-ordinated by Dr. John Davies, BT, UK.

Contact: Hamish Cunningham (PI).

Project Objectives:

The vision of this project is to develop and exploit the knowledge technologies which will underlie Next Generation Knowledge Management (NGKM). These NGKM systems will include means for automated knowledge extraction, knowledge packaging and delivery according to user profiles, as well as semantic-based knowledge analysis and matching for user-driven knowledge push. A major barrier to the widespread use of such NGKM systems in industry and organizations is the overhead of knowledge modelling and annotation that they require.

SEKT will address these and other NGKM challenges through an interdisciplinary approach focussing on substantially reducing the overhead of knowledge modelling and annotation of sources. This will be done by integrating Ontology & Metadata Technology (OMT), Human Language Technology (HLT), and Knowledge Discovery (KD) into a uniform and scalable framework that supports the integrated learning and management of ontologies and metadata in a (semi-)automatic way.

Specifically, the use of ontologies and metadata underlies the SEKT components, and the whole approach; human language technology will be used to extract metadata; knowledge discovery will be used to semi-automatically learn and evolve ontologies.

Further RTD objectives of SEKT are:

the provision of seamless, context-aware access to knowledge, including innovative visualization techniques and context-aware personalised push services;
the integration of information from different, heterogeneous sources based on ontology mediation techniques;
the development of methods for reasoning under inconsistency;
the seamless integration of knowledge management with normal business processes, thus making knowledge input a side-effect of doing normal business tasks.

Our Role

In this project HLT makes two key contributions, both of which include the handling of multilinguality. First, it will be used to semantically annotate informal and unstructured knowledge, thereby achieving the automatic or semi-automatic extraction of metadata from legacy data. Second, natural language processing will be used to generate natural language from formal knowledge (ontologies and metadata). Here, HLT will be strongly integrated with methods from KD and OMT. The ontologies that structure metadata are in many cases language-independent to a significant degree. SEKT will trial metadata generation methods based on Information Extraction, Content Extraction and other language analysis technologies used in HLT for various languages and, similarly, will prove in practice the documentation of ontologies and metadata using Natural Language Generation.

All the HLT technology used and further developed in the project will be based on systems that have been proven in a large range of languages. For example, the extraction components were recently entered in the "TIDES Surprise Language Competition" (which measured the ability to port HLT systems to Hindi) with favourable results. All the main European languages, and others from Chinese to Bulgarian, have also been covered in various ways, and SEKT will include at least four languages directly in the case studies (English, German, French and Spanish).

More specifically we are working on:

Integration of Human Language Technology with Knowledge Discovery and Ontology and Metadata Technology.
Ontology-based Information Extraction for Metadata Generation.
Evaluation metrics and corpora for ontology-based Information Extraction.
Intelligent Knowledge Access via Natural Language Generation Techniques.
Open source tools for Human Language Technology for Knowledge Management.

Selected results

The CLIE movie shows how to

load CLIE (Controlled Language Information Extraction),
load a sample document in the controlled language,
run the CLIE application to generate an ontology from the text,
add more text and update the ontology,
identify and correct errors in the text, and
save and reload ontologies in various formats (.RDF, .XML, .OWL).
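
The steps above can be pictured with a toy example. The sketch below is not CLIE or GATE code. It assumes a hypothetical two-pattern controlled language ("There are Xs." declares a class, "X is a Y." declares an instance) and a hypothetical dictionary-based ontology, purely to illustrate how controlled sentences might be turned into an ontology, how adding more text updates it, and how uninterpretable sentences can be reported for correction.

```python
import re

# Hypothetical sentence patterns; the real CLIE grammar is not described here.
CLASS_RE = re.compile(r"^There are (\w+)s\.$")
INSTANCE_RE = re.compile(r"^(\w+) is an? (\w+)\.$")


def update_ontology(ontology, text):
    """Add classes and instances found in `text` to `ontology`.

    `ontology` maps a class name to the set of its instance names.
    Returns the sentences that could not be interpreted.
    """
    errors = []
    for line in text.splitlines():
        sentence = line.strip()
        if not sentence:
            continue
        class_match = CLASS_RE.match(sentence)
        instance_match = INSTANCE_RE.match(sentence)
        if class_match:
            # "There are projects." declares the class "project".
            ontology.setdefault(class_match.group(1).lower(), set())
        elif instance_match and instance_match.group(2).lower() in ontology:
            # "SEKT is a project." adds SEKT as an instance of "project".
            ontology[instance_match.group(2).lower()].add(instance_match.group(1))
        else:
            # Unknown pattern or undeclared class: report it for correction.
            errors.append(sentence)
    return errors


ontology = {}
errors = update_ontology(ontology, "There are projects.\nSEKT is a project.")
# Adding more text updates the same ontology; the class "tool" is undeclared,
# so this sentence is reported as an error instead of being added.
more_errors = update_ontology(ontology, "GATE is a tool.")
```

Calling `update_ontology` repeatedly on the same dictionary mirrors the "add more text and update the ontology" step; serialising the result to RDF or OWL is left out of the sketch.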

The OBIE movie shows how to

load an unannotated document and automatically annotate it with OBIE (Ontology-Based Information Extraction),
manually add, delete and change annotations using OCAT (Ontology Corpus Annotation Tool), which displays an ontology tree of annotations, and
send the document back to the trainer to improve the model incrementally.

This movie demonstrates that as the user keeps making corrections and sending them back to the trainer, the quality of the automatic annotation improves.
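
The correct-and-retrain loop can be sketched as follows. This is a minimal illustration, not the OBIE learner: the "model" is assumed to be a plain lookup table from tokens to ontology classes, and the function names, example tokens and class labels are all hypothetical.

```python
def annotate(model, tokens):
    """Label each token with the ontology class the model knows, or None."""
    return [model.get(token) for token in tokens]


def train(model, tokens, corrected_labels):
    """Fold user-corrected labels back into the model (incremental update)."""
    for token, label in zip(tokens, corrected_labels):
        if label is None:
            model.pop(token, None)  # the user removed an annotation
        else:
            model[token] = label    # the user added or changed an annotation


def accuracy(predicted, gold):
    """Fraction of tokens whose predicted label equals the gold label."""
    return sum(p == g for p, g in zip(predicted, gold)) / len(gold)


model = {}

# First document, with the labels the user ends up with after manual correction.
doc1 = ["BT", "coordinates", "SEKT"]
doc1_corrected = ["Organisation", None, "Project"]

# Second, unseen document used to measure annotation quality.
doc2 = ["SEKT", "uses", "GATE"]
doc2_gold = ["Project", None, "Tool"]

before = accuracy(annotate(model, doc2), doc2_gold)
train(model, doc1, doc1_corrected)  # send the corrections back to the trainer
after = accuracy(annotate(model, doc2), doc2_gold)
```

In this toy setting `after` is higher than `before` because the corrected label for "SEKT" carries over to the second document; a real learner would generalise from features rather than memorise tokens.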