Workshop on Persisting, Indexing and Querying Multi-Paradigm Text Models

Matrixware / IRF, Vienna, May 2008

Introduction

Consider the following three types of retrieval systems:

- full-text-based, with boolean and proximity operators
- annotation-based, with an underlying graph representation encoding structured information about text ranges
- ontology-based, with hierarchical conceptual schemas plus concept instances from documents

Systems for high-value content retrieval are likely to combine elements of all three styles, which poses difficult problems of representation, persistence, indexing and querying. Current systems often combine three quite different engines, perhaps putting an augmented full-text index in Lucene, Terrier or Lemur, and an ontology in OWLIM or Sesame. Search over annotation graphs is not normally exposed to end-users at present, but we believe that information professionals will benefit from this type of facility in the near future. Antecedents in the research world include CWB (the Corpus Workbench), GATE's ANNIC (Annotations in Context) and the BNC's SARA system. XML engines are certainly relevant, but it remains difficult to express graph-structured data in a tree-oriented language.

This workshop is intended to cross-fertilise research streams from each of the three types and from the XML and database worlds, with a view to improving the state of the art for multi-paradigm systems.

Thursday 15th May

The Problem (1)
- (9.30) Introduction (Hamish Cunningham, Sheffield)
- (10.00) Alexandria, a New Type of Patent Database (Mike Baycroft, Fairview)
The Full-Text Orthodoxy
- (11.00) MG4J (Eric Graf, Glasgow)
- (11.30) Coffee
- (12.00) Terrier (Gianni Amati, FUB/Glasgow)
- (1.00) Lunch
The Ontological Heresy
- (2.00) SAFE: a Semantic Annotation Factory Environment (Kalina Bontcheva, Ian Roberts, Sheffield)
Mixed-Mode Solutions (1)
- (3.00) XML Indexing in INEX (Norbert Fuhr, Essen-Duisburg)
- (4.00) Coffee
- (4.30) Discussion: Characteristics of Mixed-Mode Systems
- (5.30) Close

Friday 16th May

Summary
- (9.30) Recap on Day 1 (Hamish Cunningham, Sheffield)
The Problem (2)
- (10.00) Matrixware: a Brief Introduction (Francisco Webber, Matrixware)
- (10.30) The IRF: a Brief Introduction (John Tait, IRF)
Mixed-Mode Solutions (2)
- (11.00) KIM, OWLIM and SAR: a Mixed-Mode Front-End; a Scaleable OWL Store; Integrating Annotations (Atanas Kiryakov, Ontotext)
- (11.30) Coffee
- (12.00) KIM, OWLIM and SAR - part 2 (Atanas Kiryakov, Ontotext)
- (1.00) Lunch
Mixed-Mode Solutions (3)
- (2.00) SAM Annotation and Querying in ANNIC and beyond (Valentin Tablan, Sheffield)
- (3.00) An HTML-XML Search Engine with Information Extraction (Ralf Schenkel, MPG)
- (4.00) Coffee
- (4.30) MonetDB (Arjen de Vries, CWI)
- (5.30) Close

Speakers

Gianni Amati (Fondazione Ugo Bordoni/Uni Glasgow)
Mike Baycroft (Fairview Research)
Kalina Bontcheva (Uni Sheffield)
Hamish Cunningham (Uni Sheffield)
Norbert Fuhr (Uni Essen-Duisburg)
Eric Graf (Uni Glasgow)
Atanas Kiryakov (Ontotext)
Borislav Popov (Ontotext)
Ralf Schenkel (MPG)
Valentin Tablan (Uni Sheffield)
John Tait (IRF)
Arjen de Vries (ACM/CWI)
Francisco Webber (Matrixware)