GATE and Digital Libraries
As digital libraries grow in size and coverage, so does the need for automatic
content annotation and indexing. GATE's robust and customisable Named Entity
recognition and Information Extraction technology has already been used
successfully for metadata creation, automatic name and event annotation,
indexing, and access. We and our collaborators have developed, and continue
to develop, a range of applications, each of which poses its own challenge:
- The PrestoSpace project
aims to provide technical solutions and integrated systems for the complete
digital preservation of all kinds of audio-visual collections. Our role is
to develop language technology methods for (semi-)automatic creation of
metadata from multimedia content, building on our previous work in MUMIS
(see below).
- The GATE/ETCSL project is building
generic tools for linguistic annotation and Web-based analysis of digital
libraries, using literary Sumerian as a testbed.
- The Perseus Digital Library at Tufts is
using GATE for enriching hypertextual models of cultural heritage corpora.
- The European Heritage On-Line (ECHO)
project is developing a model for European culture on the web. The GATE team
is represented on the technical board of ECHO and is working towards the
transfer of advanced text processing tools to help produce a new model of
richly interlinked shared cultural materials.
- OldBaileyIE required adapting the language processing components to the
non-standard writing conventions of the Early Modern English used in Old
Bailey court reports from the 17th century.
- In MUMIS (Multimedia Indexing and Search), we annotated material in
multiple modalities to build a conceptual index of football videos.
- EMILLE focuses on the collection and annotation of large text corpora in
non-indigenous minority languages in the UK (including Urdu, Bengali,
Sylheti and others).
We are currently working on using GATE as the basis for the creation of
computational tools for the study of digital collections in cultural heritage
languages, such as Ancient Greek and Latin.