GATE in CLARIN

CLARIN

Common Language Resources and Technology Infrastructure

Summary

CLARIN is funded by the European Commission under FP7 from January 2009 to December 2011. It is coordinated by the University of Utrecht in the Netherlands and includes DFKI, MPI, the University of Sheffield, and 27 other partners, as well as many more member organizations.

Contact: Wim Peters (PI)

Project Objectives:

The CLARIN mission is to create an infrastructure which makes language resources (annotated recordings, texts, lexica, ontologies) and technology (speech recognizers, lemmatizers, parsers, summarizers, information extractors) available and readily usable to scholars of all disciplines, in particular the humanities and social sciences (HSS). In our age we are presented with many challenges as we deal with language in electronic formats, in spoken, written, and multimodal forms. The sheer size of this material makes the use of computer-aided methods indispensable for many scholars in the humanities and those in related fields who are concerned with linguistic material.

The CLARIN infrastructure is based on the firm belief that the days of pencil-and-paper research are numbered, even in the humanities. Computer-aided language processing is already used by a wide variety of sub-disciplines in the humanities and social sciences, addressing one or more of the multiple roles language plays, as carrier of cultural content and knowledge, instrument of communication, component of identity and object of study. However, achieving the advanced analysis of linguistic material with current resources requires an effort that no single humanities and social sciences scholar should be expected to make.

The cost of collecting, digitising and annotating large text or speech corpora, dictionaries or language descriptions is huge, and the creation of tools to manipulate these linguistic data is very demanding in terms of skills and expertise. The potential benefits of computer-enhanced language processing can only be realised for researchers when a coordinated effort is invested in creating a federation of existing archives and repositories in order to build a critical mass of resources, underpinned by an enabling infrastructure to provide access to tools and resources, along with the necessary training and advisory services.

Our Role

We are providing standardized web services for NLP and IE tasks in various languages, described with CLARIN and OLAC harvestable metadata. See the Harvesting Day website for more details.
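
As a rough illustration of what "harvestable" means in practice: OLAC metadata is normally collected over the OAI-PMH protocol, so a harvester issues a ListRecords request and reads the returned records. The sketch below is not the project's reference client. The base URL, the record identifier, and the service title are all hypothetical, and the sample response is hand-written to show the general shape of a reply.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Standard XML namespaces for OAI-PMH responses and Dublin Core elements.
OAI_NS = "http://www.openarchives.org/OAI/2.0/"
DC_NS = "http://purl.org/dc/elements/1.1/"

# Hand-written sample reply; the identifier and title are hypothetical.
SAMPLE_RESPONSE = """
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:service-1</identifier></header>
      <metadata>
        <olac:olac xmlns:olac="http://www.language-archives.org/OLAC/1.1/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Hypothetical tokeniser service</dc:title>
        </olac:olac>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>
"""


def list_records_url(base_url, metadata_prefix="olac"):
    """Build an OAI-PMH ListRecords request URL for the given repository."""
    query = urlencode({"verb": "ListRecords", "metadataPrefix": metadata_prefix})
    return base_url + "?" + query


def record_titles(xml_text):
    """Return (identifier, title) pairs for every record in a ListRecords reply."""
    root = ET.fromstring(xml_text)
    pairs = []
    for record in root.iter(f"{{{OAI_NS}}}record"):
        identifier = record.findtext(f"{{{OAI_NS}}}header/{{{OAI_NS}}}identifier")
        # Take the first dc:title found inside the record, if any.
        title = next((el.text for el in record.iter(f"{{{DC_NS}}}title")), None)
        pairs.append((identifier, title))
    return pairs


if __name__ == "__main__":
    # "http://example.org/oai" is a placeholder, not a real CLARIN endpoint.
    print(list_records_url("http://example.org/oai"))
    print(record_titles(SAMPLE_RESPONSE))
```

A real harvester would also need to follow OAI-PMH resumption tokens for long result lists; that is left out here to keep the sketch short.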

Software

Our software page documents the web services and includes a reference client.

Selected Publications

A. Funk, I. Roberts, W. Peters, Implementing a Variety of Linguistic Annotations through a Common Web-Service Interface. In "Language Resource and Language Technology Standards" workshop at LREC, Malta, May 2010. PDF paper, slides

Funding:

Project Web page: http://www.clarin.eu/
Project Reference: INFRA-2007-2.2.1.2
Project Acronym: CLARIN
Project Name: Common Language Resources and Technology Infrastructure