GATE Projects

This page lists some of the projects involving GATE which are currently being undertaken, either by our team or outside Sheffield.

(This page is almost always out of date, as new projects start all the time.)

Projects with Sheffield involvement

[Language Technologies/Social Media | Digital Libraries/Corpus Annotation/Crowdsourcing | E-science/Grid/Cloud | Semantic Web/Knowledge Technologies]

*** Language Technologies and Social Media Mining**
COMRADES EC-funded project (Jan 2016 - Dec 2018), coordinated by KMI, Open University, UK	Analysis of social media communication around disasters and humanitarian relief, including veracity and informativeness issues.
Political Futures Tracker Nesta-funded project (Nov 2014 - May 2015)	Analysis of political tweets and other texts in the run-up to the 2015 UK General Election
PHEME EC-funded project (Jan 2014 - Dec 2016), coordinated by Kalina Bontcheva from the GATE team	Identifying and tracking rumours as they spread through social media.
TrendMiner DFKI, University of Sheffield, Ontotext AD, University of Southampton, Stichting Internet Memory Foundation, Eurokleis S.R.L., Sora Ogris & Hofinger GMBH, Hardik Fintrade Pvt Ltd.	Innovative, portable open-source real-time methods for cross-lingual mining and summarisation of large-scale stream media.
uComp EPSRC-funded CHIST-ERA project (Nov.2012 - Nov.2015), led by University of Sheffield	Embedded Human Computation for Knowledge Extraction and Evaluation.
DecarboNet EC-funded project (Oct.2013 - Sep.2016), led by the Knowledge Media Institute	We are building a decarbonisation platform for translating collective awareness of climate change into behavioural change. Our role is in mining social media: named entities, events, opinions, and controversies, using Linked Open Data.
MUSE University of Sheffield	Named entity recognition from diverse text types and genres.
OLLIE University of Sheffield	An adaptive Information Extraction tool that uses GATE's open-source machine learning tools and allows users to train the system collaboratively by annotating a shared corpus in a Web browser
AMITIES University of Sheffield, CNRS-LIMSI, GE Service Centre GMBH, VECSYS, VIEL & CIE, State University of New York, Duke University, GE Research & Development	Building empirically induced dialogue processors to support multilingual human-computer interaction.
Armadillo University of Sheffield	Armadillo uses multiple strategies, including IE using GATE components, to model a domain by connecting various entities and components, and to build an RDF ontology and knowledge base.
LIRICS EU Funded eContent Project (Jan 2005 - Jun 2007) led by INRIA-LORIA, France	LIRICS aims to provide ISO-ratified standards for language technology, with an open-source implementation platform and services, to enable the exchange and reuse of multilingual language resources in response to the needs of today's multilingual information and communication society.
TEXTvre JISC sub-award (April 2009 to March 2011) led by King's College, London
CLARIN EU Funded FP7 Project (Jan 2008 to Dec 2010) led by the University of Utrecht, Netherlands	CLARIN's mission is to create an infrastructure which makes language resources (annotated recordings, texts, lexica, ontologies) and technology (speech recognizers, lemmatizers, parsers, summarizers, information extractors) available and readily usable to scholars of all disciplines, in particular the humanities and social sciences (HSS).
*** Digital Libraries: Corpus annotation and processing**
ForgetIT EU-funded Integrated Project (Feb.2013 - Jan.2016)	Concise Preservation by combining Managed Forgetting and Contextualized Remembering
ARCOMEM EU-funded Integrated Project (Jan.2011 - Dec.2013), led by University of Sheffield	From archives to community memories.
uComp EPSRC-funded CHIST-ERA project (Nov.2012 - Nov.2015), led by University of Sheffield	Embedded Human Computation for Knowledge Extraction and Evaluation.
The National Archives Ontotext, University of Sheffield, System Simulation Limited	Bringing semantic annotation to the UK government's web archive.
EnviLOD A JISC-funded project with the British Library and HR Wallingford.	Semantic annotation and search with Linked Open Data, in the domain of environmental science.
GATE/ETCSL University of Sheffield, University of Oxford	The project is building generic tools for linguistic annotation and Web based analysis of literary Sumerian.
EMILLE University of Lancaster & University of Sheffield	Building a 63 million word electronic corpus of South Asian languages, especially those spoken in the UK.
OldBaileyIE University of Sheffield	Named entity recognition on 17th century Old Bailey Court reports.
*** Digital Libraries: Multimedia**
PrestoSpace EU-funded Integrated Project (Feb.2004 - Jun.2007), led by Institut national de l'audiovisuel (INA), France	The project's objective is to provide technical solutions and integrated systems for a complete digital preservation of all kinds of audio-visual collections. Our role is to develop language technology methods for (semi-)automatic creation of metadata from multimedia content.
MUMIS CTIT (Netherlands), University of Sheffield, University of Nijmegen (Netherlands), DFKI (Germany), Max Planck Institut für Psycholinguistik (Germany), ESTEAM (Sweden), VDA (Netherlands)	Automatic creation of indexes into multimedia programme material, using data from several sources and several languages, in the domain of football.
SOCIS University of Sheffield, University of Surrey	Integration of knowledge acquisition, information extraction, image processing and speech recognition technologies in the domain of police crime reports.
HSE University of Sheffield	Summarisation of information from company reports to generate statistics about the level of compliance with Health and Safety recommendations and legislation.
*** E-science, Grid and Cloud Computing**
Annomarket EU Funded Small or medium scale focused research project (STREP)	A collaborative project funded by the EC that aims to deliver an affordable, open marketplace for pay-as-you-go, cloud-based extraction resources and services, in multiple languages.
GATE Cloud Exploratory University of Sheffield	A small exploratory project funded by JISC and the EPSRC to experiment with various aspects of cloud computing.
Khresmoi EU-funded Integrated Project (Sep.2010 - Aug.2014), led by University of Applied Sciences of Western Switzerland	A knowledge-helper for health information.
MultiFlora University of Manchester, University of Sheffield	An e-science bioinformatics project for biodiversity support.
MiAKT University of Sheffield, University of Southampton, Open University (KMI), University of Oxford, Guy's Hospital, King's College London	Collaborative problem solving environments in Medical Informatics, using knowledge services provided by the e-Science grid infrastructure.
myGRID University of Manchester, University of Newcastle, University of Nottingham, University of Sheffield, University of Southampton, IT Innovation Centre, European Bioinformatics Institute	Extending the GRID framework of distributed computing by producing a virtual laboratory bench that will support the life sciences community and make use of complex distributed resources.
CLEF University of Manchester, CHIME/University College London, University of Brighton, University of Sheffield, Cambridge University Health	Building on E-Science technology to embed a full information cycle within practical clinical systems, building tools to integrate patient information from text and images, and linking clinical and genomic research.
*** Semantic Web and Knowledge Technologies**
uComp EPSRC-funded CHIST-ERA project (Nov.2012 - Nov.2015), led by University of Sheffield	Embedded Human Computation for Knowledge Extraction and Evaluation.
DecarboNet EC-funded project (Oct.2013 - Sep.2016), led by the Knowledge Media Institute	We are building a decarbonisation platform for translating collective awareness of climate change into behavioural change. Our role is in mining social media: named entities, events, opinions, and controversies, using Linked Open Data.
AKT University of Aberdeen, University of Edinburgh, Open University, University of Sheffield, University of Southampton	Builds new knowledge acquisition, retrieval, and publishing tools based on Language Engineering and using GATE.
SEKT EU-funded Integrated Project (Jan.2004 - Dec.2006), led by BT	The vision of SEKT is to develop and exploit the knowledge technologies which underlie Next Generation Knowledge Management. The SEKT strategy is built around the synergy of the complementary know-how in Ontology and Metadata Technology, Knowledge Discovery and Human Language Technology.
KnowledgeWeb EU-funded Network of Excellence (Jan.2004 - Dec.2007), led by University of Innsbruck	The transition of the Semantic Web from an academic adventure into a technology provided by the software industry is still a long way off. The main goal of KnowledgeWeb is to support this process. Our role is in providing expertise on the role of Human Language Technology in ontology-based applications.
h-TechSight University of Surrey, University of Sheffield, Athens Technology Center, University of Innsbruck, Universitat Rovira i Virgili	An IST project developing a knowledge management platform with intelligence and insight capabilities for technology intensive industries. Sheffield provides the platform with a targeted search module to analyse the content of webpages and track interesting instances of concepts over time.
ArtEquAkt University of Southampton, University of Sheffield	This informal project, which is part of AKT, produces composite descriptions of cultural artefacts and figures (e.g. Rembrandt) from diverse web pages, using a GATE-based Natural Language Generation system. ArtEquAkt is a collaboration between the Equator wearable computing project and the AKT Knowledge Technologies project.
DotKom University of Sheffield, University of Karlsruhe (Germany), Open University, Ontoprise (Germany), Quinary (Italy), ITC-IRST (Italy)	This project aims at defining tools and methodologies for IE-based Knowledge Management, focusing on adaptive IE using Machine Learning, and will be developed using GATE components.
Adaptiva University of Sheffield	An ontology building environment incorporating adaptive IE using GATE components.
Amilcare University of Sheffield	An ontology building environment incorporating adaptive IE using GATE components.
TAO EU-funded Specific Targeted Research Project (Mar 2006 - Feb 2009), led by University of Sheffield, UK	The project's goal is to show how existing 'legacy' applications can migrate to open, semantic-based Service-Oriented Architectures at acceptable development cost. Sheffield's main role will be on automatic methods for content augmentation and integration.
NeOn EU-funded Integrated Project (Apr 2006 - Feb 2010), led by The Open University, UK	NeOn aims to provide a considerable improvement in the level of support available for ontology engineering by developing both a reference architecture and a concrete toolkit supporting the whole ontology engineering life cycle. The University of Sheffield will develop an open-source demonstrator of collaborative semantic annotation with networked ontologies.
LarKC EU-funded Integrated Project (Apr 2008 - Sept 2011), led by University of Innsbruck	Current Semantic Web reasoning systems do not scale to the requirements of their hottest applications, such as dealing with terabytes of scientific data. LarKC aims to remove these scalability barriers by using massive distributed incomplete reasoning. The University of Sheffield's main roles are to provide methods for retrieval and selection of data for reasoning, and to demonstrate the platform in support of carcinogenesis research.
Media-Campaign EU-funded Specific Targeted Research Project (Mar 2006 - Feb 2009), led by Joanneum Research, Austria	The project's main goal is to automate to a large degree the detection and tracking of media campaigns on television, Internet and in the press. MediaCampaign's scope is on discovering, inter-relating and navigating cross-media campaign knowledge. University of Sheffield would be involved in text analysis (press, Internet, speech transcript), product knowledge interlinking, unification of two or more (partial) descriptions of an instance and knowledge fusion for campaign discovery.
MUSING EU-funded Integrated Project (2006 - 2010) led by Metaware S.p.A., Italy	MUSING will integrate Semantic Web and Human Language technologies and combine declarative rule-based methods and statistical approaches for enhancing the technological foundations of knowledge acquisition and reasoning in BI applications.
Service-Finder EU-funded Project (2008 - 2009) led by CEFRIEL, Italy	Service-Finder aims to develop a platform for service discovery in which Web Services are embedded in a Web 2.0 environment.

Projects Outside Sheffield

[Semantic Web | Digital Libraries / Cultural Heritage | E-science/bio-informatics | Language Technology | Other Applications]

*** Knowledge Management and Semantic Web**
KIM Ontotext, Bulgaria	ANNIE-powered Semantic Web annotation as part of their Knowledge and Information Management (KIM) platform.
MeManage Knallgrau New Media Solutions GmbH, Germany	The goal of MeManage is to relate personal information on your computer and the computers of your peers to one another. It combines some ideas from the Semantic Web with the ease and simplicity of Weblogs and Wikis, plus a little Social Software.
AquaLog UK	The availability of semantic markup on the web opens the way to novel, sophisticated forms of question answering. While semantic information can be used in several different ways to improve question answering, an important consequence of the availability of semantic markup on the web is that this can indeed be queried directly. AquaLog is a portable question-answering system which takes queries expressed in natural language and an ontology as input and returns answers drawn from one or more knowledge bases (KBs), which instantiate the input ontology with domain-specific information. AquaLog presents an elegant solution in which different strategies are combined together. It makes use of the GATE NLP platform as part of the linguistic process, string metric algorithms, and a learning mechanism as a solution to manage lexical resources, including domain-dependent lexica and generic resources such as WordNet. AquaLog also makes use of a novel ontology-based relation similarity service to make sense of user queries with respect to the target knowledge base. Contact email: v.lopez@open.ac.uk
Med Dictate Medwrite Inc, Anaheim, Canada	The automation of customized retrieval of medical transcriptions.
Engineering a Semantic Desktop for Building Historians and Architects Germany	We analyse the requirements for advanced semantic support of users (building historians and architects) of a multi-volume encyclopedia of architecture from the late 19th century. Novel requirements include the integration of content retrieval, content development, and automated content analysis based on natural language processing. We present a system architecture for the detected requirements and its current implementation. A complex scenario demonstrates how a desktop supporting semantic analysis can contribute to specific, relevant user tasks. Email: witte@ipd.uka.de
*** Digital Libraries and Cultural Heritage**
Greenstone University of Waikato, New Zealand	Greenstone is a suite of software for building and distributing digital library collections. It provides a new way of organizing information and publishing it on the Internet or on CD-ROM. Greenstone is produced by the New Zealand Digital Library Project at the University of Waikato, and developed and distributed in cooperation with UNESCO and the Human Info NGO. It is open-source, multilingual software, issued under the terms of the GNU General Public License. Greenstone uses GATE and ANNIE to enhance digital collections by addition of metadata.
ECHO	The European Heritage On-Line (ECHO) project is developing a model for European culture on the web. The GATE team is represented on the technical board of ECHO and is working towards transfer of advanced text processing tools to help produce a new model of richly interlinked shared cultural materials.
Perseus Tufts University, Massachusetts, USA	The Perseus digital library, one of the largest and most advanced such projects in the world, uses GATE for corpus annotation and language processing.
*** E-science and bioinformatics**
Parallel IE Merck KGaA, Darmstadt, Germany	Information Extraction on a Linux cluster for bio-medical text mining and indexing.
Medical Informatics University of Pittsburgh, USA	Annotating surgical pathology reports using UMLS.
Medical Informatics Institute for Medical Informatics and Biometry, University of Rostock, Germany	Analysing MEDLINE abstracts to extract causal functional relations, which are essential for the construction of genetic networks, as a step towards characterisation of diseases.
BioRAT University College, London, U.K.	BioRAT is a general-purpose information extraction tool, designed to be used by biologists to mine text from journals. It has been successfully applied to protein-protein interaction discovery, and projects are underway in several other areas. It uses GATE at its core, while also providing tools to design new templates, to edit gazetteers, and to download full-length papers from the web. The software is available for academic use, and is part of a research project funded by the BBSRC.
InESBi Institute for Medical Informatics and Biometry, University of Rostock, Germany	The Information Extraction for Structural Biology project aims to extract information from the 'material and method' part of structural biology publications in order to populate a database. Some of the information for the database is retrieved from structured files named PDB (see http://www.rcsb.org/pdb/). The materials and methods used for experiments are not in PDB files, so we intend to extract that information from the text of the publications. Contact email: huault@igbmc.u-strasbg.fr.
Visualization of Consumer Health Information School of Information Systems & Technology, Claremont, USA	Research indicates that the text on many popular web sites is difficult to understand and consumers find that reading documents in electronic format is problematic. Since health information read online influences the patient-doctor relationship - e.g., treatments requested, or perceived patient value from a doctor's visit - it is important that this information be interpreted and remembered as completely and correctly as possible. Misunderstandings in health information may increase the risk of making unwise health decisions, which could lead to poorer health and higher health care costs. The goal of the project is to develop and test new technology that can present online health information that is easier to understand and remember. Prototypes will be developed that will visualize both the structure and content of web pages to increase understanding and retention without oversimplification. A small pilot study has shown positive effects of such a representation. The two prototypes will differ in how much content detail is included in the visualization. They will be evaluated for their effects on understanding and retention of information and compared with currently existing web sites. User behavior and preferences will also be captured and analyzed. Three user groups will participate in the development and evaluation of the prototypes: elderly consumers, Hispanic non-native speakers, and patients. These groups were chosen for their specific characteristics (age related problems, sub-optimal command of English, and patients' stress) that may require improved information presentation. Contact: Gondy Leroy
eHealth GATEway project University of Leeds, UK	Anonymisation of patient health records with GATE. JISC-funded project (Feb-Aug 2012). Project website. Contact: Owen Johnson
HiTEx project U.S. National Library of Medicine, National Institutes of Health, USA	Health Information Text Extraction (HiTEx) tool based on GATE - a modular system that assembles a different pipeline for extracting specific findings from clinical narratives. Funded by National Library of Medicine, National Institutes of Health, USA. Project website. Paper: What can Natural Language Processing do for Clinical Decision Support? Contact: Dina Demner-Fushman
*** Human Language Technology**
Pieces Evidence TRW Systems, Colorado Springs, USA	Converting text into pieces of evidence as part of an R&D project for the US government.
IE Denso IT Laboratory, Japan	Development of IE and other language tools for in-car navigation systems and automobile-related speech and language technology.
Database technology Birkbeck College, London, U.K.	Using Information Extraction (IE) to enhance the support for text in database technology.
Enactable Models Middlesex University, U.K.	Building a summarisation system based on discourse structure.
Semantic Analysis Over Sparse Data	A Johns Hopkins 2003 summer workshop at the Centre for Language and Speech Processing on learning-based semantic annotation to reduce data sparseness in diverse corpora.
Summarisation Imperial College, London	Building a summarisation system entered in the Document Understanding Conference (DUC 2002) evaluation.
Named Entity recognition for Machine Translation University of Leeds, U.K.	Work on improving MT systems using NE recognition in GATE.
GROK/OpenNLP University of Edinburgh	Integration of GATE with the GROK/OpenNLP project - a library of NLP components including support for parsing and various pre-processing tasks.
Mission Abstraction	To develop a tool that can produce the synopsis of the text document given to it.
Internal R&D Linguit Ltd, Edinburgh, UK	At Linguit Ltd., we use GATE for internal research purposes because it allows us to explore new ideas in the area of information extraction rapidly.
mpp Madan Puraskar Pustakalaya, Nepal	Nepali language localization. Contact name: Laxmi Pd Khatiwada.
Building An Information Extraction System	The objective is to extract the required information in structured form from unstructured text. This involves four tasks: named entity recognition, coreference, the template element task, and the scenario task.
*** Other Applications**
University of Georgia	Analysis of case records of child protective service workers.
Email Summary CI Secure, Toronto, Canada	We are developing an e-mail summarization service. We would like to capture the essence of an e-mail message and present it as a 1-2 line summary. This will be integrated with our current mail service and will help our clients save time. We are looking for partners who are experienced with GATE to co-develop this technology. Contact email: gate@ci-secure.com
Sindbad - Knowledge Generation NetBreeze GmbH, Dübendorf, Switzerland	The generation of industry-specific knowledge from openly accessible internet-sources for direct integration into business processes.
Leads Generator by using IE	Crawls web pages, reads RSS news feeds and so on, and extracts corporate information about contacts, industry events, etc.
OntosMiner	Aims at combining AI & IT experience within the NLP domain and R&D in knowledge management. In addition, it aims to develop and implement a special environment, Ontos WorkGroup, with functionality for communicating with the OntosMiner server and for viewing and editing received Cognitive Maps, and, in general, to support the technology of ontology-driven content extraction for several domains.
Multi-Lingual Noun Phrase Extractor (NPE) Universität Karlsruhe, Germany	JAPE-based. Currently supported languages are English, German, and French. It requires a part-of-speech tagger to work (it has been tested with the Hepple tagger for English and the TreeTagger for German and French). One particular feature is that it can use previously detected named entities (like the Person, Organization, ... found by ANNIE) to improve chunking performance.
SP2A Università degli Studi di Parma, Italy	SP2A is a thin framework enabling peer-to-peer based Grids. In practice, SP2A is a lightweight Java package allowing the development of service-oriented peers (SOPs). These SOPs can be used to form unstructured supernode networks (USNs), and exchange information about the services they host using P2P message routing algorithms. Each SOP is allowed to explore local services, to publish service advertisements remotely, and to search for remote services. Contact name: Michele Amoretti.
Autovita	Mining a vita from the web. Contact: vamshi@andrew.cmu.edu