
GATE

The Large Knowledge Collider

Summary

Current Semantic Web reasoning systems do not scale to the requirements of their hottest applications, such as analysing data from millions of mobile devices, dealing with terabytes of scientific data, and content management in enterprises with thousands of knowledge workers.

The Large Knowledge Collider (LarKC, for short, pronounced "lark"), a platform for massive distributed incomplete reasoning, will remove these scalability barriers by combining techniques from logical inference, databases, machine learning and cognitive science in a single pluggable platform.

The Large Knowledge Collider will be an open architecture. Researchers and practitioners from outside the consortium will be encouraged to plug in their own components to drive parts of the system. This will make the Large Knowledge Collider a generic platform, rather than just a single reasoning engine.

The success of the Large Knowledge Collider will be demonstrated in three end-user case studies. The first case study is from the telecom sector. It aims at real-time aggregation and analysis of location data obtained from mobile phones carried by the population of a city, in order to regulate city infrastructure functions such as public transport. The other two case studies are in the life-sciences domain, related respectively to drug discovery and carcinogenesis research.

LarKC is an EU Large-Scale Integrating Project, funded under Framework Programme 7.

Contact: Hamish Cunningham (PI)


(Figure: LarKC and librarianship.)

Project Objectives:

LarKC's major objectives are:

  1. Design an integrated pluggable platform for large-scale semantic computing.
  2. Construct a reference implementation for such an integrated platform for large-scale semantic computing, including a fully functional set of baseline plugins.
  3. Achieve sufficient conceptual integration between approaches of heterogeneous fields (logical inference, databases, machine learning, cognitive science) to enable the seamless integration of components based on methods from these diverse fields.
  4. Demonstrate the effectiveness of the reference implementation through applications in (1) services based on data-aggregation from mobile-phone users, (2) meta-analysis of scientific literature in cancer research, and (3) data-integration and -analysis in early clinical development within the drug-discovery pipeline.
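The pluggable platform of objectives 1 and 2 can be pictured as a pipeline of interchangeable components. The sketch below is purely illustrative: the class names (`Selecter`, `Reasoner`, `Pipeline`) and the toy stages are assumptions for the example, not the actual LarKC plugin API.

```python
from abc import ABC, abstractmethod

class Plugin(ABC):
    """Base class for a pluggable pipeline stage (illustrative only)."""
    @abstractmethod
    def run(self, data):
        ...

class Selecter(Plugin):
    """Toy selection stage: keep only triples whose predicate is whitelisted."""
    def __init__(self, predicates):
        self.predicates = set(predicates)

    def run(self, triples):
        return [t for t in triples if t[1] in self.predicates]

class Reasoner(Plugin):
    """Toy 'reasoning' stage: count the selected triples per subject."""
    def run(self, triples):
        counts = {}
        for s, p, o in triples:
            counts[s] = counts.get(s, 0) + 1
        return counts

class Pipeline:
    """Runs the stages in order, feeding each stage's output to the next."""
    def __init__(self, stages):
        self.stages = stages

    def run(self, data):
        for stage in self.stages:
            data = stage.run(data)
        return data

triples = [("a", "type", "C"), ("a", "knows", "b"), ("b", "type", "C")]
result = Pipeline([Selecter({"type"}), Reasoner()]).run(triples)
# result == {"a": 1, "b": 1}
```

The point of the design is that any stage can be swapped for an alternative implementation, which is what lets external researchers plug their own components into the platform.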

The Large Knowledge Collider will be able to perform inference (RDF Schema and OWL Lite) on data-sets of tens of billions of triples at real-time response rates. It will be implemented on a computing cluster of at least 100 processors and will achieve at least 80% platform utilization. Multiple plug-in methods will be available for each of the pluggable components. Three demonstrator deployments of the Large Knowledge Collider will be available by the end of the project, as listed above.
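To give a flavour of the RDF Schema inference mentioned above, here is a minimal forward-chaining sketch of two standard RDFS entailment rules (transitivity of `rdfs:subClassOf`, and propagation of `rdf:type` up the class hierarchy). This naive fixpoint loop is for illustration only; it would not scale to billions of triples, which is exactly the gap LarKC targets.

```python
def rdfs_closure(triples):
    """Forward-chain two RDFS rules to a fixpoint:
    (1) (A subClassOf B) and (B subClassOf C)  =>  (A subClassOf C)
    (2) (x type A)       and (A subClassOf B)  =>  (x type B)
    """
    facts = set(triples)
    changed = True
    while changed:
        changed = False
        new = set()
        for s, p, o in facts:
            if p == "rdfs:subClassOf":
                for s2, p2, o2 in facts:
                    # rule 1: chain subclass links through o
                    if p2 == "rdfs:subClassOf" and s2 == o:
                        new.add((s, p, o2))
                    # rule 2: lift instances of s to the superclass o
                    if p2 == "rdf:type" and o2 == s:
                        new.add((s2, "rdf:type", o))
        fresh = new - facts
        if fresh:
            facts |= fresh
            changed = True
    return facts

kb = {("Dog", "rdfs:subClassOf", "Mammal"),
      ("Mammal", "rdfs:subClassOf", "Animal"),
      ("rex", "rdf:type", "Dog")}
closed = rdfs_closure(kb)
# ("rex", "rdf:type", "Animal") is now entailed
```

The quadratic inner scan and the materialisation of every entailed triple are what make complete closure infeasible at web scale, motivating the incomplete, selection-driven reasoning that LarKC proposes.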

LarKC's objectives address problems in the semantic foundations of web reasoning.

Our Role

The University of Sheffield has two main roles in LarKC.

Retrieval and Selection. Our first role is to construct methods for retrieving and selecting propositions contained in large-scale semantic repositories. Retrieval and selection are needed to support ceiling-free reasoning: we need to be able to dynamically reduce or expand the data set we are working with depending on factors such as the cost of processing or the confidence in the result. To do this we will exploit methods from information retrieval, machine learning and cognitive science. In most cases the relevant methods will need adaptation to the new context, and each will apply to some instances of the selection task but not others.
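The cost- and confidence-driven expansion described above can be sketched as a simple anytime selection loop. Everything here is an assumption for illustration (the function name, the independence-style confidence model, the unit costs); it is not the project's actual selection method.

```python
def anytime_select(candidates, score, cost, budget, target_conf):
    """Greedily grow a working set of propositions, best-scored first,
    stopping when estimated confidence reaches the target or the
    processing budget is exhausted (hypothetical sketch)."""
    ranked = sorted(candidates, key=score, reverse=True)
    working, spent, confidence = [], 0.0, 0.0
    for item in ranked:
        c = cost(item)
        if spent + c > budget:
            break  # cost ceiling reached: stop expanding the data set
        working.append(item)
        spent += c
        # toy confidence model: diminishing returns per added item
        confidence = 1 - (1 - confidence) * (1 - score(item))
        if confidence >= target_conf:
            break  # confident enough: no need to retrieve more
    return working, confidence

working, conf = anytime_select(
    ["p1", "p2", "p3"],
    score=lambda p: {"p1": 0.9, "p2": 0.5, "p3": 0.2}[p],
    cost=lambda p: 1.0,
    budget=2.0,
    target_conf=0.99,
)
# working == ["p1", "p2"]  (budget exhausted before p3 can be added)
```

The two stopping conditions correspond directly to the two factors named above: processing cost bounds how far the working set may expand, while confidence in the result determines when expansion may stop early.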

Carcinogenesis research tools. Our second role is to demonstrate the use of the LarKC platform to support carcinogenesis research. Specifically, document analysis and ontology-based knowledge management will be used to support researchers producing standard references on the human risk factors in cancer. In addition, we will provide computational support for the identification of complex disease genes, using multi-locus SNP studies.


Funding: