The Large Knowledge Collider

Summary

Current Semantic Web reasoning systems do not scale to the requirements of their hottest applications, such as analysing data from millions of mobile devices, dealing with terabytes of scientific data, and content management in enterprises with thousands of knowledge workers.

The Large Knowledge Collider (LarKC, for short, pronounced "lark"), a platform for massive distributed incomplete reasoning, will remove these scalability barriers. LarKC will achieve this by:

Enriching the current logic-based Semantic Web reasoning methods with methods from information retrieval, machine learning, information theory, databases, and probabilistic reasoning.
Employing cognitively inspired approaches and techniques such as spreading activation, focus of attention, reinforcement, habituation, relevance reasoning, and bounded rationality.
Building a distributed reasoning platform and realizing it both on a high-performance computing cluster and via "computing at home".

The Large Knowledge Collider will be an open architecture. Researchers and practitioners from outside the consortium will be encouraged to plug in their own components to drive parts of the system. This will make the Large Knowledge Collider a generic platform, and not just a single reasoning engine.

The success of the Large Knowledge Collider will be demonstrated in three end-user case studies. The first case study is from the telecom sector. It aims at real-time aggregation and analysis of location data obtained from mobile phones carried by the population of a city, in order to regulate city infrastructure functions such as public transport. The other two case studies are in the life-sciences domain, related respectively to drug discovery and carcinogenesis research.

LarKC is an EU Large-Scale Integrating Project, funded under Framework Programme 7.

Contact: Hamish Cunningham (PI)

LarKC and librarianship.

Project Objectives:

The major objectives of LarKC are:

Design an integrated pluggable platform for large-scale semantic computing.
Construct a reference implementation for such an integrated platform for large-scale semantic computing, including a fully functional set of baseline plugins.
Achieve sufficient conceptual integration between approaches of heterogeneous fields (logical inference, databases, machine learning, cognitive science) to enable the seamless integration of components based on methods from these diverse fields.
Demonstrate the effectiveness of the reference implementation through applications in (1) services based on data aggregation from mobile-phone users, (2) meta-analysis of scientific literature in cancer research, (3) data integration and analysis in early clinical development and the drug-discovery pipeline.

The Large Knowledge Collider will be able to perform inference (RDF Schema and OWL Lite) on data sets of tens of billions of triples with real-time response times. The Large Knowledge Collider will be implemented on a computing cluster of at least 100 processors and will achieve at least 80% platform utilization. Multiple plug-in methods will be available for each of the pluggable components. Three demonstrated deployments of the Large Knowledge Collider will be available by the end of the project, as listed above.

LarKC's objectives are aimed at problems in the semantic foundations of web reasoning:

The Large Knowledge Collider will employ probabilistic techniques for selection (e.g. spreading activation techniques), for abstraction (machine learning techniques), and for reasoning (e.g. weighted inference models).
Approximate and incomplete reasoning is one of the main objectives for research to be performed on the Large Knowledge Collider platform.
The Large Knowledge Collider will move well beyond existing formalisms such as RDF, RDF Schema and OWL, without ignoring these achievements. Instead, the Large Knowledge Collider will enable approximate inference on top of these existing formalisms.
The Large Knowledge Collider will be a reference implementation that can be used as a pluggable experimental platform by other researchers.
The use cases in the project deal with large-scale, ontology-mediated Web integration of heterogeneous, evolving, and noisy or inconsistent data sources.
The Real Time City use case is entirely based on data obtained from ambient devices (mobile phones) as their users navigate through a city.
The Real Time City use case will require real-time response rates.
The other use cases are concerned with scientific data and literature.
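
As an illustration of the selection step named in the first point above, spreading activation over a set of triples can be sketched as follows. This is a minimal sketch under stated assumptions, not the LarKC implementation: the toy triples, the function names, the decay factor, the number of steps and the threshold are all hypothetical and chosen only to make the idea concrete.

```python
from collections import defaultdict

# Hypothetical toy triples (subject, predicate, object); not LarKC data.
TRIPLES = [
    ("ex:geneA", "ex:associatedWith", "ex:diseaseD"),
    ("ex:geneA", "ex:encodes", "ex:proteinP"),
    ("ex:proteinP", "ex:regulates", "ex:processQ"),
    ("ex:drugX", "ex:targets", "ex:proteinP"),
    ("ex:cityC", "ex:hasLine", "ex:tram1"),
]


def spread_activation(triples, seeds, decay=0.5, steps=2):
    """Propagate activation from seed nodes along triples.

    Assumption: triples are treated as undirected edges, and a node keeps
    the highest activation that reaches it.
    """
    neighbours = defaultdict(set)
    for s, _, o in triples:
        neighbours[s].add(o)
        neighbours[o].add(s)

    activation = {node: 1.0 for node in seeds}
    frontier = dict(activation)
    for _ in range(steps):
        next_frontier = {}
        for node, value in frontier.items():
            passed = value * decay  # activation weakens with each hop
            for other in neighbours[node]:
                if passed > activation.get(other, 0.0):
                    activation[other] = passed
                    next_frontier[other] = passed
        frontier = next_frontier
    return activation


def select_triples(triples, activation, threshold):
    """Keep triples whose subject and object are both sufficiently activated."""
    return [
        t for t in triples
        if min(activation.get(t[0], 0.0), activation.get(t[2], 0.0)) >= threshold
    ]


if __name__ == "__main__":
    act = spread_activation(TRIPLES, ["ex:diseaseD"])
    for triple in select_triples(TRIPLES, act, threshold=0.25):
        print(triple)
```

In this sketch only the triples close to the seed node survive selection, so a downstream reasoner would see a small, relevant subset rather than the whole repository. Lowering the threshold or raising the number of steps widens the selection.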

Our Role

The University of Sheffield has two main roles in LarKC.

Retrieval and Selection. Our first role is to construct methods for retrieving and selecting propositions contained in large-scale semantic repositories. Retrieval and selection are needed to support ceiling-free reasoning: we need to be able to dynamically reduce or expand the data set we are working with, depending on factors such as the cost of processing or the confidence in the result. To do this we will exploit methods from information retrieval, machine learning and cognitive science. In most cases the relevant methods will need adaptation to the new context, and will apply to some cases of the selection task and not to others.
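
The idea of growing the working data set until the result is good enough or the processing budget runs out can be sketched as a simple loop. This is a minimal sketch, not the project's method: the relevance scores, the cost model (one unit per triple processed), the doubling strategy and the toy reachability "reasoner" are all assumptions made for illustration.

```python
# Hypothetical (relevance score, triple) pairs; the scores are made up.
SCORED = [
    (0.9, ("a", "next", "b")),
    (0.8, ("b", "next", "c")),
    (0.4, ("c", "next", "d")),
    (0.1, ("x", "next", "y")),
]


def make_reachability_reasoner(start, goal):
    """Toy stand-in for a reasoner: is `goal` reachable from `start`?"""
    def reason(triples):
        reached, changed = {start}, True
        while changed:
            changed = False
            for s, _, o in triples:
                if s in reached and o not in reached:
                    reached.add(o)
                    changed = True
        found = goal in reached
        # Assumed confidence model: certain if derived, otherwise zero.
        return found, (1.0 if found else 0.0)
    return reason


def anytime_answer(scored_triples, reason, max_cost, min_confidence, start_size=2):
    """Expand the selected data set until confident or out of budget."""
    ranked = [t for _, t in sorted(scored_triples, key=lambda p: p[0], reverse=True)]
    size = min(start_size, len(ranked))
    spent = 0
    answer, confidence = None, 0.0
    while size > 0 and spent + size <= max_cost:
        working_set = ranked[:size]  # most relevant triples first
        spent += size                # assumed cost: one unit per triple
        answer, confidence = reason(working_set)
        if confidence >= min_confidence or size == len(ranked):
            break
        size = min(size * 2, len(ranked))  # expand the data set
    return answer, confidence, spent


if __name__ == "__main__":
    reason = make_reachability_reasoner("a", "d")
    print(anytime_answer(SCORED, reason, max_cost=10, min_confidence=1.0))
    print(anytime_answer(SCORED, reason, max_cost=5, min_confidence=1.0))
```

With a generous budget the loop expands the selection until the answer is found; with a tight budget it stops early and returns the best (here, inconclusive) result obtained so far, which is the trade-off between cost and confidence described above.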

Carcinogenesis research tools. Our second role is to demonstrate the use of the LarKC platform to support carcinogenesis research. Specifically, document analysis and ontology-based knowledge management will be used to support researchers producing standard references on the human risk factors in cancer. In addition, we will provide computational support for the identification of complex disease genes, using multi-locus SNP studies.

Funding:

Project Web page: http://www.larkc.eu/
Project Reference: FP7-215535
Project Acronym: LarKC
Project Name: The Large Knowledge Collider
Key Action: Intelligent content and semantics
Action line: ICT-2007.4.2
Total cost: 10.11 million Euros
Commission Funding: 7.25 million Euros
Project Duration: 2008-04-01 to 2011-09-30