uComp: Embedded Human Computation for Knowledge Extraction and Evaluation
uComp project website, @uCompEU
Summary
The rapid growth and fragmented character of social media and publicly available structured data challenge established approaches to knowledge extraction. Many algorithms fail when they encounter noisy, multilingual and contradictory input. Efforts to increase the reliability and scalability of these algorithms face a lack of suitable training data and gold standards. Given that humans excel at interpreting contradictory and context-dependent evidence, the uComp project will address these shortcomings by merging collective human intelligence and automated knowledge extraction methods in a symbiotic fashion. The project will build upon the emerging field of Human Computation (HC) in the tradition of games with a purpose and crowdsourcing marketplaces. It will advance the field of Web Science by developing a scalable and generic HC framework for knowledge extraction and evaluation, delegating the most challenging tasks to large communities of users and continuously learning from their feedback to optimise automated methods as part of an iterative process. A major contribution is the proposed foundational research on Embedded Human Computation (EHC), which will advance and integrate the currently disjoint research fields of human and machine computation. EHC goes beyond mere data collection and embeds the HC paradigm into adaptive knowledge extraction workflows. An open evaluation campaign will validate the accuracy and scalability of EHC to acquire factual and affective knowledge. In addition to novel evaluation methods, uComp will also provide shared datasets and benchmark EHC against established knowledge processing frameworks.
While uComp methods will be generic and evaluated across domains, climate change was chosen as the main use case because it is a challenging domain, subject to fluctuating and often conflicting interpretations. Collaborating with international organisations such as the European Environment Agency (EEA), the Climate Program Office of the National Oceanic and Atmospheric Administration (NOAA) and the NASA Earth Observatory will increase impact, provide a rich stream of input data, attract and retain a critical mass of users, and promote the adoption of EHC among a wide range of stakeholders.
Contact: Kalina Bontcheva
Objectives
- Develop a generic, configurable and reusable HC framework. Using HC for knowledge extraction requires significant research effort from several scientific disciplines and poses challenges in terms of scalability, accuracy, and feasibility. uComp will break new ground by creating a reusable framework with an extensible set of knowledge acquisition tasks. This generic HC framework will include new empirical and formal methods for (i) HC process configuration; (ii) engagement, profiling and incentivisation of human contributors; (iii) reliability monitoring, cheating prevention and quality control.
- Address challenges of dealing with noisy data. HC-based approaches for knowledge extraction raise issues of quality control (inter-human agreement, monitoring the quality of acquired knowledge), aggregation of noisy input data (reconciliation, provenance) and learning from this data to optimise algorithms. Some of these issues are already addressed within knowledge acquisition and language processing infrastructures, provided that only small groups of highly skilled human experts are involved. However, as we move towards large-scale HC processing, further experimentation is required on how best to acquire high-quality resources from relatively unskilled contributors, who have little training and are self-directed.
- Embed human computation into knowledge extraction workflows. uComp will study new empirical and formal methods for integrating HC tasks into complex workflows, a novel knowledge extraction approach that we term Embedded Human Computation (EHC). This approach will go beyond knowledge acquisition and support other key steps in the knowledge processing life cycle. Comprehensive methodological and algorithmic support will be achieved through embedding the HC framework into mature, open-source knowledge extraction tools, to be coupled with new pattern discovery methods. We will maximise impact by using appropriate knowledge-encoding standards, addressing ethical and legal issues, and leveraging the research communities of established open-source toolkits.
- Evaluate EHC performance. uComp will evaluate the flexibility, generalisability, accuracy and scalability of EHC for acquiring factual and affective knowledge. A rigorous evaluation process based on objective and reproducible experiments will measure the quality of HC-created resources. Additionally, the efficiency of combining large-scale HC with automated methods will be compared against both human experts and automated knowledge extraction. We will also create a new method and accompanying shared datasets for quantitative black-box evaluation, where collective intelligence is incorporated into a computer-human symbiotic infrastructure to address the gold-standard bottleneck. An open evaluation campaign will be organised to benchmark uComp’s new methods against existing knowledge processing algorithms from the Web Science research community.
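The quality-control and aggregation issues named in the objectives above (inter-human agreement, reconciliation of noisy input) can be illustrated with a minimal Python sketch. It is not the uComp framework or its method: it substitutes plain majority voting and raw pairwise agreement as stand-ins, and the function names, the `min_support` threshold of 0.6 and the sample labels are all hypothetical choices made for illustration.

```python
from collections import Counter
from itertools import combinations


def majority_vote(judgements):
    """Return (label, support): the most frequent label and its vote share."""
    counts = Counter(judgements)
    label, votes = counts.most_common(1)[0]
    return label, votes / len(judgements)


def pairwise_agreement(judgements):
    """Share of contributor pairs that gave the same label (raw agreement)."""
    pairs = list(combinations(judgements, 2))
    if not pairs:
        return 1.0  # a single judgement trivially agrees with itself
    return sum(a == b for a, b in pairs) / len(pairs)


def aggregate(task_judgements, min_support=0.6):
    """Split tasks into accepted labels and disputed task ids.

    min_support is an assumed threshold, not a value from the project.
    Disputed tasks could be sent back to further contributors or experts.
    """
    accepted, disputed = {}, []
    for task_id, judgements in task_judgements.items():
        label, support = majority_vote(judgements)
        if support >= min_support:
            accepted[task_id] = label
        else:
            disputed.append(task_id)
    return accepted, disputed


# Hypothetical crowd judgements on the sentiment of three text snippets.
crowd = {
    "t1": ["pos", "pos", "neg"],
    "t2": ["pos", "neg"],
    "t3": ["neg", "neg", "neg", "neg"],
}
accepted, disputed = aggregate(crowd)
print(accepted, disputed)
```

Raw agreement does not correct for chance; a production setting would more likely use a chance-corrected coefficient and weight contributors by their track record, which this sketch deliberately leaves out.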
Partners
- The University of Sheffield
- MODUL University Vienna
- Vienna University of Economics and Business
- LIMSI-CNRS