COMRADES

COMRADES project website
Twitter @comradesproject

Summary

COMRADES (Collective Platform for Community Resilience and Social Innovation during Crises) aims to empower communities with intelligent socio-technical solutions to help them reconnect, respond to, and recover from crisis situations.

We are creating an open-source community resilience platform, designed by communities for communities, to help them reconnect, respond to, and recover from crisis situations.

COMRADES is researching how technology can make communities more resilient to crisis situations and, by providing an effective way to share information, enable them to take the action needed in good time.

The aim of COMRADES is to build a next generation platform to:

Quickly filter citizen reports as they arrive from social media and mobile text messages
Remove uninformative and irrelevant reports
Point out unreliable sources
Pick up, and raise alerts on, lone messages requesting urgent help during a crisis
Extract, group, and monitor unfolding emergency micro-events.

Contact: Kalina Bontcheva, Diana Maynard

Public Results

Within the project, we are building tools for linguistic analysis of social media messages in different languages, and tools for assessing the informativeness and actionability of these messages. We are also releasing some of the training data we have created.

Description of tools
- TwitIE tools - Named entity recognition from tweets for English, French and German
- YODIE tools - Entity linking and disambiguation from tweets for English, French and German

Try the tools on the GATE Cloud:
- TwitIE - named entity recognition from tweets
- French TwitIE - named entity recognition from French tweets
- German TwitIE - named entity recognition from German tweets
- YODIE - entity linking and disambiguation from English tweets
- French YODIE - entity linking and disambiguation from French tweets
- German YODIE - entity linking and disambiguation from German tweets

Datasets
- Informativeness dataset
  - Triage is very important when large, rapid streams of information are coming into a human-led crisis center. To support learning how to triage automatically, this dataset contains tweets from crises together with human annotations of how informative each tweet is. It can serve as training and evaluation data for an automatic informativeness grading tool. We have already built one such tool, with an accuracy of ~92% and an F1 of 0.91.
  - The license is CC-NC-BY
  - It categorises crisis tweets into three levels of informativeness
  - It's been crowdsourced with multiple coders per document
  - There are 747 labeled documents
  - Download CSV
- Emerging entities dataset
  - Automatically detecting names of locations, organisations and so on works well for common cases, but is currently very hard for new ones. Unfortunately, the names mentioned in crises are typically new, not having appeared in prior datasets, so automatic systems do not pick them up well. This dataset focuses on these entities, and is part of a shared challenge to build systems that tackle this crisis-critical part of the named entity recognition task.
  - Data is drawn from Twitter, Reddit, YouTube and StackExchange
  - Twitter data covers two 2017 crises: Palm Sunday shootings and the Rigopiano avalanche
  - License is CC-NC-BY
  - There are 2300 labeled documents
  - Download from the emerging entities task page
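As an illustration of how a dataset with multiple coders per document, such as the informativeness dataset above, might be used for evaluation, here is a minimal sketch in Python. It aggregates crowd votes by majority and scores a tool's predictions with accuracy and F1. Everything specific in it is an assumption made for the example: the label names (`high`, `medium`, `low`), the toy votes and predictions, the tie-breaking rule, and the choice of macro-averaged F1. It does not reflect the actual layout of the released CSV or the method behind the figures quoted above.

```python
from collections import Counter


def majority_label(votes):
    """Return the most common label among one document's coder votes.

    Ties are broken in favour of the label that appears first in `votes`
    (an arbitrary choice made for this sketch).
    """
    counts = Counter(votes)
    best = max(counts.values())
    for label in votes:
        if counts[label] == best:
            return label


def accuracy(gold, predicted):
    """Fraction of documents whose predicted label equals the gold label."""
    correct = sum(g == p for g, p in zip(gold, predicted))
    return correct / len(gold)


def macro_f1(gold, predicted):
    """Unweighted mean of per-class F1 over the classes present in `gold`."""
    scores = []
    for cls in sorted(set(gold)):
        tp = sum(g == cls and p == cls for g, p in zip(gold, predicted))
        fp = sum(g != cls and p == cls for g, p in zip(gold, predicted))
        fn = sum(g == cls and p != cls for g, p in zip(gold, predicted))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy data: invented votes from three hypothetical coders per tweet.
crowd_votes = [
    ["high", "high", "medium"],
    ["low", "low", "low"],
    ["medium", "high", "medium"],
    ["low", "medium", "low"],
]
gold = [majority_label(v) for v in crowd_votes]

# Invented output of a hypothetical informativeness grading tool.
predicted = ["high", "low", "high", "low"]

print("gold labels:", gold)
print("accuracy:", accuracy(gold, predicted))
print("macro F1:", round(macro_f1(gold, predicted), 4))
```

Majority voting is only one way to combine crowd labels; weighting coders by reliability or keeping the full vote distribution are common alternatives.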

Partners

KMI The Open University, UK
The University of Sheffield, UK
University of Agder, Norway
iHub, Kenya
Gov2U, Belgium

Key Personnel in Sheffield

Project publications from the Sheffield team

Prashant Khare, Gregoire Burel, Diana Maynard and Harith Alani. Cross-Lingual Classification of Crisis Data. International Semantic Web Conference, October 8-12 2018, Monterey, California.

L. Derczynski, K. Meesters, K. Bontcheva, D. Maynard. Helping Crisis Responders Find the Informative Needle in the Tweet Haystack. In Proceedings of the 15th International Conference on Information Systems for Crisis Response and Management (ISCRAM), 20-23 May 2018, Rochester, US. arXiv preprint

A. Aker, J. Petrak, and F. Sabbah, An Extensible Multilingual Open Source Lemmatizer, in Proceedings of the International Conference Recent Advances in Natural Language Processing, RANLP 2017, Varna, Bulgaria, 2017, pp. 40–45. online version

Project funded by the European Commission within the H2020-ICT-2015 Program, No 687847.