COMRADES (Collective Platform for Community Resilience and Social Innovation during Crises) aims to empower communities with intelligent socio-technical solutions to help them reconnect, respond to, and recover from crisis situations.
We are creating an open-source community resilience platform, designed by communities, for communities.
COMRADES is researching how technology can make communities more resilient to crisis situations and, by giving them a way to share information effectively, enable them to deliver the help that is needed in good time.
The aim of COMRADES is to build a next generation platform to:
- Quickly filter the citizen reports as they arrive from social media and mobile texts
- Remove uninformative and irrelevant ones
- Point out unreliable sources
- Pick up, and raise alerts for, lone messages requesting urgent help during a crisis
- Extract, group, and monitor unfolding emergency micro-events.
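The first few aims above (filtering incoming reports, dropping uninformative ones, flagging unreliable sources, and surfacing urgent requests) can be illustrated with a toy triage function. This is only a sketch under stated assumptions: the keyword lists, the `Report` type, and the `triage` function are hypothetical, and simple keyword matching stands in for the trained classifiers a real platform would use.

```python
from dataclasses import dataclass

# Hypothetical keyword lists, used only for illustration.
URGENT_TERMS = {"help", "trapped", "injured", "urgent"}
NOISE_TERMS = {"lol", "giveaway", "follow"}


@dataclass
class Report:
    source: str  # account or phone number that sent the report
    text: str    # raw message text


def tokens(text):
    """Lowercase the text and split it into a set of bare words."""
    return {word.strip(".,!?#@").lower() for word in text.split()}


def triage(reports, unreliable_sources):
    """Split reports into urgent, routine, and discarded buckets."""
    urgent, routine, discarded = [], [], []
    for report in reports:
        words = tokens(report.text)
        if report.source in unreliable_sources or words & NOISE_TERMS:
            # Known-unreliable sender or obviously irrelevant content.
            discarded.append(report)
        elif words & URGENT_TERMS:
            # Candidate request for urgent help.
            urgent.append(report)
        else:
            routine.append(report)
    return urgent, routine, discarded
```

Checking the source before the urgency terms is a design choice of this sketch: it keeps a known-unreliable sender from triggering an alert, at the cost of possibly dropping a genuine request.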
Within the project, we are building tools for linguistic analysis of social media messages in different languages, and tools for assessing the informativeness and actionability of these messages. We also release some of the training data we have created.
- Description of tools
- Try the tools on the GATE Cloud:
- TwitIE - named entity recognition from tweets
- French TwitIE - named entity recognition from French tweets
- German TwitIE - named entity recognition from German tweets
- YODIE - entity linking and disambiguation from English tweets
- French YODIE - entity linking and disambiguation from French tweets
- German YODIE - entity linking and disambiguation from German tweets
- Informativeness dataset
- Triage of information is very important when large, rapid streams of it are coming into a human-led crisis center. To support learning how to triage automatically, this dataset contains tweets from crises together with human annotations of how informative each tweet is. It can act as training and evaluation data for an automatic informativeness grading tool. We have already built one such tool, with an accuracy of ~92% and an F1 of 0.91.
- The license is CC-NC-BY
- It categorises crisis tweets into three levels of informativeness
- It's been crowdsourced with multiple coders per document
- There are 747 labeled documents
- Download CSV
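As a rough illustration of how a dataset like this could be used for evaluation, the sketch below merges several coders' votes into one gold label per tweet and then scores a classifier's predictions by accuracy and F1. Several things here are assumptions rather than facts about the dataset or the project's tool: the label names (`low`, `medium`, `high`), the majority-vote aggregation, and the choice of macro-averaged F1.

```python
from collections import Counter


def majority_label(votes):
    """Return the most common label among the coders' votes for one tweet.

    Ties go to the label that appears first in the vote list.
    """
    return Counter(votes).most_common(1)[0][0]


def accuracy(gold, predicted):
    """Fraction of tweets whose predicted label matches the gold label."""
    correct = sum(g == p for g, p in zip(gold, predicted))
    return correct / len(gold)


def macro_f1(gold, predicted):
    """Unweighted mean of the per-label F1 scores."""
    labels = sorted(set(gold) | set(predicted))
    scores = []
    for label in labels:
        pairs = list(zip(gold, predicted))
        tp = sum(g == label and p == label for g, p in pairs)
        fp = sum(g != label and p == label for g, p in pairs)
        fn = sum(g == label and p != label for g, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        total = precision + recall
        scores.append(2 * precision * recall / total if total else 0.0)
    return sum(scores) / len(scores)


# Hypothetical votes from three coders on four tweets.
votes = [
    ["high", "high", "low"],
    ["low", "low", "medium"],
    ["medium", "medium", "medium"],
    ["high", "low", "high"],
]
gold = [majority_label(v) for v in votes]
predicted = ["high", "low", "medium", "low"]  # made-up classifier output
print(accuracy(gold, predicted), macro_f1(gold, predicted))
```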
- Emerging entities dataset
- Detecting names of locations, organisations and so on automatically can be done well for common cases, but is currently very difficult for new cases. Unfortunately, the names mentioned in crises are typically new, not having been mentioned in prior datasets, and so automatic systems do not pick them up well. This dataset focuses on these entities, and is part of a shared challenge to build systems that tackle this crisis-critical part of the named entity recognition task.
- Data is drawn from Twitter, Reddit, YouTube and StackExchange
- Twitter data covers two 2017 crises: Palm Sunday shootings and the Rigopiano avalanche
- License is CC-NC-BY
- There are 2300 labeled documents
- Download from the emerging entities task page
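The gap between common and new names described above can be made concrete by measuring recall separately for entities that were seen in training data and for those that were not. The sketch below is a simplified illustration, not the shared task's scoring method: it compares lowercased name strings only, and the function names and example entities are hypothetical.

```python
def split_by_novelty(gold_entities, training_entities):
    """Separate gold entities into those seen in training and emerging ones."""
    known = {name.lower() for name in training_entities}
    seen = [name for name in gold_entities if name.lower() in known]
    emerging = [name for name in gold_entities if name.lower() not in known]
    return seen, emerging


def recall(gold_entities, predicted_entities):
    """Fraction of gold entities that the system also predicted."""
    if not gold_entities:
        return 0.0
    found = {name.lower() for name in predicted_entities}
    hits = sum(name.lower() in found for name in gold_entities)
    return hits / len(gold_entities)


# Illustrative data only.
training = ["London", "Red Cross"]
gold = ["London", "Rigopiano", "Red Cross", "Farindola"]
predicted = ["London", "Red Cross", "Farindola"]

seen, emerging = split_by_novelty(gold, training)
print("recall on seen entities:", recall(seen, predicted))
print("recall on emerging entities:", recall(emerging, predicted))
```

Reporting the two figures side by side shows whether a system's overall score hides weak performance on names it has never encountered.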
- KMI The Open University, UK
- The University of Sheffield, UK
- University of Agder, Norway
- iHub, Kenya
- Gov2U, Belgium
Key Personnel in Sheffield
Project publications from the Sheffield team
- Prashant Khare, Gregoire Burel, Diana Maynard and Harith Alani. Cross-Lingual Classification of Crisis Data. International Semantic Web Conference, October 8-12 2018, Monterey, California.
- L. Derczynski, K. Meesters, K. Bontcheva and D. Maynard. Helping Crisis Responders Find the Informative Needle in the Tweet Haystack. In Proceedings of the 15th International Conference on Information Systems for Crisis Response and Management (ISCRAM), 20-23 May 2018, Rochester, US. arXiv pre-print
- A. Aker, J. Petrak and F. Sabbah. An Extensible Multilingual Open Source Lemmatizer. In Proceedings of the International Conference Recent Advances in Natural Language Processing, RANLP 2017, Varna, Bulgaria, 2017, pp. 40–45. online version
Project funded by the European Commission within the H2020-ICT-2015 Program, No 687847.