Semantic Annotation of Social Media: A Hands-on Tutorial

Tutorial to be presented at ISWC 2014, Monday 20 October, 2014, Riva del Garda, Italy


This tutorial takes a detailed view of key semantic annotation tasks (corpus annotation, linguistic pre-processing, entity and relation recognition, LOD-based entity linking and disambiguation, and opinion mining) of social media content. It will cover both the latest state-of-the-art research and selected established methods and tools for natural language processing (NLP), which have been adapted to social media. Topics covered include the comparison of traditional news-based text with social media in terms of processing techniques, an introduction to semantic annotation, description of algorithms tailored to social media, crowdsourcing approaches to collecting data for annotation and evaluation, and a look at real applications, including summarisation of social media content, user modelling, media monitoring and semantics-based information visualisation. The tutorial will be interspersed with practical exercises for the participants to carry out, using the GATE toolkit.


The tutorial aims to:


You do not need any prior knowledge of GATE, Java or NLP to attend this tutorial. However, it includes a significant hands-on element where you will be able to try things out in GATE. We therefore recommend that you bring a laptop to the tutorial, and that you download and install GATE beforehand. If possible, you could also familiarise yourself with the basic GUI, though explicit instructions will be given for the various tasks. GATE can be downloaded from http://gate.ac.uk/download. Please ensure that you have the latest version (8.0). You will also require Java 7 or 8. Please contact us before the tutorial if you have any issues with this. A good starting point for familiarising yourself with the GUI is our introductory GATE tutorial.

Outline of the tutorial

After a short introduction to the challenges of processing social media, we will cover key semantic annotation algorithms adapted to processing such content, discuss available evaluation datasets and outline remaining challenges. Since the lack of human-annotated, gold-standard corpora of social media content is another major challenge, this tutorial will also cover crowdsourcing approaches used to collect training and evaluation data (including paid-for crowdsourcing with CrowdFlower, also combined with expert-sourcing and games with a purpose).

Each main section of the tutorial will contain practical exercises for the participants to try out examples for themselves and see the results: for example, experimenting with different methods, tools and resources for the same task to see how the results differ. A copy of hands-on materials will be given to attendees on the day. The tutorial will be divided into 4 main sections as detailed below.


Time           Section                                                        Presenter
09:15 - 09:20  Introduction                                                   Kalina Bontcheva
09:20 - 10:30  Challenges of Social Media Annotation                          Kalina Bontcheva, Diana Maynard
10:30 - 11:00  Coffee break
11:00 - 12:30  Social media analysis tools                                    Diana Maynard
12:30 - 14:00  Lunch
14:00 - 15:30  Crowdsourcing the creation of training and evaluation corpora  Marta Sabou, Kalina Bontcheva
15:30 - 16:00  Coffee break
16:00 - 17:00  Semantic annotation of social media                            Kalina Bontcheva
17:00 - 17:30  Applications and wrap-up discussion                            All

Audience profile

The target audience comprises researchers in the area of the Semantic Web who wish to learn more about analysing social media and the use of NLP tools and techniques. The tutorial will be of particular interest to researchers who wish to use semantic annotation to connect ontologies to textual data. While the focus is on social media, the methods are largely generic and therefore also relevant to other kinds of text. Furthermore, the crowdsourcing methods can be applied to a wide variety of tasks beyond NLP, e.g., the acquisition of ontological knowledge. Some knowledge of ontologies and related notions will be beneficial, though not essential. No previous knowledge of semantic annotation or NLP is required. The GATE toolkit will be used for practical exercises. No previous experience with GATE is necessary, but participants will be required to download and install it on their laptops prior to the tutorial.

Tutorial speakers

Dr. Kalina Bontcheva is a senior research scientist and the holder of an EPSRC career acceleration fellowship, working on text summarisation of social media. Kalina received her PhD on the topic of adaptive hypertext generation from the University of Sheffield in 2001. Her main interests are information extraction, opinion mining, natural language generation, text summarisation, and software infrastructures for NLP. She has been a leading developer of GATE since 1999. Kalina Bontcheva coordinated the EC-funded TAO STREP project on transitioning applications to ontologies, and led the Sheffield teams in the TrendMiner, MUSING, SEKT, and MI-AKT projects. She was an area co-chair for Information Extraction at ACL2010 and demos co-chair at COLING2008. Kalina is a demo co-chair of the forthcoming ACL2014. She also co-organises and lectures at the week-long, annual GATE NLP summer school in Sheffield, which attracts over 50 participants each year. Additionally, she has gathered extensive tutoring experience while (co-)organising the following tutorials: Tutorial on Natural Language Processing for Social Media at EACL 2014; Tutorials on Semantic Annotation for Knowledge Management and the Semantic Web at the European Semantic Web Conference (ESWC) in 2004, 2006 and 2008; Tutorial on Named Entity Recognition at RANLP 2003.

Dr. Diana Maynard is a Research Fellow at the University of Sheffield, UK. She received a PhD from Manchester Metropolitan University in 2000 on the topic of automatic term extraction, and since then has been leading the linguistic development of Sheffield's open-source multilingual Information Extraction tools. Her main interests are in Information Extraction, opinion mining, social media and Semantic Web technology. She is involved in the development of the social media analysis tools in GATE, has developed a number of opinion mining-related tools, and has both published widely and given a number of invited talks and tutorials on these topics. She is co-chair of the annual GATE training courses, teaching modules on Advanced Information Extraction, Semantic Annotation, Opinion Mining and Social Media Analysis. She was co-chair of the ISWC Semantic Web Challenge from 2010 to 2012, has been area chair for NLP at numerous semantic web conferences, has organised a number of national and international conferences, workshops and tutorials, and has given keynote speeches, invited talks, tutorials, lectures and courses on a number of NLP and Semantic Web-related topics. She has recently given tutorials at EKAW, LREC, STIL and the Sentiment Analysis Symposium, and will give a forthcoming tutorial at LREC 2014.

Dr. Marta Sabou is a Senior Lead Researcher at the Vienna University of Technology. Previously she was an Assistant Professor at the New Media Technology Department of the MODUL University Vienna, and a Research Fellow at the Open University, UK. She holds a PhD from Vrije Universiteit Amsterdam in which she tackled ontology modelling issues for Web Services. During her PhD, she pioneered the use of NLP in the semantic web services area, implementing the proposed algorithms using GATE. She won the IEEE Intelligent Systems' Ten to Watch Award (2006) for her PhD work and the Linked Data Cup at i-Semantics 2012. Dr. Sabou will cover the crowdsourcing aspects of the tutorial: she has a thorough knowledge of the use of Human Computation in the NLP and ontology engineering areas, gained through participating in two research projects in this area, and has published widely on the topic. She uses CrowdFlower extensively to solve a variety of tasks (e.g., named entity recognition, sentiment detection, ontology learning) and is involved in the design and evaluation of a Facebook-based game with a purpose and in building a crowdsourcing platform as part of the uComp project. In terms of teaching experience, Dr. Sabou has designed and taught the entire Informatics curriculum at MODUL University, for both graduates and undergraduates, using a variety of teaching settings ranging from traditional lectures to seminars and two-day intensive courses.


This tutorial is supported by the following EU research projects: