Practical Social Media Analysis: finding utility in trivia

Tutorial to be presented at LREC 2014, Monday 26 May (morning session), Reykjavik, Iceland

Motivation

Social media is fast becoming a crucial part of our everyday lives, not only as a fun and practical way to share our interests and activities with geographically distributed networks of friends, but also as an important part of our business lives. On the one hand, it affords companies a highly effective means of promotion and advertising, as well as a way to keep an eye on competitors and collaborators; on the other hand, it enables companies and institutions to acquire valuable feedback by analysing what their customers have to say. All kinds of predictions can be made based on such knowledge, from gauging current political opinions and predicting stock price movements relative to public mood, to the more frivolous but still highly lucrative predictions about Oscar winners and film revenues. Processing social media is particularly problematic for NLP tools, firstly because it departs strongly from the newswire text on which many tools were developed and evaluated, and secondly because of the terse, low-context language it typically comprises. Effective opinion mining from social media is still very much a hot research topic rather than a solved problem.

This tutorial will address these needs by introducing some of the problems faced when applying NLP tools to social media, together with solutions to them. It will demonstrate techniques for extracting relevant information from unstructured social media text, so that participants will be equipped with the building blocks needed to build their own tools and tackle complex issues. The tutorial will cover state-of-the-art research for important subtasks. Since all of the NLP tools to be presented are open source, attendees will gain skills which are easy to apply and do not require special software or licenses.

Outline of the tutorial

The tutorial will be divided into 3 sections, as follows:

  1. Introduction to social media analysis. The first part of the tutorial will explain the motivation for analysis of social media and for relevant tools, with examples showing how it can be used in many different situations. It will describe the main tasks and the general problems to be overcome.
  2. Linguistic processing of social media. The second part of the tutorial introduces the linguistic phenomena peculiar to social media: hashtags; mentions; sharing (e.g. retweets); content attachment (images, URLs); and stylistic spelling and punctuation usage. We will introduce and demonstrate tools for social media processing, comparing and contrasting domain-adapted and conventional tools in typical NLP tasks: language recognition, tokenisation, POS tagging, and named entity recognition. This will be accompanied by hands-on exercises using real-world data. We will also discuss normalisation and various approaches to the task, and give a demonstration dataset and tool for this orthographic "repair" process. This part concludes with a discussion of how social media text relies on context. We will provide worked examples showing the structural differences between texts on social media and in conventional newswire settings. We will also examine how discussion moves between different settings and networks, as part of human information networks, and position the role of social networks in the general discourse methods available, enumerating potential sources of context and methods for exploiting them.
  3. Opinion mining for social media. The final part of the tutorial will examine the difficulties presented by social media for the task of opinion mining, covering issues such as short, context-free utterances in tweets, the importance of context in detecting the polarity of sentiment words, use of slang, detection of sarcasm, and analysis of hashtags. It will present real-life example applications based on GATE, and investigate the appropriateness of different solutions, showing participants how to build up their own opinion mining tool step-by-step.
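To give a flavour of why the second part contrasts domain-adapted and conventional tools, the following minimal Python sketch compares a naive tokeniser with one that keeps mentions, hashtags and URLs intact. It is an illustration only, not one of the tools presented in the tutorial; the regular expression, the function names and the example tweet are all assumptions made for this sketch.

```python
import re

# Illustrative pattern only (an assumption for this sketch, not a tutorial tool).
# Alternatives are tried left to right, so URLs, mentions and hashtags are
# matched before the generic word and symbol rules.
TOKEN_RE = re.compile(
    r"""
    https?://\S+        # URLs (greedy: may swallow trailing punctuation)
    | @\w+              # mentions such as @user
    | \#\w+             # hashtags such as #topic
    | \w+(?:'\w+)?      # ordinary words, optionally with an apostrophe
    | [^\w\s]           # any other single non-space symbol
    """,
    re.VERBOSE,
)


def tokenise(text):
    """Split text into tokens, keeping social media markers whole."""
    return TOKEN_RE.findall(text)


def naive_tokenise(text):
    """Conventional word/punctuation split with no social media rules."""
    return re.findall(r"\w+|[^\w\s]", text)


if __name__ == "__main__":
    tweet = "RT @lrec2014: Great #NLProc tutorial! http://example.org/x"
    print(tokenise(tweet))
    print(naive_tokenise(tweet))
```

On the made-up tweet, the naive splitter breaks the mention, the hashtag and the URL into fragments, while the adapted pattern returns each as a single token. Real systems handle many more cases (emoticons, elongated words, truncated URLs) than this sketch attempts.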

Schedule

The tutorial will run from 9am-1.30pm. There will be a coffee break around 11am.

Audience profile

The target audience will consist of researchers from any background looking to perform analysis of social media. No particular skills or knowledge are necessary, but an understanding of basic natural language processing concepts and techniques is useful, as is a general familiarity with Twitter, Facebook and other social media.

Tutorial speakers

Dr Diana Maynard is a Research Fellow at the University of Sheffield, UK. She has a PhD in Automatic Term Recognition from Manchester Metropolitan University, and has been involved in research in NLP since 1994. Her main interests are in Information Extraction, opinion mining, social media and Semantic Web technology. Since 2000 she has led the development of USFD's open-source multilingual Information Extraction tools, and has led research teams on a number of UK and EU projects. She is chair of the annual GATE training courses, teaches modules on Advanced Information Extraction, opinion mining and social media analysis, and leads the GATE consultancy on Information Extraction and opinion mining. She has published extensively, organised a number of national and international conferences, workshops and tutorials, and given keynote speeches, invited talks, tutorials, lectures and courses on a number of NLP topics, including information extraction, opinion mining and social media analysis, at international NLP and Semantic Web conferences.

Dr Leon Derczynski is a Research Associate in the Natural Language Processing Group at the University of Sheffield, where he has worked for five years. He holds a PhD in Computational Linguistics from the University of Manchester, and has worked in computational linguistics and natural language processing for 10 years. He has published scientific papers and reviewed conference and journal papers, and regularly teaches at GATE courses and tutorials. He has worked on the implementation of linguistic annotation standards and the development of NLP applications for information extraction, opinion mining and the semantic web, particularly in national and international research projects.

Sponsorship

This tutorial is supported by the following EU research projects:

TrendMiner

PHEME

DecarboNet