Spatio-temporal Entity Extraction Tool
This page describes a spatio-temporal entity annotation tool for social media text, created as part of the Pheme project. The tool takes CoNLL-formatted data as input and annotates it with either spatial or temporal entities. It follows the ISO-TimeML standard for temporal annotation and ISO-Space for spatial annotation. The entities covered are: temporal (event, timex3) and spatial (location, spatial_entity). The tool does not attempt to classify entities (e.g. to determine event.class), since good systems already exist for finding entity attributes. For example, for temporal expressions, we recommend TIMEN, HeidelTime and timenorm.
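As a rough illustration of what CoNLL-formatted data looks like in practice, the sketch below reads token-per-line text into sentences. It assumes whitespace-separated columns with blank lines between sentences. The exact column layout the tool expects is not specified here, and the two-column sample with BIO-style labels is purely hypothetical.

```python
from typing import Iterable, List

Token = List[str]        # one row of columns, e.g. ["Sheffield", "B-location"]
Sentence = List[Token]


def read_conll(lines: Iterable[str]) -> List[Sentence]:
    """Group CoNLL-style lines into sentences of token rows.

    Assumption: one token per line, whitespace-separated columns,
    and a blank line between sentences.
    """
    sentences: List[Sentence] = []
    current: Sentence = []
    for line in lines:
        line = line.strip()
        if not line:
            # A blank line closes the current sentence, if there is one.
            if current:
                sentences.append(current)
                current = []
            continue
        current.append(line.split())
    if current:
        sentences.append(current)
    return sentences


# Hypothetical sample: the label names are illustrative, not the tool's output.
sample = """RT O
@user O
flooding B-event
in O
Sheffield B-location
today B-timex3

stay O
safe O
"""

sentences = read_conll(sample.splitlines())
```

Each sentence comes back as a list of column lists, so the first column holds the token and any remaining columns hold whatever annotations the file carries.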
The system is adapted for Twitter text, which is one of the most challenging social media text types, under the intuition that if it works well on this text, it should have less difficulty with other kinds of social media text. The models are adapted from newswire to Twitter by mixing both types of training data, and by mixing genres during feature extraction, using Brown clusters induced over a combined-genre corpus. The clusters are available for download, as is the social media training data.
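To show how Brown clusters are commonly turned into features, here is a minimal sketch. It assumes the paths file uses the usual tab-separated layout of bit-string, word and count, one entry per line. The function names, the `brown<n>=` feature naming and the prefix lengths are illustrative choices, not the tool's actual feature extractor.

```python
import bz2
from typing import Dict, Iterable, List, Sequence


def parse_paths(lines: Iterable[str]) -> Dict[str, str]:
    """Map each word to its cluster bit-string.

    Assumption: each line is "<bit-string>\t<word>\t<count>".
    """
    clusters: Dict[str, str] = {}
    for line in lines:
        parts = line.rstrip("\n").split("\t")
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        bits, word = parts[0], parts[1]
        clusters[word] = bits
    return clusters


def load_paths(path: str) -> Dict[str, str]:
    """Read a bz2-compressed paths file from disk."""
    with bz2.open(path, "rt", encoding="utf-8") as handle:
        return parse_paths(handle)


def brown_features(
    word: str,
    clusters: Dict[str, str],
    prefix_lengths: Sequence[int] = (4, 6, 10, 20),  # illustrative lengths
) -> List[str]:
    """Return one feature per prefix length of the word's bit-string.

    Shorter prefixes correspond to coarser clusters. Words missing from
    the cluster file yield no features.
    """
    bits = clusters.get(word)
    if bits is None:
        return []
    return [f"brown{n}={bits[:n]}" for n in prefix_lengths]


# Toy entries standing in for a real paths file.
toy = ["0010\tlondon\t12", "0011\tsheffield\t7", "110\ttoday\t30"]
clusters = parse_paths(toy)
features = brown_features("sheffield", clusters, prefix_lengths=(2, 4))
```

Because the clusters come from a combined-genre corpus, a word seen only in tweets can share a bit-string prefix with newswire words, which is what lets the mixed-genre models generalise across both.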
Downloads
Tool (includes models and feature extractors): stie.tar.bz2
Dataset: twitter-site.tar.bz2
Brown clusters: rcv1-gha.6464M-c6000.paths.bz2
For support and queries, contact Leon Derczynski (leon.d@sheffield.ac.uk)