Generic Sentiment Analysis Tool

1. Introduction

This application performs generic opinion mining on texts that have already been annotated with entities, terms or events (or all three). It cannot be run stand-alone: another application, such as ANNIE, TwitIE, TermRaider or your own, must be run first to generate the necessary annotations.

Note: the application copies all annotations from the Default annotation set into a new annotation set called "Sentiment". Any annotations created by earlier processing must therefore be in the Default annotation set. To change this, edit the parameters of the Annotation Set Transfer PR at the beginning of the app.

2. What the App Does

It annotates sentiment over a single sentence and also over a document or other collection of sentences (see the section on Sentiment Averaging). It also annotates opinion authors and targets where present, and annotates emotions as well as positive/negative sentiment.

3. Sentiment Targets

Sentiments are typically attached to entities, terms or events as their targets. The way a sentiment-target combination is found can be changed in the sentiment_target JAPE rule (in the opinion mining grammar PR). Possible opinion targets are annotated as CandidateTarget. By default, an Annotation Set Transfer PR called "configure target type" lets you convert existing annotations from a previous app (e.g. Terms, or Person/Location/Organization annotations from ANNIE or TwitIE) into CandidateTarget annotations. Any annotation type(s) can therefore be used as possible targets (though this set is then refined later).

Note that there is also a grammar file called sentiment_target_general.jape, which adds some extra high-precision rules to the standard opinion mining rules. These use standard NPs as targets, and supersede the more generic main patterns (found in sentiment_target.jape). The rules cover constructions like "NP sentiment-verb NP" and "NP verb Sentiment"; for example, "John likes porcupines" would produce a positive sentiment with "porcupines" as the target. If this is not desired, the rules can be turned off by commenting out or removing the JAPE rule file sentiment_target_general.jape from the opinion mining grammar. However, they have been found to be quite successful in experiments.

4. Emotions

Basic emotions are also detected, based on gazetteer lists which have been expanded using a variety of methods (see DecarboNet deliverable D2.3.2 from https://www.decarbonet.eu/deliverables/). These emotions are: anger, disgust, fear, good, joy, sadness, surprise, cute, bad. Good and bad are catch-all categories for anything not represented in the other categories. Note that if an opinion is negated (e.g. "not" appears with an emotion word), the emotion category is changed from the original. See the grammar file resources/grammar/sentiment/emotions_negative.jape for an explanation (and to change any of this behaviour).

5. Opinion Holders

Opinion holders (annotated as "Holder", and also as a feature opinion_holder on the SentenceSentiment annotations) are identified where explicitly stated (e.g. "John says he likes porcupines."). If no explicit opinion holder is mentioned, it is assumed to be the author of the text. Where the text is a tweet and information about the author is available from the JSON file, this information (the userID) is represented in the annotation.
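
The fallback order described above can be sketched in a few lines of Python. This is only an illustration of the decision logic, not the app's own JAPE or Groovy code. The function name resolve_opinion_holder, the placeholder value "author", and the assumption that the tweet JSON carries the user ID under user.id_str (or user.id) are all hypothetical choices made for the example.

```python
import json


def resolve_opinion_holder(explicit_holder, tweet_json=None, default_author="author"):
    """Pick an opinion holder using the fallback order described in the text.

    explicit_holder: holder string found in the sentence, or None.
    tweet_json: raw JSON string for the tweet, or None (assumed layout).
    default_author: hypothetical placeholder used when nothing else is known.
    """
    # 1. An explicitly stated holder always wins.
    if explicit_holder:
        return explicit_holder

    # 2. Otherwise fall back to the tweet author's user ID, if available.
    if tweet_json:
        user = json.loads(tweet_json).get("user", {})
        user_id = user.get("id_str") or user.get("id")
        if user_id is not None:
            return str(user_id)

    # 3. Otherwise the holder is simply the (unnamed) author of the text.
    return default_author


if __name__ == "__main__":
    print(resolve_opinion_holder("John"))
    print(resolve_opinion_holder(None, '{"user": {"id_str": "12345"}}'))
    print(resolve_opinion_holder(None))
```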

6. Sentiment Averaging

The Sentiment Averaging component averages the sentiment scores from the individual sentences over a collection (usually a document). It takes as input SentenceSentiment annotations (the ones to combine), and creates a new annotation SentenceSet for the document or sentence collection. Both annotation types are runtime parameters on the Groovy script SentimentAveraging (innerType and outerType respectively), and can be changed as needed.
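
As a rough illustration of the averaging step, the following Python sketch computes the mean score of a set of inner annotations and wraps it in a single outer annotation. It is not the Groovy script shipped with the app and does not use the GATE API. The Annotation class, the average_sentiment function, the "score" feature name and the use of a plain unweighted mean are assumptions made for the example; only the SentenceSentiment and SentenceSet type names and the innerType/outerType parameters come from the description above.

```python
from dataclasses import dataclass, field
from statistics import mean


@dataclass
class Annotation:
    """Minimal stand-in for an annotation: a type, a span and a feature map."""
    type: str
    start: int
    end: int
    features: dict = field(default_factory=dict)


def average_sentiment(annotations, inner_type="SentenceSentiment",
                      outer_type="SentenceSet", score_feature="score"):
    """Average the scores of all inner_type annotations into one outer_type annotation.

    Returns None when there is nothing to average.
    """
    # Keep only the annotations to combine that actually carry a score.
    inner = [a for a in annotations
             if a.type == inner_type and score_feature in a.features]
    if not inner:
        return None

    # The new annotation spans from the first to the last combined sentence.
    return Annotation(
        type=outer_type,
        start=min(a.start for a in inner),
        end=max(a.end for a in inner),
        features={score_feature: mean(a.features[score_feature] for a in inner)},
    )


if __name__ == "__main__":
    sentences = [
        Annotation("SentenceSentiment", 0, 10, {"score": 0.5}),
        Annotation("SentenceSentiment", 11, 20, {"score": -1.0}),
        Annotation("SentenceSentiment", 21, 30, {"score": 0.5}),
        Annotation("Token", 0, 4),  # ignored: not the inner type
    ]
    print(average_sentiment(sentences))
```

Passing different values for inner_type and outer_type mirrors how innerType and outerType can be reconfigured on the real script.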

7. Linguistic Types

Finally, as an added bonus, the app annotates some linguistic types such as conditionals, imperatives, questions and so on. Some of these are used in the opinion mining stage; others are included because they are often useful for deeper analysis. The description of the linguistic types can be found in the DecarboNet deliverable D2.2.2 from https://www.decarbonet.eu/deliverables/.