Sentiment data for Emoji

14

6

For experimentation, we'd like to use the Emoji embedded in many Tweets as ground truth/training data for simple quantitative sentiment analysis. Tweets are usually too unstructured for NLP to work well.

Anyway, there are 722 Emoji in Unicode 6.0, and probably another 250 will be added in Unicode 7.0.

Is there a database (similar to SentiWordNet, for example) that contains sentiment annotations for them?

(Note that SentiWordNet allows for ambiguous meanings, too. Consider "funny", which is not just positive: "this tastes funny" is probably not positive. The same will hold for ;-), for example. But I don't think this is harder for Emoji than it is for regular words...)

Also, if you have experience with using them for sentiment analysis, I'd be interested to hear.

Erich Schubert

Posted 2014-08-12T07:57:03.283

Reputation: 341

Erich Schubert, i am looking for the exact same thing! Did you have any chance to find a useful resource for it? – saeed mehrabi – 2016-02-10T18:27:01.027

Don't believe that something like this exists currently, but would love it if you put something together for this! – indico – 2014-08-12T13:57:53.260

Answers

4

A total of 972 emoji is small enough to label manually, but I doubt that they will work as a good ground truth. Sources like Twitter are full of irony, sarcasm and other tricky settings where emotional symbols (such as emoji or emoticons) mean something different from their normal interpretation. For example, someone may write "xxx cheated their clients, and now they are cheated themselves! ha ha ha! :D". This is definitely a negative comment, but the author is glad to see the xxx company in trouble and thus adds a positive emoticon. These cases are not that frequent, but they are definitely not suitable for ground truth.

A much more common approach is to use emoticons as seeds for collecting an actual data set. For example, in this paper the authors use emoticons and emotional hashtags to gather a lexicon of words useful for further classification.
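The seeding idea can be sketched roughly as follows. This is only a minimal illustration, not the method of the paper mentioned above: the seed sets and the names seed_label and build_dataset are assumptions made up for this example. A tweet gets a noisy label from the emoticons it contains, the emoticons are then stripped so a later classifier has to learn from the remaining words, and tweets with no seed or with conflicting seeds are skipped.

```python
# Hypothetical seed sets; a real project would choose and validate its own.
POSITIVE_SEEDS = {":)", ":-)", ":D", "😀", "😊"}
NEGATIVE_SEEDS = {":(", ":-(", "😞", "😠"}


def seed_label(text):
    """Return (label, text_without_seeds), or None if seeds are absent or conflicting."""
    has_pos = any(seed in text for seed in POSITIVE_SEEDS)
    has_neg = any(seed in text for seed in NEGATIVE_SEEDS)
    if has_pos == has_neg:
        # Either no seed at all, or both polarities present: too ambiguous to use.
        return None
    cleaned = text
    for seed in POSITIVE_SEEDS | NEGATIVE_SEEDS:
        cleaned = cleaned.replace(seed, "")
    cleaned = " ".join(cleaned.split())  # normalise leftover whitespace
    return ("positive" if has_pos else "negative"), cleaned


def build_dataset(tweets):
    """Keep only the tweets that received an unambiguous seed label."""
    return [labelled for labelled in map(seed_label, tweets) if labelled is not None]


tweets = [
    "Great game today :D",
    "Missed my train :-(",
    "Just a plain status update",
    "Not sure how I feel :) :(",
]
for label, text in build_dataset(tweets):
    print(label, "|", text)
```

The labels produced this way are noisy (sarcastic tweets are labelled by their surface emoticon), which is why they are usually treated as a starting point for building a lexicon or training set rather than as ground truth.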

ffriend

Posted 2014-08-12T07:57:03.283

Reputation: 2 751

Actually I disagree. Since the author likes them being in trouble, it is a positive sentiment there. It's a negative comment on the company, but nevertheless a positive sentiment by the author. In this simpler scenario (I'm not saying this is the complete goal), predicting which emojis a user would add to his post sounds like a reasonable task to me. In fact you can construct many cases where the emoji will be essential. Consider "Got f_cked :-)" as opposed to "Got f_cked. :-(" – Erich Schubert – 2014-08-12T13:32:24.477

If you are trying to estimate a person's emotion as opposed to a person's attitude to a subject, then yes, this example doesn't work. But there are many others. Sarcasm is a common case. Consider the sentence "oh yeah, you are real 'master' ;)". A human can catch the negative context, but the positive emoticon will point to a positive emotion. But I haven't really got it: do you want to extract subjective information from tweets or just predict possible emojis? Even though they sound similar, the second task is not really about sentiment analysis. Not directly, at least. – ffriend – 2014-08-12T14:10:36.103

The "wink" smiley is usually not considered to be "positive", but "ironic"... which is why a good dictionary such as SentiWordNet makes sense. If you look up funny in SentiWordNet, it has more than one meaning, too! http://sentiwordnet.isti.cnr.it/search.php?q=funny (So it is not trivial to annotate them manually, because it's not as simple as positive/negative; but you should do the usual interrater-agreement validation etc.) – Erich Schubert – 2014-08-12T16:03:50.307
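One common way to do the interrater-agreement check mentioned above is Cohen's kappa, which corrects the raw agreement between two annotators for the agreement expected by chance. The sketch below is an illustration only: the function name cohen_kappa and the example labels are invented here, and the thread does not say which agreement statistic was intended.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two raters who labelled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both raters must label the same non-empty item list")
    n = len(labels_a)
    # Fraction of items on which the two raters agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Agreement expected by chance, from each rater's label frequencies.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (observed - expected) / (1 - expected)


# Made-up annotations of six emoji by two hypothetical raters.
rater_a = ["pos", "pos", "neg", "ironic", "neg", "pos"]
rater_b = ["pos", "neg", "neg", "ironic", "neg", "pos"]
print(round(cohen_kappa(rater_a, rater_b), 3))
```

With more than two annotators, a multi-rater statistic such as Fleiss' kappa would be the usual substitute.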

Now I see your idea. But I don't really think it will work, just because (most) emojis don't really sound like good predictors to me, and you explicitly don't want to use other features. Anyway, this is just an opinion based on my experience; only data can give real answers. Good luck! – ffriend – 2014-08-12T21:39:06.460

Who said I don't want to use other features? But for these I have seen databases... – Erich Schubert – 2014-08-13T10:13:26.760

0

I found this GitHub repo useful (a good start). It is a list of emoji rated for valence with an integer between minus five (negative) and plus five (positive).

See list of supported unicode-emojis.

Note that some emoji receive arguably confusing polarities, such as stuck_out_tongue_closed_eyes (0), due to being used for both positive and negative emotions.
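A lexicon of this shape can be applied with a simple lookup. The sketch below does not use the repo's actual data or API: the EMOJI_VALENCE table and its scores are invented for illustration (only the 0 for the stuck-out-tongue emoji mirrors the example above), and emoji_valence is a hypothetical helper name. It averages the valence of all known emoji in a text and returns None when there are none, so that "no emoji" stays distinguishable from "neutral emoji".

```python
# Illustrative values only, on the same -5..+5 integer scale.
EMOJI_VALENCE = {
    "😀": 3,
    "😢": -2,
    "😡": -4,
    "😝": 0,  # stuck_out_tongue_closed_eyes: used both positively and negatively
}


def emoji_valence(text, lexicon=EMOJI_VALENCE):
    """Mean valence of the known emoji in text, or None if it contains none."""
    scores = [lexicon[ch] for ch in text if ch in lexicon]
    if not scores:
        return None
    return sum(scores) / len(scores)


for sample in ["so happy 😀😀", "mixed feelings 😀 😡", "whatever 😝", "no emoji here"]:
    print(repr(sample), "->", emoji_valence(sample))
```

Iterating character by character only works for emoji that are a single code point; sequences built with modifiers or joiners would need a proper emoji tokenizer.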

Tal Weiss

Posted 2014-08-12T07:57:03.283

Reputation: 101