Need advice on an approach to select only the informative emojis from a data set


I have a giant data set from a local election, which contains hashtags, emojis, and comments. I want to do a network analysis using only the emojis.

So far I have a network analysis graph made in R, which looks like this: [image]

Sorry, you may have to zoom in to see the nodes. Basically, my goal is to see what people are talking about as a whole group. Currently there are a lot of nodes that don't say anything concrete about the main context or the hook, yet still create clutter. I extracted the data using political hashtags, so nodes such as milk bottles, cows, joy faces, toffees, flags, etc. don't tell me anything about how people's comments relate to the context. My goal is to see people's collective sentiment through the emojis they use in that context. I don't know if I'm making sense.

I don't know how I should approach the problem of selecting only the informative emojis. Should I focus on the hashtags, make a list of the ones I am interested in, and extract only their associated comments and emojis? Or should I look at the sentiment values associated with the emojis and focus only on the extreme positive and negative ones?

I am pretty lost. Any direction on which method or algorithm I should use to de-clutter this graph a bit while keeping it in the context of politics and elections?


Posted 2020-01-09T17:10:15.990

Reputation: 113



One option is to reframe it as a word embedding problem. Emojis can be embedded in a vector space along with comments and hashtags. Then distance measures and clustering can be used to find the emojis that are associated with different sentiments.
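As a rough illustration of the idea, here is a minimal Python sketch (the same logic carries over to R). It is not a trained embedding such as word2vec: it uses plain co-occurrence counts as stand-in vectors and cosine similarity as the distance measure. The posts, the `EMOJIS` set, the seed hashtags, and the function names are all made up for the example, and it assumes the emojis have already been identified upstream.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy posts: each post is a list of tokens (words, hashtags, emojis).
posts = [
    ["#vote", "great", "rally", "👍", "🎉"],
    ["#vote", "great", "win", "👍"],
    ["#scandal", "terrible", "corrupt", "😡"],
    ["#scandal", "terrible", "lies", "😡", "👎"],
    ["milk", "breakfast", "🥛"],
    ["cows", "farm", "milk", "🐄", "🥛"],
]

# Assumption: the set of emoji tokens is already known.
EMOJIS = {"👍", "🎉", "😡", "👎", "🥛", "🐄"}


def cooccurrence_vectors(posts):
    """Map each token to a Counter of the other tokens it shares a post with."""
    vectors = defaultdict(Counter)
    for tokens in posts:
        unique = set(tokens)
        for token in unique:
            for other in unique:
                if other != token:
                    vectors[token][other] += 1
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_emojis(posts, emojis, seed_terms):
    """Score each emoji by similarity to the summed vectors of the seed terms."""
    vectors = cooccurrence_vectors(posts)
    centroid = Counter()
    for seed in seed_terms:
        centroid.update(vectors.get(seed, Counter()))
    scores = {e: cosine(vectors[e], centroid) for e in emojis if e in vectors}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Keep only emojis that share some context with the political seed hashtags.
ranked = rank_emojis(posts, EMOJIS, seed_terms={"#vote", "#scandal"})
informative = [emoji for emoji, score in ranked if score > 0]
print(informative)
```

On this toy data the milk and cow emojis score zero against the political seeds and drop out. Swapping the seed hashtags for positive or negative seed words would score emojis by sentiment instead, and on real data the cutoff would need tuning rather than a plain `> 0`.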

Brian Spiering

Posted 2020-01-09T17:10:15.990

Reputation: 10 864

Hello, apologies for the late response. Nice idea, sir. We made our own emoji dictionary with sentiments. My mentor consulted an outside linguistics professor to keep bias low. Appreciate your response! – UltaPitt – 2020-07-01T21:43:31.020