## How to handle negative words in word2vec?

I am training word2vec on a large corpus and averaging the word vectors to get sentence vectors. What is the best way to handle negation words so that negative and positive sentences end up far from each other? For example, "After the fix code worked" and "After the fix code did not work" should ideally give sentence vectors that are far apart. I heard that one approach is to look for negation words like "not" and negate the next word's vector. Can someone clarify whether that is a good approach, or suggest a better one?

Don't average them; use a document vector model like paragraph2vec. See the sentiment analysis in the Experiments section for a performance evaluation.

– Emre – 2016-12-17T17:32:01.707

This link suggests a way of handling negation: https://www.slideshare.net/shailendrakumars1/negation-handling
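For reference, one widely used negation-handling trick (not necessarily the one in the slides) is to prefix every token after a negation word with a marker such as `NOT_` until the next punctuation mark, so that "work" and "NOT_work" become distinct vocabulary items before training. Here is a minimal sketch, assuming pre-tokenized input; the `NEGATIONS` list and the `mark_negation` helper are hypothetical names made up for illustration.

```python
import re

# Hypothetical, deliberately small list of negation cues; extend it for real data.
NEGATIONS = {"not", "no", "never", "cannot", "n't"}
# A token consisting of a single punctuation mark ends the negation scope.
CLAUSE_END = re.compile(r"^[.,;:!?]$")


def mark_negation(tokens):
    """Prefix NOT_ to each token between a negation cue and the next punctuation."""
    marked = []
    negating = False
    for tok in tokens:
        if CLAUSE_END.match(tok):
            negating = False          # punctuation closes the negation scope
            marked.append(tok)
        elif tok.lower() in NEGATIONS:
            negating = True           # keep the cue itself unchanged
            marked.append(tok)
        elif negating:
            marked.append("NOT_" + tok)
        else:
            marked.append(tok)
    return marked


print(mark_negation("After the fix code did not work .".split()))
print(mark_negation("After the fix code worked .".split()))
```

If the corpus is rewritten this way before training, the model learns separate vectors for the plain and the `NOT_` forms. The cost is a larger vocabulary and fewer training examples per token.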

When you look at the vectors that word2vec generates, negation words may have distinctive features, but the model treats them just like positive words. That is to say, as far as the neural network is concerned, these are just similar words. To do what you want, you may have to construct "concept vectors" on top of the word vectors.
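To make the averaging problem concrete, here is a small sketch using hand-picked toy vectors (the numbers are made up for illustration and do not come from a trained model). It compares plain averaging with the "negate the vector after not" idea from the question. The `sentence_vector` helper and its `negate_after_not` flag are hypothetical names.

```python
import math

# Toy 3-dimensional vectors, invented for illustration only.
# Real word2vec vectors would come from a trained model.
toy_vectors = {
    "code": [0.9, 0.1, 0.0],
    "did":  [0.1, 0.1, 0.1],
    "not":  [0.0, 0.2, 0.3],
    "work": [0.2, 0.8, 0.1],
}


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def sentence_vector(tokens, negate_after_not=False):
    """Average the word vectors; optionally flip the sign of the word after 'not'."""
    vectors = []
    flip_next = False
    for tok in tokens:
        if tok not in toy_vectors:
            continue                      # skip out-of-vocabulary words
        vec = toy_vectors[tok]
        if negate_after_not and flip_next:
            vec = [-x for x in vec]       # negate the word that follows "not"
            flip_next = False
        if tok == "not":
            flip_next = True
        vectors.append(vec)
    dims = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dims)]


positive = ["code", "did", "work"]
negative = ["code", "did", "not", "work"]

plain = cosine(sentence_vector(positive), sentence_vector(negative))
flipped = cosine(sentence_vector(positive),
                 sentence_vector(negative, negate_after_not=True))
print("plain averaging:   ", round(plain, 3))
print("negate after 'not':", round(flipped, 3))
```

With plain averaging, the single extra "not" vector barely moves the mean, so the two sentences stay close. Flipping the next word pushes them apart in this toy setup, but that says nothing about how well the trick works on real embeddings.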

Your part-of-speech tagger should automatically mark negating words such as "not" as ADV (the exact tag can vary with the tagger and model version). You can then train on these adverbs in conjunction with your verbs, with positive or negative as the output. Here's an example using spaCy:

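import spacy

# 'en' is the model shortcut from older spaCy releases;
# spaCy 3+ needs a full model name such as 'en_core_web_sm'
nlp = spacy.load('en')        # this can take a while
sample_text = 'Do not go.'
parsed_text = nlp(sample_text)
token_text = [token.orth_ for token in parsed_text]   # surface form of each token
token_pos = [token.pos_ for token in parsed_text]     # coarse POS tag of each token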


At this point, token_text is a list of your words and token_pos holds the corresponding POS tags:

Do - VERB