How does word2vec handle the input word being in the context?

7

If word2vec encounters the same word multiple times in the same window, what happens? Obviously it is meaningless to decrease the distance between the vectors for the input word and the target word when they are the same word. But will the repetition strengthen the relationship between the repeated word and the context words?

jamesmf

Posted 2015-09-17T21:02:33.367

Reputation: 2 927

Answers

2

We can look at the source for guidance.

How does word2vec handle the input word being in the context?

It is skipped, in both the skip-gram and CBOW models.

If word2vec encounters the same word multiple times in the same window, what occurs?

The relationship is strengthened.
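
To make both points concrete, here is a minimal sketch of how (input, context) pairs could be generated for one sentence. It is not the reference implementation: `skipgram_pairs` and the sample sentence are hypothetical names made up for illustration, the window is fixed rather than randomly shrunk, and it assumes the skip is done by position (the input slot is never used as its own context). Counting the pairs shows why a repeated word gets a stronger relationship: it simply produces more training pairs with its neighbours.

```python
from collections import Counter


def skipgram_pairs(tokens, window=2):
    """Yield (input, context) pairs, skipping the input position itself.

    Hypothetical helper for illustration only; assumes a fixed window.
    """
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j == i:
                # The input position is never treated as its own context.
                continue
            yield center, tokens[j]


# "the" occurs twice, and both occurrences fall inside the window of "cat".
tokens = ["the", "cat", "saw", "the", "dog"]
pair_counts = Counter(skipgram_pairs(tokens, window=2))

print(pair_counts[("cat", "the")])  # 2: one pair per occurrence of "the"
print(pair_counts[("cat", "saw")])  # 1: "saw" appears once in the window
```

Each pair is one training update, so under these assumptions ("cat", "the") is updated twice as often as ("cat", "saw") for this sentence. The sketch does not cover what happens when a second occurrence of the input word itself lands elsewhere in the window.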

Emre

Posted 2015-09-17T21:02:33.367

Reputation: 9 953

1

I think your last question is worth discussing, but forgive my carelessness in skipping the details of the model and just leaving a quick answer here :P

Repeating a sentence in your corpus would definitely change the learned result and strengthen the relationships between the words in that sentence, because one of the models behind word2vec is skip-gram, which assumes the center word can be used to predict its surrounding words.

But this raises a follow-up question: what is our purpose in using word2vec?

  1. To find words that are similar in semantic and synthetic terms, which is useful for search and information retrieval.
  2. To model sequence data such as click sequences with a skip-gram model, which could be used in recommendation.

zihaolucky

Posted 2015-09-17T21:02:33.367

Reputation: 141

I believe you meant "syntactic", not "synthetic". – Vladislavs Dovgalecs – 2016-02-23T16:52:40.447

I can see how repeating the sentence might do that, but my question is specifically about repeating one of the words in the sentence. – jamesmf – 2015-09-24T20:02:54.110