The reply from Andrey Kutuzov via Google Groups felt satisfactory:
I would say that word2vec algorithms are based on both. When people say "distributional representation", they usually mean the linguistic aspect: meaning is context, know the word by the company it keeps, and other famous quotes.
But when people say "distributed representation", it mostly doesn't have anything to do with linguistics; it is more about the computer science aspect. If I understand Mikolov and others correctly, the word "distributed" in their papers means that each single component of a vector representation does not have any meaning of its own. The interpretable features (for example, word contexts in the case of word2vec) are hidden and distributed among uninterpretable vector components: each component is responsible for several interpretable features, and each interpretable feature is bound to several components.
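A minimal sketch of that idea (this is not word2vec itself, and the vocabulary, feature indices, and random projection are all made up for illustration): sparse, interpretable context features are mixed into a few dense components, so no single component corresponds to one feature, yet similarity between words survives.

```python
import numpy as np

rng = np.random.default_rng(0)
n_features = 10  # hypothetical interpretable context features
n_dims = 4       # dense, uninterpretable vector components

# Sparse, interpretable representations: 1.0 marks a context feature.
cat = np.zeros(n_features); cat[[1, 3, 5]] = 1.0
dog = np.zeros(n_features); dog[[1, 3, 7]] = 1.0  # shares contexts with cat
car = np.zeros(n_features); car[[0, 8, 9]] = 1.0  # no shared contexts

# Each row spreads one feature over all components, and each component
# mixes contributions from all features -- "distributed" in Mikolov's sense.
projection = rng.standard_normal((n_features, n_dims))

def embed(x):
    return x @ projection

def cos(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Individual components of embed(cat) mean nothing on their own, yet
# cosine similarity in the dense space still tracks shared contexts
# (typically higher for cat/dog than for cat/car here).
print(cos(embed(cat), embed(dog)), cos(embed(cat), embed(car)))
```

In real word2vec the dense vectors are learned from corpus co-occurrences rather than produced by a fixed random projection, but the structural point is the same: the interpretable features live only implicitly, smeared across all components.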
So, word2vec (and doc2vec) uses distributed representations technically, as a way to represent lexical semantics. And at the same time it is conceptually based on the distributional hypothesis: it works only because the distributional hypothesis is true (word meanings do correlate with their typical contexts).
But of course the terms are often used interchangeably, increasing misunderstanding :)