Detect related sentences

2

This question is related to "How to grow a list of related words based on initial keywords?"

The previous question is about finding words similar to a given word. I would like to know whether the same can be done for sentences.

As I'm not familiar with this area, my questions are:

Is there a way to do this with sentences (without just considering words)? What tools are available for it?

Smith Volka

Posted 2017-08-24T05:53:34.230

Reputation: 575

1

Welcome to the site! Use a sentence embedding (numerical representation) then perform similarity search.

– Emre – 2017-08-24T06:08:37.467

Many thanks! Without developing these models from scratch, are there any available tools I can utilise? – Smith Volka – 2017-08-24T06:15:24.753

The ones I just linked to... – Emre – 2017-08-24T06:33:38.473

Sorry, I am new to this field. I don't see any tools on those sites, only links to research papers.

Answers

3

Word Mover’s Distance (WMD) is an algorithm for measuring the distance between two pieces of text, such as sentences or documents. It is based on word embeddings (e.g., word2vec), which encode the semantic meaning of words as dense vectors.

WMD measures the dissimilarity between two text documents as the minimum cumulative distance that the embedded words of one document need to "travel" to reach the embedded words of the other document.

For example:

[Image: WMD example. Source: "From Word Embeddings To Document Distances" paper]

The gensim package has a WMD implementation.
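In gensim this is exposed as the `wmdistance` method on keyed-vector models, which needs trained embeddings to run. To show the idea without a trained model, here is a minimal standard-library sketch of the relaxed variant of WMD (a lower bound in which every word simply moves all of its weight to the nearest word in the other sentence), not the full optimal-transport computation. The 2-D vectors in `TOY_VECTORS` and the function names are made up for illustration, and the sketch assumes every token has a vector.

```python
import math
from collections import Counter

# Hypothetical 2-D "embeddings", invented for this example only.
# Real embeddings (e.g. word2vec) have hundreds of dimensions.
TOY_VECTORS = {
    "obama": (1.0, 0.9),
    "president": (0.9, 1.0),
    "speaks": (0.1, 0.5),
    "greets": (0.2, 0.6),
    "media": (0.5, 0.1),
    "press": (0.6, 0.2),
    "banana": (-1.0, -1.0),
    "ripens": (-0.8, -0.5),
    "slowly": (-0.5, -0.9),
}


def nbow(tokens):
    """Normalized bag of words: word -> share of the sentence's total weight."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}


def one_sided_cost(src, dst, vectors):
    """Cost when each source word sends all its weight to its nearest target word."""
    return sum(
        weight * min(math.dist(vectors[word], vectors[other]) for other in dst)
        for word, weight in src.items()
    )


def relaxed_wmd(tokens_a, tokens_b, vectors=TOY_VECTORS):
    """Relaxed WMD: the larger of the two one-sided nearest-word costs."""
    a, b = nbow(tokens_a), nbow(tokens_b)
    return max(one_sided_cost(a, b, vectors), one_sided_cost(b, a, vectors))


s1 = ["obama", "speaks", "media"]
s2 = ["president", "greets", "press"]
s3 = ["banana", "ripens", "slowly"]

print(relaxed_wmd(s1, s2))  # small: no shared words, but nearby vectors
print(relaxed_wmd(s1, s3))  # larger: vectors are far apart
```

Note that `s1` and `s2` share no words, yet they come out close because their words sit near each other in the embedding space. That is the property that makes WMD useful for finding related sentences.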

Brian Spiering

Posted 2017-08-24T05:53:34.230

Reputation: 10 864

2

The Python package FuzzyWuzzy is also useful.

It scores string similarity based on Levenshtein distance (using the python-Levenshtein package when it is installed) and offers several options for reordering or comparing word tokens.
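To make the idea concrete, here is a minimal standard-library sketch of Levenshtein distance and a token-sorted comparison. It is in the spirit of FuzzyWuzzy's `fuzz.ratio` and `fuzz.token_sort_ratio`, but it is not the library's exact scoring; the function names and the normalization to a 0 to 1 range are my own assumptions.

```python
def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions or substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            substitution_cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,                      # delete char_a
                current[j - 1] + 1,                   # insert char_b
                previous[j - 1] + substitution_cost,  # substitute or match
            ))
        previous = current
    return previous[-1]


def similarity(a, b):
    """Score in [0, 1]; 1.0 means the strings are identical."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def token_sort_similarity(a, b):
    """Lowercase and sort the words first, so word order does not matter."""
    def normalize(text):
        return " ".join(sorted(text.lower().split()))
    return similarity(normalize(a), normalize(b))


print(levenshtein("kitten", "sitting"))
print(similarity("new york mets", "mets york new"))
print(token_sort_similarity("new york mets", "mets york new"))
```

Keep in mind that this compares characters, not meaning: it works well for typos, near-duplicates and reordered words, but two sentences that say the same thing with different words will still score low.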

scollins

Posted 2017-08-24T05:53:34.230

Reputation: 21