Is searching for a specific n-gram the same as a string search?


Is the result of a search for a specific n-gram like sherlock+holmes equal to the result of a regex search for "sherlock holmes" in the same document corpus?

So when I read about n-grams for certain words, is that the same as a normal string search?



Posted 2020-01-02T20:09:21.313

Reputation: 51



Well, as you state the problem, yes: searching for a certain sequence of words/strings is the same as looking for the corresponding n-gram, provided both searches treat case, whitespace and punctuation the same way.
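To make the comparison concrete, here is a minimal sketch in Python. The sample text and the tokenizer (lowercase, runs of word characters) are assumptions made for illustration, not something taken from your corpus. It counts the bigram sherlock+holmes and compares that with two regex searches on the raw text.

```python
import re

def ngrams(tokens, n):
    """Return all contiguous n-token windows as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

# Made-up sample text; note the capital letters and the double space.
text = "Sherlock Holmes met John Holmes. Sherlock  Holmes smiled."

# Assumed tokenizer: lowercase, keep only runs of word characters.
tokens = re.findall(r"\w+", text.lower())

# n-gram search: count occurrences of the bigram (sherlock, holmes).
bigram_hits = ngrams(tokens, 2).count(("sherlock", "holmes"))

# Literal regex on the raw text: case-sensitive, exactly one space.
literal_hits = len(re.findall(r"sherlock holmes", text))

# Regex adjusted to mirror the tokenizer: ignore case and allow
# any run of non-word characters between the two words.
tolerant_hits = len(
    re.findall(r"\bsherlock\W+holmes\b", text, flags=re.IGNORECASE)
)

print(bigram_hits, literal_hits, tolerant_hits)
```

With this tokenizer the bigram count and the tolerant regex agree, while the literal regex misses the matches because of case and spacing. So the two searches are equivalent only once the regex encodes the same normalization as the tokenizer.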

However, keep in mind that when n-grams are used in ML, an n-gram is (often) represented as a factor, i.e. a categorical feature. A particular sequence of words or strings is treated as carrying valuable information in its own right. For example, "John Holmes" is different from "Sherlock Holmes" when it comes to, e.g., identifying book titles or covers.
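As a rough illustration of the feature idea, the sketch below counts bigrams in two short strings. The strings are made up for the example and the whitespace tokenizer is an assumption. The single word "holmes" occurs in both, but the bigram tells them apart.

```python
from collections import Counter

def bigram_counts(text):
    """Count adjacent word pairs after lowercasing and splitting on whitespace."""
    tokens = text.lower().split()
    return Counter(zip(tokens, tokens[1:]))

# Made-up example strings, not real data.
doc_a = "the adventures of sherlock holmes"
doc_b = "the memoirs of john holmes"

features_a = bigram_counts(doc_a)
features_b = bigram_counts(doc_b)

# The unigram "holmes" is present in both documents...
shared_unigram = "holmes" in doc_a.split() and "holmes" in doc_b.split()

# ...but the bigram feature (sherlock, holmes) separates them.
count_a = features_a[("sherlock", "holmes")]
count_b = features_b[("sherlock", "holmes")]

print(shared_unigram, count_a, count_b)
```

A model that sees these counts as features can distinguish the two strings, which a feature for the single word "holmes" could not do.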

In general, an n-gram is just a particular sequence of n words/strings, which carries more information than a single word/string.


Posted 2020-01-02T20:09:21.313

Reputation: 4 724