Is there something like OpenCV for voice recognition & NLP?



As you know, OpenCV is a large, well-regarded open-source library for image recognition and machine vision (and possibly for further purposes such as computer graphics).

Is there a similar library in the sound field (voice recognition / natural language processing, NLP)?

I know espeak for TTS and pocketsphinx for voice recognition. There is also ChatScript, though I don't know whether I can consider it an NLP engine. What I would like to know is whether I have mentioned the best libraries for each part of the sound/voice field, or whether there are better options to learn and work with.

I would also be happy to hear suggestions about the best book(s) for learning the concepts and algorithms of ASR/NLP.


I don't know about voice recognition, but for NLP I think Gensim could be what you are looking for!

Gensim is an NLP package that contains efficient implementations of many well-known topic-modeling techniques, such as tf–idf, latent Dirichlet allocation, and latent semantic analysis.
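To give a feel for the simplest of these techniques, here is a minimal from-scratch sketch of tf–idf weighting in plain Python. It is not Gensim's API or its exact weighting scheme; the function name `tfidf`, the toy documents, and the choice of relative term frequency times a natural-log inverse document frequency are all assumptions made for illustration.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return one {term: weight} dict per tokenized document.

    Illustrative only: weight = (count / doc length) * ln(N / document frequency).
    """
    n_docs = len(docs)
    # Document frequency: the number of documents each term appears in.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in tf.items()
        })
    return weights


# Toy corpus (hypothetical), already split into tokens.
docs = [
    "speech recognition converts audio to text".split(),
    "topic modeling groups text by theme".split(),
    "speech synthesis converts text to audio".split(),
]
weights = tfidf(docs)
```

A term that occurs in every document (here "text") gets weight zero, while a term unique to one document (here "recognition") gets the highest weight in that document. A library such as Gensim adds the parts this sketch leaves out, such as dictionaries, streaming corpora, and normalization.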

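As for reading, maybe you can start with the original word2vec paper, "Efficient Estimation of Word Representations in Vector Space" (Mikolov et al., 2013), which builds on the idea that the meaning of a word can be inferred by the company it keeps.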
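The "company it keeps" idea can be made concrete with a small sketch: for each word, collect the words that appear within a fixed window around it. The skip-gram variant of word2vec trains on (center, context) pairs of this kind; the code below only generates the pairs and does no training, and the function name `skipgram_pairs` and the window size are assumptions chosen for illustration.

```python
def skipgram_pairs(tokens, window=2):
    """Return (center, context) pairs for every token and its neighbours
    within `window` positions on either side."""
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # a word is not its own context
                pairs.append((center, tokens[j]))
    return pairs


pairs = skipgram_pairs("the quick brown fox".split(), window=1)
```

With a window of 1, each word is paired only with its immediate neighbours, so "quick" is paired with "the" and "brown". Words that share many context words across a large corpus end up with similar vectors once a model is trained on such pairs.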


A Medium post by Josh Dotson gives a clear, insightful overview of the following:

1. Speech data besides speech recognition.

  1. Language modelling.

  2. Text to speech.

  3. Machine translation.

  4. Signal processing.

And lastly, books and blogs for further research.

Resources for acknowledgement


I would advise you to look into Mozilla's implementation of Baidu's DeepSpeech.

In the field of automatic speech recognition (ASR), Kaldi is the current leader. Before the deep neural network era, there were Sphinx and HTK.


