The difference between a speech API and a speech engine is that a speech API lets developers integrate speech technology into their own applications, whereas a speech engine is the underlying software that does the actual work, such as converting text into a spoken voice (source: the MSDN library).
Below is a list of speech recognition toolkits and their features.
TensorFlow - Although TensorFlow doesn't ship with speech recognition libraries by default, you can build seq2seq models, which have achieved high accuracy in speech recognition. A few advantages of using TensorFlow for speech recognition: it comes with TensorBoard, which is useful for visualising and fine-tuning your network; its architecture is highly modular, which lets you experiment with different voice libraries; and it is portable, meaning you can run it on GPUs, CPUs, servers, or even mobile platforms.
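To give a feel for what these end-to-end models do: they typically emit one label per audio frame and then collapse those frames into text with CTC decoding. Here is a toy greedy CTC decoder in pure Python (this is an illustration, not TensorFlow's own API, and the per-frame labels below are made up):

```python
def ctc_greedy_decode(frame_labels, blank="-"):
    """Collapse repeated labels, then drop blanks (greedy CTC decoding)."""
    out = []
    prev = None
    for label in frame_labels:
        if label != prev and label != blank:  # collapse runs, skip blanks
            out.append(label)
        prev = label
    return "".join(out)

# Hypothetical per-frame argmax labels for an utterance of "hello".
# The blank between the two l's is what lets CTC emit a double letter.
frames = ["h", "h", "-", "e", "l", "l", "-", "l", "o", "o"]
print(ctc_greedy_decode(frames))  # -> hello
```

In a real pipeline the frame labels would come from the argmax of a network's per-frame probabilities, and a beam search decoder (often with a language model) would replace the greedy collapse.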
Microsoft's CNTK - Microsoft's Cognitive Toolkit delivers excellent results when it comes to speech recognition; CNTK has recently matched human transcribers at converting speech to text. Some of its perks: it uses resources efficiently, and because it was originally built for speech recognition systems it is very effective at working with time-series data.
CMU Sphinx - CMU Sphinx is a speech recognition system developed and actively supported at Carnegie Mellon University. Its advantages: it is multilingual and supports most major languages; it has good commercial support; it has a lightweight mobile version called PocketSphinx; and it offers a wide range of tools for different purposes, e.g. keyword spotting, alignment, and pronunciation evaluation.
Kaldi - Kaldi aims to provide speech recognition software that is flexible and extensible. It has powerful features such as pipelines that are highly optimized for parallel computing, e.g. training models on the GPU. It also supports speaker identification and detecting errors in transcripts.
Mozilla DeepSpeech - This project aims to provide speech interfaces for the web. It achieved a word error rate of 6.5% on LibriSpeech's test-clean set, which is commendable.
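For context on that number: word error rate is the word-level edit distance between the hypothesis and the reference transcript, divided by the number of reference words, so 6.5% means roughly one error per 15 words. A minimal implementation (the example sentences are my own, not from the benchmark):

```python
def wer(reference, hypothesis):
    """Word error rate: word-level Levenshtein distance / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # dist[i][j] = edits to turn the first i reference words
    # into the first j hypothesis words
    dist = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dist[i][0] = i
    for j in range(len(hyp) + 1):
        dist[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            dist[i][j] = min(dist[i - 1][j] + 1,        # deletion
                             dist[i][j - 1] + 1,        # insertion
                             dist[i - 1][j - 1] + sub)  # substitution/match
    return dist[len(ref)][len(hyp)] / len(ref)

# One dropped word out of six reference words -> WER of 1/6.
print(wer("the cat sat on the mat", "the cat sat on mat"))
```

Toolkits report this same metric, though production scorers usually also break the errors down into substitutions, insertions, and deletions.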
I found this paper comparing open source speech recognition toolkits relevant to your question: http://suendermann.com/su/pdf/oasis2014.pdf
Siraj Raval has an excellent TensorFlow speech recognition tutorial, available here: https://github.com/llSourcell/tensorflow_speech_recognition_demo