Natural Language to SQL query



I have been working on a system for converting natural language to SQL queries.

I have read the answers to similar questions, but was not able to find the information I was looking for.

Below is the flowchart for such a system, taken from "An Algorithm to Transform Natural Language into SQL Queries for Relational Databases" by Garima Singh and Arun Solanki.


I understand everything up to the part-of-speech tagging step, but how do I approach the remaining steps?

  1. Do I need to train on all possible SQL queries?
  2. Or, once part-of-speech tagging is done, do I have to play with the words and form a SQL query?

Edit: I have successfully implemented the steps from "user query" to "Part of speech tagging".

Thank you.


Posted 2018-05-14T04:23:08.600

Reputation: 1 271

As an alternative, you may ask the human to take a SQL course... – Marmite Bomber – 2018-08-29T22:54:01.887

At (I am one of the founders) we are building an NLP to SQL engine that you can use as an API. We are launching soon. Let me know if you want to get a demo. – Yehuda Kogan – 2018-08-29T13:55:55.043



You can also tackle the problem from another perspective: end-to-end learning. Instead of specifying ahead of time the large pipeline you mentioned, all you care about is the mapping between sentences and their corresponding SQL queries.
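To make that concrete, here is a minimal sketch of what the training data for such an end-to-end model could look like: nothing but (question, SQL) pairs turned into token id sequences. The table and column names are invented for illustration, and the whitespace tokenizer is only a placeholder for whatever a real model would use.

```python
# Hypothetical (question, SQL) training pairs; the schema is invented for illustration.
pairs = [
    ("how many employees are in sales",
     "SELECT COUNT(*) FROM employees WHERE department = 'sales'"),
    ("list the names of all employees",
     "SELECT name FROM employees"),
]

def build_vocab(texts):
    """Map each whitespace-separated token to an integer id (0 is reserved for padding)."""
    vocab = {"<pad>": 0}
    for text in texts:
        for tok in text.split():
            vocab.setdefault(tok, len(vocab))
    return vocab

def encode(text, vocab):
    """Turn a string into a list of token ids."""
    return [vocab[tok] for tok in text.split()]

src_vocab = build_vocab(q for q, _ in pairs)
tgt_vocab = build_vocab(s for _, s in pairs)

# Each example is a pair of id sequences; the model learns source -> target.
examples = [(encode(q, src_vocab), encode(s, tgt_vocab)) for q, s in pairs]
```

No part-of-speech tags or hand-written rules appear anywhere; whatever structure is needed has to be learned from the pairs, which is why these approaches depend on large annotated corpora.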


How to talk to your database



A large annotated semantic parsing corpus for developing natural language interfaces.

GitHub code:

  1. seq2sql
  2. SQLNet
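As far as I recall, these models do not generate a free-form string; they predict a few slots (an aggregation operator, a selected column, and a list of conditions) that are then rendered into SQL. The sketch below shows only that rendering step. The field names, the operator lists and their ordering, and the example table are assumptions on my part, so check the repositories for the exact format.

```python
# Assumed operator tables; the real datasets may order or name these differently.
AGG_OPS = ["", "MAX", "MIN", "COUNT", "SUM", "AVG"]
COND_OPS = ["=", ">", "<"]

def render(query, columns, table="t"):
    """Render a slot-based query description into a SQL string.

    Values are quoted naively; this is for illustration, not for executing
    untrusted input.
    """
    col = columns[query["sel"]]
    agg = AGG_OPS[query["agg"]]
    select = f"{agg}({col})" if agg else col
    sql = f"SELECT {select} FROM {table}"
    conds = [f"{columns[c]} {COND_OPS[op]} '{val}'" for c, op, val in query["conds"]]
    if conds:
        sql += " WHERE " + " AND ".join(conds)
    return sql

# Hypothetical table header and predicted slots.
columns = ["player", "country", "points"]
query = {"sel": 2, "agg": 1, "conds": [[1, 0, "spain"]]}
sql = render(query, columns)
```

Here `sql` comes out as `SELECT MAX(points) FROM t WHERE country = 'spain'`. Predicting slots instead of raw tokens means the output is always syntactically valid SQL of this restricted shape.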

Also, there are commercial solutions like nlsql

Fadi Bakoura

Posted 2018-05-14T04:23:08.600

Reputation: 848

+1 for answering well, but I haven't gone through the links yet – Toros91 – 2018-05-16T06:11:27.940

@Fadi Bakoura Thank you. Let me go through the links. – deepguy – 2018-05-16T06:56:15.523


NLTK has an excellent step-by-step guide on everything you need to convert human language to an SQL query using the nltk package in Python.

It’s rudimentary, but it answers your question.
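The core idea in that guide, as I understand it, is that each word in a small grammar carries a fragment of SQL, and the fragments are concatenated as the sentence is analysed. The following is a pure-Python sketch of that idea only; it does not use NLTK's API, does no real parsing, and the lexicon, table name, and column names are hypothetical.

```python
# Hypothetical lexicon: each known word contributes a SQL fragment (possibly empty).
LEXICON = {
    "which": "SELECT",
    "what": "SELECT",
    "cities": "City FROM city_table",
    "are": "",
    "located": "",
    "in": "WHERE",
    "china": 'Country="china"',
    "greece": 'Country="greece"',
}

def translate(question):
    """Concatenate the SQL fragments of the words in the order they appear."""
    fragments = []
    for word in question.lower().rstrip("?").split():
        if word not in LEXICON:
            raise ValueError(f"word not covered by the lexicon: {word!r}")
        if LEXICON[word]:
            fragments.append(LEXICON[word])
    return " ".join(fragments)

sql = translate("What cities are located in China?")
```

This yields `SELECT City FROM city_table WHERE Country="china"`, and it also shows why the approach is rudimentary: any word or word order outside the hand-written lexicon fails.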


Posted 2018-05-14T04:23:08.600

Reputation: 1 316

Thanks @killerT2333. I just had a look, but it is kind of confusing. Is there any other, simpler doc? – deepguy – 2018-05-14T16:15:31.703

That's the simplest one I know of. What you're asking is quite a complex task, so there's no simple answer to your question. The nltk documentation takes you through the theory at a high level, and also at a low level with a lot of code examples. For anything more extensive than that, you probably need to search GitHub or research papers. – PyRsquared – 2018-05-14T16:17:56.610

I will go through that one more time and update you here. – deepguy – 2018-05-14T16:58:13.520


To complement Fadi's answer, the following are other useful papers on NL-to-SQL methods. The major difference is that these methods support queries that must be answered using more than one table (by joining different tables), whereas the Salesforce paper (and its dataset) focuses on queries over one table at a time.

Both of these papers use the GeoQuery dataset, available here.
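To illustrate the single-table versus multi-table distinction, here is a small `sqlite3` sketch. The two-table schema and the population figures are invented for the example and are not the actual GeoQuery schema or data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE state (name TEXT PRIMARY KEY, population INTEGER);
    CREATE TABLE city (name TEXT, state_name TEXT REFERENCES state(name));
    -- Made-up rows, for illustration only.
    INSERT INTO state VALUES ('texas', 29000000), ('ohio', 11800000);
    INSERT INTO city VALUES ('austin', 'texas'), ('dallas', 'texas'), ('columbus', 'ohio');
""")

# "Which cities are in texas?" -> one table is enough.
single = conn.execute(
    "SELECT name FROM city WHERE state_name = ? ORDER BY name", ("texas",)
).fetchall()

# "Which cities are in states with more than 20 million people?" -> needs a join.
joined = conn.execute(
    """SELECT city.name FROM city
       JOIN state ON city.state_name = state.name
       WHERE state.population > ? ORDER BY city.name""",
    (20_000_000,),
).fetchall()
```

For the second question the system has to work out which tables are involved and how they are linked, not just fill in one column and one condition.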


Posted 2018-05-14T04:23:08.600

Reputation: 21


There is a lot of work on the text-to-SQL task.

I strongly suggest you check out the WikiSQL and Spider datasets. Approaches range from seq2seq with an attention mechanism to BERT-based solutions. Each study also points out the importance of the input representation, where you can feed in the entire table schema or just the column names. It's a pretty deep topic and, as @PyRsquared said, there is no simple answer :)
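As a rough sketch of what "input representation" means here: the question is typically combined with some serialization of the schema before it is fed to the model. The `[SEP]` separator, the `serialize` helper, and the schema below are assumptions for illustration, not the format of any particular paper.

```python
def serialize(question, schema, columns_only=False):
    """Join the question with either the full table schemas or just the column names."""
    parts = [question]
    for table, columns in schema.items():
        if columns_only:
            parts.extend(columns)
        else:
            parts.append(f"{table} : " + " , ".join(columns))
    return " [SEP] ".join(parts)

# Hypothetical two-table schema.
schema = {"singer": ["name", "age"], "concert": ["venue", "year"]}

full = serialize("how many singers are there", schema)
cols = serialize("how many singers are there", schema, columns_only=True)
```

The full variant keeps the information about which column belongs to which table, which matters for multi-table datasets; the columns-only variant is shorter but loses it.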


Posted 2018-05-14T04:23:08.600

Reputation: 111