Is it possible to identify different queries/questions in a sentence?



I want to identify the different queries in a sentence.

For example, "Who is Bill Gates and where he was born?" or "Who is Bill Gates, where he was born?" contains two queries:

  1. Who is Bill Gates?
  2. Where Bill Gates was born

I have worked on coreference resolution, so I can identify that "he" refers to Bill Gates; the resolved sentence is "Who is Bill Gates, where Bill Gates was born".

Likewise:

MGandhi is good guys, Where he was born?
single query
who is MGandhi and where was he born?
2 queries
who is MGandhi, where he was born and died?
3 queries
India won world cup against Australia, when?
1 query (when India won WC against Auz)

I can perform coreference resolution (identifying "he" and converting it to "Gandhi"), but I don't see how to distinguish the individual queries in the sentence.

How can I do this?

I checked various sentence parsers, but since this is a pure NLP problem, a sentence parser does not identify the queries.

I tried searching for "sentence disambiguation", by analogy with "word sense disambiguation", but nothing like that seems to exist.

Any help or suggestions would be much appreciated.


Posted 2014-10-16T05:44:40.183

Reputation: 61

"Where he was born?" is not a question. "Where was he born?" is. Are you trying to parse poor English? – Spacedman – 2015-02-18T18:02:01.703



The basic thing you can do in this situation is to split the input into N simple sentences and classify each one as a query or not a query (a YES/NO decision). That way you get the following results:

Input: Gandhi is good guys, Where he was born?
Gandhi is good guys - not query
Where he was born?  - query
1 query

Input: who is MGandhi and where was he born?
who is MGandhi     - query
where was he born? - query
2 queries

This approach requires anaphora resolution (to convert "he" into "Gandhi" in the first example) and a parser that correctly divides a complex sentence into simple ones.
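A minimal sketch of the split-then-classify idea, for illustration only. It assumes a naive clause boundary (a comma or a standalone "and") and a hand-picked list of English question words; the names `QUESTION_WORDS`, `split_clauses`, `is_query`, and `find_queries` are hypothetical, and a real system would use a proper parser for the splitting step and a trained classifier for the query decision.

```python
import re

# Assumption: a small, hand-picked set of English question words.
QUESTION_WORDS = {"who", "whom", "whose", "what", "when",
                  "where", "why", "which", "how"}

# Assumption: a clause ends at a comma or at a standalone "and".
# This is a heuristic; neither is a clause boundary in every sentence.
CLAUSE_BOUNDARY = re.compile(r"\s*,\s*|\s+and\s+", flags=re.IGNORECASE)


def split_clauses(sentence):
    """Split a sentence into candidate simple clauses."""
    parts = CLAUSE_BOUNDARY.split(sentence.strip())
    # Drop surrounding punctuation and discard empty fragments.
    return [p.strip(" ?.!") for p in parts if p.strip(" ?.!")]


def is_query(clause):
    """YES/NO decision: does the clause start with a question word?"""
    words = clause.lower().split()
    return bool(words) and words[0] in QUESTION_WORDS


def find_queries(sentence):
    """Return the clauses of the sentence that look like queries."""
    return [c for c in split_clauses(sentence) if is_query(c)]


for text in ("Gandhi is good guys, Where he was born?",
             "who is MGandhi and where was he born?"):
    queries = find_queries(text)
    print(text, "->", len(queries), "query/queries:", queries)
```

On the two inputs above this finds 1 and 2 queries respectively. It undercounts a sentence such as "who is MGandhi, where he was born and died?", because the fragment "died" carries no question word of its own; handling elided question words is where a real parser becomes necessary.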


Posted 2014-10-16T05:44:40.183

Reputation: 759

Thanks for this, but I have already completed the anaphora resolution part; I mentioned that in the question description. – puncrazy – 2014-10-16T09:25:30.337

OvisAmmon: also, "," and "and" will not be sentence boundaries in all cases – puncrazy – 2014-10-16T10:03:14.787

I'm not sure that your last comment makes grammatical sense. In both examples, the comma and the conjunction "and" are exactly sentence boundary markers, as they split a complex sentence into two simple ones. Moreover, Stanford NLP assigns the POS tag CC to the conjunction "and" (you can try it online at their site). So even with anaphora resolution in place as a prerequisite, the basic thing you need to learn is how to detect simple sentence borders. After that, your task reduces to binary classification: query/not_query. – chewpakabra – 2014-10-17T13:24:31.757