Text extraction from documents using NLP or Deep Learning



I am looking for references (papers or GitHub projects) on how to use deep learning for a text extraction task.

Recently I was given the task of extracting important information from documents of a similar type, for example legal merger documents. I have thousands of legal merger documents as inputs. A paralegal would go through the entire document and highlight important points from the document. These highlights are the extracted text.

What I want to do: given a document (say, a legal merger document), I want to use DL or NLP to extract information similar to what a paralegal would extract.

I am currently using a bag-of-words model to extract text from the document, calculating sentiment, and displaying the sentences with positive or negative sentiment. This yielded very bad results.

My knowledge of DL/NLP is very limited, and I am particularly looking for interesting papers and GitHub projects related to text extraction using these approaches. Can anyone please provide some references and suggestions on how to tackle this problem?


Posted 2018-06-19T16:09:57.667

Reputation: 95

Phaneeth, can you please explain in a bit more detail how you accomplished this task? I am also looking for a solution to a similar problem. Thank you in advance. – SabVenkat – 2018-09-03T22:33:38.340

@Phaneeth could you explain a bit more about how you applied sequence-to-sequence modeling to highlight important points from a document? Or could you point to sources that helped you complete this task? – Jameezzz – 2018-12-07T22:22:29.987

@Phaneeth: Can you please share your code/approach? – Mauryas – 2019-03-27T19:27:34.867



Jurafsky and Martin's NLP textbook has a chapter about information extraction that should be a good starting point. For example, if you want to extract company names it will tell you how to do that.
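As a rough illustration of the company-name case, here is a minimal rule-based sketch using only Python's `re` module. It is a simple baseline of my own, not the method from the textbook, and the suffix list and the name `extract_company_names` are assumptions made up for this example. Statistical named-entity recognition would normally do better on real documents.

```python
import re

# Hypothetical list of legal-entity suffixes; extend it for your own documents.
SUFFIXES = r"(?:Corporation|Corp\.?|Inc\.?|LLC|Ltd\.?|Company)"

# One or more capitalized words (each optionally followed by a comma),
# then a suffix that is not immediately followed by another word character.
COMPANY_RE = re.compile(
    r"\b(?:[A-Z][A-Za-z&'-]*,?\s+)+" + SUFFIXES + r"(?!\w)"
)


def extract_company_names(text):
    """Return company-like strings found in text, in order of appearance."""
    return [match.group(0) for match in COMPANY_RE.finditer(text)]


sample = (
    "This Agreement is entered into by Acme Holdings, Inc. "
    "and Beta Systems LLC on June 1."
)
print(extract_company_names(sample))
# ['Acme Holdings, Inc.', 'Beta Systems LLC']
```

A known weakness of this pattern is that a capitalized word directly before the name (for example a sentence-initial "The") gets pulled into the match, which is one reason learned taggers are usually preferred.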

"A paralegal would go through the entire document and highlight important points from the document."

What you need to do depends heavily on what your definition of "important" is here. It would help if you could give some specific examples.
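One possible way to make "important" concrete, assuming the paralegals' highlights are available as plain-text snippets, is to treat them as labels: split each document into sentences and mark a sentence 1 if it overlaps a highlight and 0 otherwise. The sketch below only prepares such labeled data. The function names, the naive sentence splitter, and the substring-matching rule are all assumptions for illustration, not a description of any particular system.

```python
import re


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text)
    return [p.strip() for p in parts if p.strip()]


def normalize(s):
    """Lowercase and collapse whitespace so matching ignores formatting."""
    return " ".join(s.lower().split())


def label_sentences(document, highlights):
    """Pair every sentence with 1 if it overlaps a highlight, else 0."""
    marked = [normalize(h) for h in highlights if h.strip()]
    labeled = []
    for sentence in split_sentences(document):
        norm = normalize(sentence)
        # A sentence counts as highlighted if it contains a highlight
        # or is itself contained in one (highlights may span sentences).
        hit = any(norm in h or h in norm for h in marked)
        labeled.append((sentence, int(hit)))
    return labeled


doc = (
    "The merger closes on June 1. "
    "Each share converts into 2 shares of the buyer. "
    "The parties met in Boston."
)
pairs = label_sentences(doc, ["Each share converts into 2 shares of the buyer."])
for sentence, label in pairs:
    print(label, sentence)
```

The resulting (sentence, label) pairs could then be fed to whatever sentence classifier you choose. Note that the splitter will break on abbreviations such as "Inc.", so real legal text would need a more careful sentence segmenter.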


Posted 2018-06-19T16:09:57.667

Reputation: 258

Sorry for the late response. I used sequence-to-sequence modeling to complete this task. – Phaneeth – 2018-08-31T16:04:25.223

@Phaneeth hey, I am working on a similar task. Can you explain how you applied sequence-to-sequence modeling? – Bhawesh Chandola – 2019-07-05T05:31:45.900