Unsupervised feature learning for NER

12

5

I have implemented an NER system using the CRF algorithm with handcrafted features, and it gave quite good results. However, I used many different features, including POS tags and lemmas.

Now I want to build the same NER system for a different language. The problem is that I can't use POS tags and lemmas there. I have started reading articles about deep learning and unsupervised feature learning.

My question is:

Is it possible to use unsupervised feature learning methods with the CRF algorithm? Has anyone tried this and gotten good results? Is there an article or tutorial on this topic?

I still don't completely understand this approach to feature creation, so I don't want to spend too much time on something that won't work. Any information would be really helpful. Building a whole NER system based on deep learning is a bit too much for now.

MaticDiba

Posted 2014-07-28T07:19:49.877

Reputation: 611

Answers

6

Yes, it is entirely possible to combine unsupervised learning with the CRF model. In particular, I would recommend that you explore the possibility of using word2vec features as inputs to your CRF.

Word2vec trains a model to distinguish between words that are appropriate for a given context and words that are randomly selected. Select weights of the model can then be interpreted as a dense vector representation of a given word.

These dense vectors have the appealing property that words that are semantically or syntactically similar have similar vector representations. Basic vector arithmetic even reveals some interesting learned relationships between words.
For example, vector("Paris") - vector("France") + vector("Italy") yields a vector that is quite similar to vector("Rome").
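The analogy arithmetic can be sketched in a few lines of plain Python. The 3-dimensional vectors below are made-up toy values chosen so the analogy works, not output from a real word2vec model, and `cosine` and `nearest` are hypothetical helper names used only for this illustration.

```python
import math

# Toy 3-d vectors, invented for illustration only. Real word2vec
# vectors are learned from text and usually have hundreds of dimensions.
vectors = {
    "Paris":  [0.9, 0.1, 0.8],
    "France": [0.9, 0.1, 0.1],
    "Italy":  [0.1, 0.9, 0.1],
    "Rome":   [0.1, 0.9, 0.8],
    "banana": [0.5, 0.5, 0.0],
}

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def nearest(query, exclude=()):
    """Return the vocabulary word whose vector is most similar to query."""
    candidates = (w for w in vectors if w not in exclude)
    return max(candidates, key=lambda w: cosine(query, vectors[w]))

# vector("Paris") - vector("France") + vector("Italy")
query = [p - f + i for p, f, i in
         zip(vectors["Paris"], vectors["France"], vectors["Italy"])]

# The input words are excluded so they cannot be returned as the answer.
result = nearest(query, exclude=("Paris", "France", "Italy"))
print(result)
```

With these toy values the nearest remaining word is "Rome"; with a real model the result depends on the training corpus.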

At a high level, you can think of word2vec representations as being similar to LDA or LSA representations, in the sense that you can convert a sparse input vector into a dense output vector that contains word similarity information.

For that matter, LDA and LSA are also valid options for unsupervised feature learning -- both attempt to represent words as combinations of "topics" and output dense word representations.

For English text, Google distributes word2vec models pretrained on a huge 100-billion-word Google News dataset, but for other languages you'll have to train your own model.

Madison May

Posted 2014-07-28T07:19:49.877

Reputation: 1 959

Hey, first I want to thank you for your answer. I have one more question. Word vectors returned by the word2vec algorithm have float values, so words like big and bigger will have vectors that are close in vector space, but the individual values of the vectors will still differ. For example, big = [0.1, 0.2, 0.3] and bigger = [0.11, 0.21, 0.31]. Isn't that a problem for the CRF algorithm, because it would treat them as not similar? Is there any additional processing that should be done before using these word vectors in a CRF? I hope my question is clear enough. – MaticDiba – 2014-08-21T08:10:01.407

5

In this 2014 paper (GitHub), the authors compared multiple strategies for incorporating word embeddings in a CRF-based NER system, including dense embedding, binarized embedding, cluster embedding, and a novel prototype method. Using dense vectors directly, as suggested by vlad, is the most straightforward way but also the least effective in multiple evaluations.

I implemented the prototype idea in my domain-specific NER project and it works pretty well for me.
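As a rough illustration of the binarized strategy mentioned above, here is a minimal sketch of one thresholding scheme: each dimension is compared with the mean of its positive values and the mean of its negative values across the vocabulary. This is an assumption about how such a discretization could look, and the exact rule used in the paper may differ. The function name `binarize`, the feature strings `U+` and `B-`, and the toy vectors are all hypothetical.

```python
def binarize(embeddings):
    """Turn float embeddings into discrete string features.

    embeddings: dict mapping word -> list of floats (all the same length)
    returns:    dict mapping word -> list of feature strings
    """
    dims = len(next(iter(embeddings.values())))
    pos_mean, neg_mean = [], []
    for d in range(dims):
        column = [vec[d] for vec in embeddings.values()]
        pos = [v for v in column if v > 0]
        neg = [v for v in column if v < 0]
        # If a dimension has no positive (or negative) values, use an
        # unreachable threshold so that feature never fires.
        pos_mean.append(sum(pos) / len(pos) if pos else float("inf"))
        neg_mean.append(sum(neg) / len(neg) if neg else float("-inf"))

    features = {}
    for word, vec in embeddings.items():
        feats = []
        for d, v in enumerate(vec):
            if v >= pos_mean[d]:
                feats.append(f"emb{d}=U+")   # strongly positive
            elif v <= neg_mean[d]:
                feats.append(f"emb{d}=B-")   # strongly negative
            # values in between produce no feature for this dimension
        features[word] = feats
    return features

# Toy 2-d vectors, invented for illustration only.
toy = {
    "big":    [0.8, -0.1],
    "bigger": [0.9, -0.2],
    "tiny":   [0.1, -0.9],
}
binary_features = binarize(toy)
print(binary_features)
```

In this toy example "big" and "bigger" end up with identical discrete features, so a CRF that treats features as strings sees them as similar even though their float values differ.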

user2404894

Posted 2014-07-28T07:19:49.877

Reputation: 51

4

I am just 5 months late, but with CRFSuite you can actually use those float features as numbers, not as strings. To do this, you just need to invent a unique name for each dimension, then add a ":" followed by the value.

For example, a word "jungle" is represented in 5 dimensions: 0.1 0.4 0.8 0.2 0.9

Then CRFSuite would take that word + feature as:

LABEL f1:0.1 f2:0.4 f3:0.8 f4:0.2 f5:0.9

where of course you replace "LABEL" with an actual label string and replace all spaces with tabs (that's the format for CRFSuite).
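A small helper can produce such a line from a vector. This is only a sketch of the tab-separated text format described above; the function name `crfsuite_line`, the `f` prefix, and the label "B-LOC" are placeholders chosen for illustration.

```python
def crfsuite_line(label, vector, prefix="f"):
    """Format one token as a tab-separated CRFSuite data line.

    Each dimension becomes an attribute 'name:value', where the number
    after the colon is used as the attribute's numeric value.
    """
    attributes = [f"{prefix}{i}:{value}"
                  for i, value in enumerate(vector, start=1)]
    return "\t".join([label] + attributes)

# "B-LOC" is a placeholder label; use whatever tag set your data has.
line = crfsuite_line("B-LOC", [0.1, 0.4, 0.8, 0.2, 0.9])
print(line)
```

Note that attribute names must not themselves contain a ":" unless it is escaped, since the colon separates the name from its value.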

Not sure though for other packages.

vlad

Posted 2014-07-28T07:19:49.877

Reputation: 41