I am trying to match a new product description with the existing ones. A product description looks like this: 'Panasonic DMC-FX07EB digital camera silver'. These are the steps to be performed:
- Tokenize the description and recognize its attributes: Panasonic => Brand, DMC-FX07EB => Model, etc.
- Get a few candidates with similar features.
- Get the best candidate.
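For context, steps 2 and 3 could be sketched roughly as below. This is a simplification: it compares raw lowercase tokens with Jaccard similarity instead of recognized attributes, and the function names and sample descriptions are made up for illustration.

```python
def tokens(description):
    """Split a description into a set of lowercase whitespace-separated tokens."""
    return set(description.lower().split())


def jaccard(a, b):
    """Jaccard similarity of two token sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def best_candidates(new_description, existing, k=3):
    """Return the k existing descriptions most similar to the new one,
    as (score, description) pairs sorted by descending score."""
    new_tokens = tokens(new_description)
    scored = [(jaccard(new_tokens, tokens(d)), d) for d in existing]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


# Hypothetical existing descriptions, for illustration only.
existing = [
    "Panasonic DMC-FX07EB camera silver",
    "Canon IXUS 70 digital camera black",
    "Sony Bravia TV",
]
candidates = best_candidates(
    "Panasonic DMC-FX07EB digital camera silver", existing, k=2
)
best = candidates[0]  # step 3: take the top-scoring candidate
```

Once the attributes are recognized, the token sets here would be replaced by attribute/value pairs, so that a match on Model counts for more than a match on a generic word like "camera".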
I am having a problem with the first step. To get Panasonic => Brand, DMC-FX07EB => Model, silver => Color, I need an index in which each token of a product description corresponds to an attribute name (Brand, Model, Color, etc.) in the existing database. The problem is that in my database each product description is stored as a single atomic attribute, e.g. 'description', with no separate product attributes.
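To make the idea of the index concrete, here is a minimal sketch of the lookup I have in mind. The index contents, the 'Category' attribute, and the regex fallback for model numbers are all assumptions for illustration, not data I actually have.

```python
import re

# Hypothetical attribute index: lowercase token -> attribute name.
# In practice this would be built from an external source of product attributes.
ATTRIBUTE_INDEX = {
    "panasonic": "Brand",
    "silver": "Color",
    "digital": "Category",
    "camera": "Category",
}

# Assumed heuristic: a model number contains at least one letter and one digit,
# and consists only of letters, digits, and hyphens.
MODEL_PATTERN = re.compile(r"^(?=.*[A-Za-z])(?=.*\d)[A-Za-z0-9-]+$")


def tag_description(description):
    """Tag each whitespace-separated token with an attribute name.

    Tokens found in the index get the indexed attribute; otherwise tokens that
    look like model numbers get 'Model'; everything else is 'Unknown'.
    """
    tagged = []
    for token in description.split():
        attribute = ATTRIBUTE_INDEX.get(token.lower())
        if attribute is None and MODEL_PATTERN.match(token):
            attribute = "Model"
        tagged.append((token, attribute or "Unknown"))
    return tagged


tagged = tag_description("Panasonic DMC-FX07EB digital camera silver")
```

Descriptions tagged this way could then serve as (noisy) training data, which is why the coverage of the index matters so much.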
Basically, I don't have training data, so I am trying to build an index of all product attributes that I can then use to build training data. So far I have attributes from the bestbuy.com and semantics3.com APIs, but both sources lack most attributes or contain irrelevant ones. Any suggestions for better APIs to get product attributes? Is there a better approach to do this?
P.S. For every product there is a matched product description in the database, which is also stored as a single atomic attribute. I have checked this question on SO; it helped me, and it seems we have the same approach, but I am still trying to get training data.