I have some dictionary data like this:
Tag | Definition
Noun | a man who works in a restaurant, serving people food and drink
Noun | a person who does a specified type of work or who works in a specified way
Noun | a person who habitually seeks to harm or intimidate those whom they perceive as vulnerable.
Adj | relating to meaning in language or logic.
Adj | of or like snow, especially in being pure white
Adj | of or according to syntax.
My task is to find an appropriate 'Tag' based on a given 'Definition'.
There are clearly some patterns that we use when describing a noun and other patterns for describing an adjective. I would like an algorithm or method for finding these patterns. I tried machine learning (e.g., Naive Bayes, SVM, Logistic Regression, MLP), but because most of my samples are 'Noun', I could not reach good accuracy. The class counts are 5000 Nouns, 2000 Adjectives, 900 Verbs, and 900 Other.
I am thinking of using another algorithm that does not require a lot of data or that can handle imbalanced classes. I have looked at the FP-Growth and Apriori algorithms, but I am not sure whether they can handle more advanced patterns like this:
A person who does X to Y.
In this pattern, some words are keywords ('A', 'person', 'who', 'does', 'to'), while the other words (here X and Y) should be ignored. Is there an algorithm for handling this situation?
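To make it concrete what I mean by "matching" such a pattern, here is a minimal sketch. It assumes a naive tokenizer (lowercase, letters only), and the names `tokenize` and `matches_pattern` are hypothetical helpers of my own, not part of any library. A definition matches when the keywords occur in the given order, with any number of ignored words in between.

```python
import re


def tokenize(text):
    """Lowercase the text and keep runs of letters (deliberately naive)."""
    return re.findall(r"[a-z]+", text.lower())


def matches_pattern(keywords, definition):
    """Return True if every keyword occurs in the definition, in order.

    Words between the keywords are ignored, which is how the X and Y
    slots of the pattern are treated here.
    """
    tokens = iter(tokenize(definition))
    # `kw in tokens` consumes the iterator up to the first match, so each
    # keyword is searched for only after the position of the previous one.
    return all(kw in tokens for kw in keywords)


definition = "a person who does a specified type of work or who works in a specified way"
print(matches_pattern(["a", "person", "who", "does"], definition))
print(matches_pattern(["who", "person"], definition))
```

The first call matches because the keywords appear in that order; the second does not, because 'person' never appears after 'who'.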
Note: My objective is to find frequent ordered sets, i.e., groups of words that frequently occur in the same order.
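As an illustration of the kind of output I am after, here is a brute-force sketch, not FP-Growth or Apriori. It enumerates every ordered group of `length` words (gaps allowed) from the first `max_tokens` words of each definition and keeps the groups found in at least `min_support` definitions. The function name and all parameters are my own assumptions, and this approach would not scale to long definitions or large `length`.

```python
import re
from collections import Counter
from itertools import combinations


def tokenize(text):
    """Lowercase the text and keep runs of letters (deliberately naive)."""
    return re.findall(r"[a-z]+", text.lower())


def frequent_ordered_sets(definitions, length=3, min_support=2, max_tokens=8):
    """Count ordered word groups (with gaps) shared by several definitions."""
    support = Counter()
    for definition in definitions:
        tokens = tokenize(definition)[:max_tokens]
        # combinations() keeps the original word order; set() makes each
        # definition count at most once per ordered group.
        support.update(set(combinations(tokens, length)))
    return {seq: n for seq, n in support.items() if n >= min_support}


nouns = [
    "a man who works in a restaurant, serving people food and drink",
    "a person who does a specified type of work or who works in a specified way",
    "a person who habitually seeks to harm or intimidate those whom they perceive as vulnerable.",
]
patterns = frequent_ordered_sets(nouns)
print(patterns[("a", "person", "who")])
```

Running the same count separately for each Tag and comparing the results would show which ordered groups are characteristic of nouns versus adjectives.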