Recommended Modelling Technique for Influencer Marketing Scenario



I have a dataset of approximately 90,000 rows containing information about social media profiles. It has columns for biography, follower count, language spoken, name, username and the label (which identifies whether the profile belongs to an influencer, a brand, or news and media).

Task: I have to train a model that predicts the label. I then need to produce a confidence interval for each prediction.

As I have never come across a problem like this, I am after some suggestions on which models I should use in this situation. I am thinking of Natural Language Processing (NLP), but I am not sure.

Also, if NLP is a suitable method, any code or advice to help me implement it in Python for the first time would be greatly appreciated! Thanks in advance.


Posted 2018-04-30T23:58:24.823

Reputation: 53



It depends very much on the structure of the data.

I would think about feature extraction first: for example, whether certain words occur in the bio, and which class of user name the profile has ('real' name, numerical ID, etc.). Once you have a set of features for each data item, turn them into a list of feature vectors.
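A minimal sketch of that feature-extraction step in plain Python, assuming each profile is a dict with hypothetical keys `biography`, `username` and `follower_count`. The keyword list, the username heuristic and the log scaling of follower counts are illustrative assumptions, not part of the original answer; in practice you would derive keywords from your own data.

```python
import math
import re
from typing import Dict, List

# Hypothetical keyword list: words that *might* signal a class. Illustrative only.
KEYWORDS = ["official", "news", "breaking", "shop", "brand", "blogger", "creator", "collab"]
USERNAME_CLASSES = ["numeric", "name_like", "other"]


def username_class(username: str) -> str:
    """Crude heuristic: mostly digits -> 'numeric'; 'first_last' or 'first.last' -> 'name_like'."""
    digits = sum(ch.isdigit() for ch in username)
    if username and digits / len(username) > 0.5:
        return "numeric"
    if re.fullmatch(r"[A-Za-z]+[._][A-Za-z]+", username):
        return "name_like"
    return "other"


def extract_features(profile: Dict) -> List[float]:
    """Turn one profile (a dict of raw fields) into a fixed-length feature vector."""
    words = set(re.findall(r"[a-z]+", profile["biography"].lower()))
    vector = [1.0 if kw in words else 0.0 for kw in KEYWORDS]          # keyword flags
    uclass = username_class(profile["username"])
    vector += [1.0 if uclass == c else 0.0 for c in USERNAME_CLASSES]  # one-hot username class
    vector.append(math.log1p(profile["follower_count"]))               # log-scaled follower count
    return vector


# Two made-up profiles, just to show the shape of the output.
profiles = [
    {"biography": "Breaking news and analysis", "username": "dailynews", "follower_count": 120000},
    {"biography": "Lifestyle blogger, collab enquiries via DM", "username": "jane_doe", "follower_count": 45000},
]
feature_vectors = [extract_features(p) for p in profiles]
```

Every vector has the same length (8 keyword flags, 3 username-class flags, 1 follower feature), which is what most classifiers expect as input.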

Then run them through a number of machine learning algorithms. This is where the shape of the data matters, as some algorithms will work better than others. I would try, e.g., decision trees (ID3), which are very efficient once trained (but they don't give you a confidence interval). But any other ML algorithm might work. They all have trade-offs in training speed, memory requirements and classification speed; some will give you a probability for each class label, while others will just give you a single label.
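To illustrate the difference between a classifier that returns one label and one that returns a probability per label, here is a minimal sketch of multinomial naive Bayes over bio words in plain Python. Naive Bayes is swapped in here (it is not one of the algorithms named above) because it naturally produces class probabilities; the `NaiveBayes` class, the toy bios and the labels are hypothetical.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayes:
    """Multinomial naive Bayes over bio words, with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        self.total_words = {c: sum(self.word_counts[c].values()) for c in self.class_counts}
        return self

    def predict_proba(self, text):
        n_docs = sum(self.class_counts.values())
        log_scores = {}
        for c, count in self.class_counts.items():
            score = math.log(count / n_docs)  # log prior
            denom = self.total_words[c] + len(self.vocab)
            for w in tokenize(text):
                if w in self.vocab:  # ignore words never seen in training
                    score += math.log((self.word_counts[c][w] + 1) / denom)
            log_scores[c] = score
        # Normalise via log-sum-exp so the probabilities add up to 1.
        m = max(log_scores.values())
        exp_scores = {c: math.exp(s - m) for c, s in log_scores.items()}
        total = sum(exp_scores.values())
        return {c: v / total for c, v in exp_scores.items()}

    def predict(self, text):
        probs = self.predict_proba(text)
        return max(probs, key=probs.get)


# Toy training data (made up for illustration).
texts = [
    "breaking news daily updates", "official store shop our products",
    "fashion blogger and creator", "latest news headlines",
    "shop the brand collection", "travel creator sharing my life",
]
labels = ["news", "brand", "influencer", "news", "brand", "influencer"]
model = NaiveBayes().fit(texts, labels)
probs = model.predict_proba("news updates every hour")
```

`predict` gives the single most likely label, while `predict_proba` gives a probability for every label, which is the kind of per-prediction confidence the question asks for.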

The best approach would be to take a sample, identify which algorithm works well and fits your specific requirements, and then use that algorithm on the full data set.
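The sample-then-select procedure could look roughly like this in plain Python: draw a random sample, split it into training and held-out parts, train each candidate, and keep the one with the best held-out accuracy. The two candidates (a majority-class baseline and a nearest-centroid classifier), the `select_model` helper and the synthetic clustered data are all hypothetical stand-ins for your real feature vectors and algorithms.

```python
import random
from collections import Counter


def majority_baseline(train_X, train_y):
    """Always predict the most common label in the training sample."""
    most_common = Counter(train_y).most_common(1)[0][0]
    return lambda x: most_common


def nearest_centroid(train_X, train_y):
    """Predict the label whose mean feature vector is closest (squared Euclidean)."""
    counts = Counter(train_y)
    sums = {}
    for x, label in zip(train_X, train_y):
        acc = sums.setdefault(label, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
    centroids = {label: [v / counts[label] for v in acc] for label, acc in sums.items()}

    def predict(x):
        return min(centroids, key=lambda lab: sum((a - b) ** 2 for a, b in zip(x, centroids[lab])))
    return predict


def accuracy(predict, X, y):
    return sum(predict(x) == t for x, t in zip(X, y)) / len(y)


def select_model(X, y, candidates, sample_size=1000, holdout=0.2, seed=0):
    """Evaluate each candidate on a random sample; return the best name and all scores."""
    rng = random.Random(seed)
    idx = rng.sample(range(len(y)), min(sample_size, len(y)))
    cut = int(len(idx) * (1 - holdout))
    train, test = idx[:cut], idx[cut:]
    scores = {}
    for name, train_fn in candidates.items():
        predict = train_fn([X[i] for i in train], [y[i] for i in train])
        scores[name] = accuracy(predict, [X[i] for i in test], [y[i] for i in test])
    return max(scores, key=scores.get), scores


# Synthetic, well-separated data standing in for real feature vectors.
rng = random.Random(42)
X, y = [], []
for label, centre in [("influencer", 0.0), ("brand", 5.0), ("news", 10.0)]:
    for _ in range(300):
        X.append([centre + rng.gauss(0, 1), centre + rng.gauss(0, 1)])
        y.append(label)

best, scores = select_model(X, y, {"majority": majority_baseline, "nearest_centroid": nearest_centroid})
```

Including a trivial baseline is a cheap sanity check: any candidate worth keeping should clearly beat it. Accuracy is only one criterion; training time, memory use and whether the model outputs probabilities can be weighed in the same loop.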

Alternatively, you could just use, for example, the Stanford ML classifier. That will give you a probability for each label (which you can use as a confidence score), and will probably work reasonably well.
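As far as I know, the Stanford Classifier is a maximum entropy (multinomial logistic regression) model. The following is not its API; it is a minimal plain-Python sketch of that model family, trained by batch gradient descent, to show where the per-label probabilities come from. The function names, hyperparameters (`epochs`, `lr`, `l2`) and the toy keyword features are assumptions for illustration.

```python
import math


def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def train_maxent(X, y, classes, epochs=200, lr=0.1, l2=0.01):
    """Fit multinomial logistic regression by batch gradient descent.

    X: list of feature vectors (a bias feature is appended internally).
    y: list of labels drawn from `classes`.
    Returns a weight matrix with one row per class. For simplicity the L2
    penalty is applied to the bias weights too.
    """
    n_features = len(X[0]) + 1
    W = [[0.0] * n_features for _ in classes]
    index = {c: k for k, c in enumerate(classes)}
    for _ in range(epochs):
        grad = [[0.0] * n_features for _ in classes]
        for x, label in zip(X, y):
            xb = list(x) + [1.0]
            probs = softmax([sum(w * v for w, v in zip(row, xb)) for row in W])
            for k in range(len(classes)):
                err = probs[k] - (1.0 if k == index[label] else 0.0)
                for j, v in enumerate(xb):
                    grad[k][j] += err * v
        for k in range(len(classes)):
            for j in range(n_features):
                W[k][j] -= lr * (grad[k][j] / len(X) + l2 * W[k][j])
    return W


def predict_proba(W, classes, x):
    xb = list(x) + [1.0]
    return dict(zip(classes, softmax([sum(w * v for w, v in zip(row, xb)) for row in W])))


# Toy features: [bio mentions "news", bio mentions "shop", bio mentions "creator"]
X = [[1, 0, 0], [1, 0, 0], [0, 1, 0], [0, 1, 0], [0, 0, 1], [0, 0, 1]]
y = ["news", "news", "brand", "brand", "influencer", "influencer"]
classes = ["influencer", "brand", "news"]
W = train_maxent(X, y, classes)
probs = predict_proba(W, classes, [1, 0, 0])
```

The softmax output is a probability for each label rather than a statistical confidence interval, but it is the usual way to attach a confidence to a classifier's prediction.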

Oliver Mason

Posted 2018-04-30T23:58:24.823

Reputation: 3 755

Thank you so much, Oliver. This is really helpful. I will try to implement it as you have suggested. – user9645302 – 2018-05-01T11:36:04.023

Note that selecting the right features might have a bigger impact on the outcome than the actual ML algorithm, so it's worth trying out several alternatives there as well. – Oliver Mason – 2018-05-01T15:49:16.877
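To make that point concrete, here is a small sketch that keeps the classifier fixed (leave-one-out 1-nearest-neighbour, chosen only for simplicity) and varies the feature set. The data is synthetic and hypothetical: column 0 separates the classes, while columns 1 to 5 are pure noise standing in for uninformative features.

```python
import random


def loo_1nn_accuracy(X, y, columns):
    """Leave-one-out accuracy of a 1-nearest-neighbour classifier restricted to `columns`."""
    correct = 0
    for i, xi in enumerate(X):
        nearest = min(
            (j for j in range(len(X)) if j != i),
            key=lambda j: sum((xi[c] - X[j][c]) ** 2 for c in columns),
        )
        correct += y[nearest] == y[i]
    return correct / len(X)


rng = random.Random(0)
X, y = [], []
for label, centre in [("influencer", 0.0), ("brand", 4.0), ("news", 8.0)]:
    for _ in range(40):
        informative = centre + rng.gauss(0, 1)       # separates the classes
        noise = [rng.gauss(0, 3) for _ in range(5)]  # carries no class information
        X.append([informative] + noise)
        y.append(label)

feature_sets = {
    "informative_only": [0],
    "noise_only": [1, 2, 3, 4, 5],
    "all": [0, 1, 2, 3, 4, 5],
}
scores = {name: loo_1nn_accuracy(X, y, cols) for name, cols in feature_sets.items()}
best = max(scores, key=scores.get)
```

With the algorithm unchanged, adding uninformative columns can noticeably lower accuracy, which is why trying several feature sets can matter as much as trying several algorithms.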