Model for Differing Number of Rows per Observation



Looking to build a response model (click or no click) on marketing data in which each person is shown a varying number of offers. I don't want to model which offer they click, only whether they click any of the offers presented to them. My issue is how to deal with the differing number and types of offers.

Example data could be one table of id's:

id   clicked
001       1
002       0
003       1

And varying number of offers per id:

id  discount_rate  on_amt
001     0.05       100
001     0.10       500
002     0.03        50
003     0.05       100
003     0.10       300
003     0.15       500

Do I create features from the offer data set, such as average discount_rate, max on_amt, etc.? Or create a very large, sparse binary matrix of binned offer types, such as rate_5-10_amt_0-50 (1/0), rate_5-10_amt_50-100 (1/0), and so on?

Or is there a good model that handles variable data like this?
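The first option raised above, collapsing the variable-length offer table into fixed-width per-id aggregates, can be sketched in pandas (the aggregate choices here are just examples):

```python
import pandas as pd

# Illustrative data matching the tables above.
clicks = pd.DataFrame({"id": ["001", "002", "003"], "clicked": [1, 0, 1]})
offers = pd.DataFrame({
    "id": ["001", "001", "002", "003", "003", "003"],
    "discount_rate": [0.05, 0.10, 0.03, 0.05, 0.10, 0.15],
    "on_amt": [100, 500, 50, 100, 300, 500],
})

# Collapse the variable-length offer table to one row per id.
agg = offers.groupby("id").agg(
    n_offers=("discount_rate", "size"),
    mean_rate=("discount_rate", "mean"),
    max_rate=("discount_rate", "max"),
    max_amt=("on_amt", "max"),
    total_amt=("on_amt", "sum"),
).reset_index()

# One fixed-width row per id, ready for any standard classifier.
train = clicks.merge(agg, on="id")
```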


Posted 2019-04-17T16:47:56.343

Reputation: 181

This seems like something there should be "an answer" to, but I'm not aware of one. Otherwise, I'd suggest trying each of your suggestions, as well as one other: why do you want to avoid predicting individual offer clicks? Do you have data at that level? If so, try fitting a model to that data as-is, then aggregating the results across rows of predictions, e.g. if each prediction on a row produces a probability, maybe just compute the probability of any click with an assumption about independence of a single individual clicking on different ads. (Be sure to report results here!) – Ben Reiniger – 2019-05-17T19:28:20.080
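The aggregation step in the comment above — combining per-offer click probabilities under an independence assumption — works out to P(any click) = 1 − ∏(1 − p_i). A minimal sketch (the per-offer probabilities are made up here; in practice they would come from the row-level model):

```python
import math

def p_any_click(per_offer_probs):
    """Probability that at least one offer is clicked,
    assuming clicks on different offers are independent."""
    p_none = math.prod(1.0 - p for p in per_offer_probs)
    return 1.0 - p_none

# e.g. three offers shown to one person, with row-level model outputs:
print(p_any_click([0.1, 0.2, 0.05]))  # 1 - 0.9 * 0.8 * 0.95
```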

I can't think of what exactly this would look like off the top of my head, but there's probably a way to create two hierarchical models: the inner model predicting probability of click given discount_rate and on_amt, and the outer model predicting the binary click given the inner model's average probability over all ads presented and the user id. Just a rough brainstorm. The ad industry probably already has this problem figured out, so you might try searching more online... – Alex L – 2019-05-17T19:54:28.150

This data, from what I can tell, is easily handled using mixed-effects models, where you can simply let id be a random effect. You can then obtain predictions per id (level 2) and predictions for specific instances of an id. Predictions are pulled toward the group mean for ids with little data. IMO, this is by far the most natural approach. If you wish to use ML, consider mixed-effect random forests or Bayesian methods. – aranglol – 2020-08-03T06:39:46.803
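A rough sketch of the mixed-effects idea in the comment above, using statsmodels' variational-Bayes binomial mixed GLM with a random intercept per id (the synthetic long-format data, the covariate scaling, and the formula are all illustrative assumptions, not part of the original question):

```python
import numpy as np
import pandas as pd
from statsmodels.genmod.bayes_mixed_glm import BinomialBayesMixedGLM

# Synthetic long-format data: one row per (id, offer); the binary
# outcome is repeated across each id's rows.
rng = np.random.default_rng(0)
rows = []
for i in range(50):
    clicked = int(rng.integers(0, 2))
    for _ in range(int(rng.integers(1, 5))):  # 1-4 offers per id
        rows.append({"id": f"{i:03d}",
                     "clicked": clicked,
                     "discount_rate": rng.uniform(0.01, 0.20),
                     "on_amt": rng.uniform(50, 500)})
df = pd.DataFrame(rows)
df["amt_scaled"] = df["on_amt"] / 500  # keep covariates on a modest scale

# Fixed effects for the offer attributes, random intercept per id.
model = BinomialBayesMixedGLM.from_formula(
    "clicked ~ discount_rate + amt_scaled",
    {"id": "0 + C(id)"},  # variance component: one intercept per id
    df,
)
result = model.fit_vb()
```

The fitted result exposes posterior means for the fixed effects (`result.fe_mean`) and the per-id random intercepts, which is where the pull-toward-the-group-mean behavior shows up.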



You need to create a tidy version of the data with on_amt and discount_rate encoded as categorical variables (e.g., one-hot encoded). If they are continuous, they need to be binned into categories before encoding.
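One way to build that tidy, one-row-per-id binary matrix with pandas (the bin edges below are arbitrary choices for illustration):

```python
import pandas as pd

offers = pd.DataFrame({
    "id": ["001", "001", "002", "003", "003", "003"],
    "discount_rate": [0.05, 0.10, 0.03, 0.05, 0.10, 0.15],
    "on_amt": [100, 500, 50, 100, 300, 500],
})

# Bin the continuous offer attributes (edges chosen arbitrarily here).
offers["rate_bin"] = pd.cut(offers["discount_rate"], [0, 0.05, 0.10, 1.0])
offers["amt_bin"] = pd.cut(offers["on_amt"], [0, 100, 300, 10_000])
offers["offer_type"] = (offers["rate_bin"].astype(str)
                        + "_" + offers["amt_bin"].astype(str))

# One-hot encode the combined offer type, then take the max per id
# so each id gets a 0/1 indicator for every offer type it was shown.
dummies = pd.get_dummies(offers["offer_type"])
wide = dummies.groupby(offers["id"]).max()
```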

Brian Spiering

Posted 2019-04-17T16:47:56.343

Reputation: 10,864


Our team uses featuretools' deep feature synthesis for exactly this scenario. That way you can capture much more signal via various aggregations per feature (mean, most_recent, mode, etc.).

Anders Swanson

Posted 2019-04-17T16:47:56.343

Reputation: 111