Extrapolating GLM coefficients for the year a product was sold into future years?


I've fit a Poisson GLM to a data set in which one of the variables is categorical: the year a customer bought a product from my company, ranging from 1999 to 2012. The coefficients for the levels of that variable show a linear trend as the year of sale increases.

Is there any problem with trying to improve predictions for 2013 and maybe 2014 by extrapolating to get the coefficients for those years?


Posted 2014-08-23T13:47:01.907

Reputation: 317



I believe that this is a case for applying time series analysis, in particular time series forecasting (http://en.wikipedia.org/wiki/Time_series). Consider the following resources on time series regression:

Aleksandr Blekh

Posted 2014-08-23T13:47:01.907

Reputation: 6 438

The reason I'm using regression is that I need the per-year rate of change, for reasons I'd rather not get into right now. – JenSCDC – 2014-08-23T20:24:15.167

@AndyBlankertz: I just updated my answer. – Aleksandr Blekh – 2014-08-23T20:33:27.433

Thanks. I'd love to delve into the resources, but I'm short on time: the report I'm working on is due on Friday. I also have some slack in statistical rigor, because the target audience is Management :) Hopefully next week. – JenSCDC – 2014-08-23T20:44:59.527

@AndyBlankertz: You're welcome. I understand, as I'm not a statistician myself :-). But I'm trying to learn wherever and whenever I can. – Aleksandr Blekh – 2014-08-26T20:54:16.963

This isn't time series analysis (unless you throw away loads of data). I think the data records are individual sales records with a year attached as a covariate. Time series analysis is used when the variable of interest (e.g. total sales) has a unique time point. You could compute total sales within years and do time series analysis, but that would mean losing all the other information from each sales record (e.g. item purchased, buyer age, etc.). Regression is the right thing here. – Spacedman – 2014-08-26T08:51:42.180

@Spacedman: The term I've emphasized in my answer is time series regression. Thus, in my view, it could be considered as a special case of either of the two approaches, depending on the perspective. – Aleksandr Blekh – 2014-08-26T09:12:48.210

All I'm saying is that individual sales records data are not time series data. So you can't treat them like time series data. So reading about fitting AR(1) models and time series regression approaches is a waste of the OP's time here when all they have to do is convert year to numeric and run the model again. My concern now is wondering exactly what the OP means by "per-year rate of change", which may imply something more than a linear term in year is required (some kind of smoother or polynomial term perhaps). – Spacedman – 2014-08-26T09:40:26.557

@Spacedman: I see. Thank you for the clarification. However, my initial impression was that for this particular task, the OP is only interested in future values of a single aggregate outcome variable (keeping the model's full information for regression analysis). That would be the case for time series forecasting, wouldn't it? Perhaps, I misunderstood the question. – Aleksandr Blekh – 2014-08-26T10:01:11.867


If you suspect your response is linear in year, then put year into your model as a numeric term rather than as a categorical one.

Extrapolation is then perfectly valid under the usual assumptions of the GLM family. Make sure you compute the standard errors of your extrapolated estimates correctly.
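As a rough illustration of the numeric-year approach, here is a minimal sketch in plain Python. Everything in it is an assumption made for the example: the yearly counts are made up, the model has year as its only covariate, and the function names `fit_poisson_trend` and `predict` are hypothetical. In practice you would refit your full model in your usual GLM software with year as a numeric term. The sketch fits a log-link Poisson regression by Newton's method and returns an approximate 95% interval for the expected count in a future year.

```python
import math


def _information(b0, b1, x):
    """Fisher information entries for a Poisson GLM with log link."""
    mu = [math.exp(b0 + b1 * xi) for xi in x]
    h00 = sum(mu)
    h01 = sum(xi * mi for xi, mi in zip(x, mu))
    h11 = sum(xi * xi * mi for xi, mi in zip(x, mu))
    return mu, h00, h01, h11


def fit_poisson_trend(years, counts, max_iter=50, tol=1e-10):
    """Fit log(E[count]) = b0 + b1 * (year - mean_year) by Newton's method."""
    center = sum(years) / len(years)
    x = [yr - center for yr in years]  # centering keeps the numbers small
    b0, b1 = math.log(sum(counts) / len(counts)), 0.0
    for _ in range(max_iter):
        mu, h00, h01, h11 = _information(b0, b1, x)
        # Score vector: gradient of the Poisson log-likelihood.
        g0 = sum(yi - mi for yi, mi in zip(counts, mu))
        g1 = sum(xi * (yi - mi) for xi, yi, mi in zip(x, counts, mu))
        det = h00 * h11 - h01 * h01
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0, b1 = b0 + d0, b1 + d1
        if abs(d0) + abs(d1) < tol:
            break
    # Covariance of (b0, b1) is the inverse information at the estimate.
    _, h00, h01, h11 = _information(b0, b1, x)
    det = h00 * h11 - h01 * h01
    cov = (h11 / det, -h01 / det, h00 / det)  # (var b0, cov b0 b1, var b1)
    return {"intercept": b0, "slope": b1, "center": center, "cov": cov}


def predict(fit, year, z=1.96):
    """Expected count for `year` with an approximate 95% interval.

    The interval is for the mean on the log scale, back-transformed;
    it is not a prediction interval for a single new observation.
    """
    xs = year - fit["center"]
    c00, c01, c11 = fit["cov"]
    eta = fit["intercept"] + fit["slope"] * xs
    se = math.sqrt(c00 + 2 * xs * c01 + xs * xs * c11)
    return math.exp(eta), math.exp(eta - z * se), math.exp(eta + z * se)


# Hypothetical yearly counts growing by about 5% per year (made-up data).
years = list(range(1999, 2013))
counts = [round(100 * math.exp(0.05 * (yr - 1999))) for yr in years]

fit = fit_poisson_trend(years, counts)
for future_year in (2013, 2014):
    mean, low, high = predict(fit, future_year)
    print(future_year, round(mean, 1), round(low, 1), round(high, 1))
```

The slope is the per-year rate of change on the log scale, and the interval widens as you move further from the observed years, which is the part that is easy to lose when extrapolating by eye.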

Just extrapolating the parameters from a categorical variable is wrong for a number of reasons. The first one I can think of is that there may be more observations in some years than in others, so any linear extrapolation needs to weight those years' estimates more heavily. Just eyeballing a line, or even fitting an unweighted line to the coefficients, won't do this.
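To make the weighting point concrete, here is a small sketch with made-up numbers. The per-year coefficients, the observation counts, and the helper `fit_line` are all hypothetical, and using the number of observations as the weight is only a crude stand-in for the precision of each estimate. It illustrates the problem; it is not a recommended procedure, since refitting with numeric year handles the weighting for you.

```python
def fit_line(x, y, weights=None):
    """Return (intercept, slope) of the (optionally weighted) least-squares line."""
    if weights is None:
        weights = [1.0] * len(x)
    sw = sum(weights)
    xbar = sum(w * xi for w, xi in zip(weights, x)) / sw
    ybar = sum(w * yi for w, yi in zip(weights, y)) / sw
    sxy = sum(w * (xi - xbar) * (yi - ybar) for w, xi, yi in zip(weights, x, y))
    sxx = sum(w * (xi - xbar) ** 2 for w, xi in zip(weights, x))
    slope = sxy / sxx
    return ybar - slope * xbar, slope


# Hypothetical categorical-year coefficients lying on a line with slope 0.1,
# except 2012, which is badly estimated because it has very few observations.
years = list(range(1999, 2013))
coefs = [0.1 * (yr - 1999) for yr in years]
coefs[-1] = 2.5                      # noisy estimate for 2012 (made up)
n_obs = [500] * 13 + [5]             # observations per year (made up)

unweighted = fit_line(years, coefs)
weighted = fit_line(years, coefs, weights=n_obs)

for label, (intercept, slope) in (("unweighted", unweighted), ("weighted", weighted)):
    # Extrapolated coefficient for 2013 under each fit.
    print(label, round(slope, 4), round(intercept + slope * 2013, 3))
```

With these numbers the unweighted line is pulled toward the poorly estimated 2012 coefficient, while the weighted line stays close to the trend in the well-observed years.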


Posted 2014-08-23T13:47:01.907

Reputation: 1 852

Hmm... it never occurred to me to make year a continuous variable. In retrospect it seems obvious. – JenSCDC – 2014-08-26T18:51:31.157