What is "lag" in time series forecasting?



I'm studying machine learning (i.e. time series analysis). I encountered an Azure tutorial, Retail Forecasting.


Here, they introduce new features called "lag", and I don't understand what that means. In 3.4, they say "The module selects the best lag of this index based on maximum correlation." What does this mean? Could you please point me to some references for learning the background?


Posted 2020-04-17T11:43:41.713

Reputation: 1

When you have a variable that depends on time, $x_t$, the first lag of $x$ is $x_{t-1}$. So you look back in time. The idea is that in many cases the current outcome depends on "yesterday's" outcome. E.g., today's temperature is correlated with yesterday's temperature. – Peter – 2020-04-17T11:55:09.117

Thank you for your response. Could you also explain what "the best lag" means? – hgshin – 2020-04-17T12:05:33.193

You can choose lags $x_{t-1}$, $x_{t-2}$, up to $x_{t-n}$, and you don't know ex ante which lag(s) to choose. The best lag could be the one with the most power to explain some outcome. – Peter – 2020-04-17T12:11:39.580



Lag features are target values from previous periods.

For example, if you would like to forecast the sales of a retail outlet in period $t$, you can use the sales of the previous month, $t-1$, as a feature. That would be a lag of 1, and you could say it models some kind of momentum. But you could also apply a lag of 12 to model the sales of the same month a year ago (since retail sales are often seasonal, depending on the format, category and SKU). Which lags work best therefore depends on the dataset, and looking at correlation is one way to select the lag values.
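To make this concrete, here is a minimal sketch in Python with pandas. It builds lag features with `Series.shift` and picks the lag whose feature has the highest correlation with the target. The monthly sales data is synthetic, and the helper names `make_lag_features` and `best_lag_by_correlation` are made up for illustration. I don't know exactly how the Azure module computes its "best lag", so treat this as one plausible reading of "maximum correlation", not as its actual implementation.

```python
import numpy as np
import pandas as pd


def make_lag_features(series: pd.Series, lags) -> pd.DataFrame:
    """Build one column per lag; column 'lag_k' holds the value from k periods earlier."""
    return pd.DataFrame({f"lag_{k}": series.shift(k) for k in lags})


def best_lag_by_correlation(series: pd.Series, lags):
    """Return the lag whose shifted series correlates most strongly with the original.

    Assumption: "best" means the highest (signed) Pearson correlation.
    """
    lagged = make_lag_features(series, lags)
    # corrwith ignores the NaN rows that shifting creates at the start
    corr = lagged.corrwith(series)
    best_column = corr.idxmax()
    return int(best_column.split("_")[1]), corr


# Synthetic monthly sales with a yearly seasonal pattern (illustrative data only)
rng = np.random.default_rng(0)
months = pd.date_range("2015-01-01", periods=60, freq="MS")
t = np.arange(60)
sales = pd.Series(
    100 + 20 * np.sin(2 * np.pi * t / 12) + rng.normal(0, 1, size=60),
    index=months,
    name="sales",
)

best, corr = best_lag_by_correlation(sales, range(1, 13))
print(corr.round(3))
print("best lag:", best)
```

Because the synthetic series repeats every 12 months, the lag of 12 should come out on top here. On real data the ranking can look quite different, so it is worth inspecting the whole correlation table and not only the winner.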

Have a look at this blog post for a simple explanation of lag features.


Posted 2020-04-17T11:43:41.713

Reputation: 3 740