I need to fit a time series model that can take one of two forms. There are probably more, but these are the two I'm asking about. If you have other ideas, they are of course welcome.

**First Possible Model -**

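$$\text{Spending}_{c,t} = \sum_{i=1}^{N} \phi_i \, \text{Spending}_{c,t-i} + \delta_{\text{US}} D^{\text{US}}_{c} + \delta_{\text{Mexico}} D^{\text{Mexico}}_{c} + \dots + \varepsilon_{c,t}$$

where $D^{\text{US}}_{c} = 1$ if country $c$ is the US and 0 otherwise (likewise for Mexico and every other country).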
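A minimal sketch of how the pooled model can be estimated by ordinary least squares, assuming NumPy and simulated AR(1) data in place of real spending figures. The helper names `lag_matrix`, `fit_pooled` and `simulate_ar1` are hypothetical. With one dummy per country and no global intercept, each dummy coefficient is simply that country's own intercept, while the lag coefficients are forced to be the same for every country.

```python
import numpy as np


def lag_matrix(y, n_lags):
    """Return (target, lags) where column i of lags is y shifted back i+1 steps."""
    y = np.asarray(y, dtype=float)
    target = y[n_lags:]
    lags = np.column_stack(
        [y[n_lags - i - 1 : len(y) - i - 1] for i in range(n_lags)]
    )
    return target, lags


def fit_pooled(series_by_country, n_lags):
    """OLS fit of the pooled model: lag coefficients shared by all countries,
    plus one 0/1 dummy per country (no global intercept, so each dummy
    coefficient acts as that country's own intercept)."""
    countries = sorted(series_by_country)
    ys, rows = [], []
    for k, country in enumerate(countries):
        target, lags = lag_matrix(series_by_country[country], n_lags)
        dummies = np.zeros((len(target), len(countries)))
        dummies[:, k] = 1.0  # D^country = 1 only on this country's rows
        ys.append(target)
        rows.append(np.hstack([lags, dummies]))
    y, X = np.concatenate(ys), np.vstack(rows)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = float(np.sum((y - X @ coef) ** 2))
    return coef[:n_lags], dict(zip(countries, coef[n_lags:])), rss


def simulate_ar1(intercept, phi, n, rng):
    """Hypothetical stand-in for real spending data: a stationary AR(1)."""
    y = np.empty(n)
    y[0] = intercept / (1 - phi)
    for t in range(1, n):
        y[t] = intercept + phi * y[t - 1] + rng.normal()
    return y


rng = np.random.default_rng(0)
data = {
    "US": simulate_ar1(10.0, 0.5, 2000, rng),
    "Mexico": simulate_ar1(2.0, 0.5, 2000, rng),
}
phi, delta, rss = fit_pooled(data, n_lags=1)
print("shared lag coefficient(s):", phi)
print("country intercepts (dummy coefficients):", delta)
```

Because the data are simulated with a common lag coefficient of 0.5, the pooled fit should recover a value near 0.5 and intercepts near 10 and 2.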

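**Or make a separate model for each country -**

$$\text{Spending}_{\text{US},t} = \sum_{i=1}^{N} \phi_{\text{US},i} \, \text{Spending}_{\text{US},t-i} + \varepsilon_{\text{US},t} \quad \text{(total spending in the US only)}$$

$$\text{Spending}_{\text{Mexico},t} = \sum_{i=1}^{N} \phi_{\text{Mexico},i} \, \text{Spending}_{\text{Mexico},t-i} + \varepsilon_{\text{Mexico},t} \quad \text{(total spending in Mexico only)}$$

… and so on for each country.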
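One way to make the comparison mathematical is to note that the pooled model is a restricted version of the separate models: it forces every country to share the same lag coefficients. That nesting only holds if each separate model also gets its own intercept, which the per-country equations leave out, so the sketch below assumes they include one. Under that assumption, and assuming equal error variance across countries, a Chow-type F-test applies: $F = \frac{(\text{RSS}_R - \text{RSS}_U)/q}{\text{RSS}_U/(n - C(N+1))}$ with $q = (C-1)N$ restrictions, where $C$ is the number of countries and $n$ the total number of usable observations. The helpers `lag_matrix`, `ols_rss`, `pooled_vs_separate_f` and `simulate_ar1` are hypothetical, and the data are simulated.

```python
import numpy as np


def lag_matrix(y, n_lags):
    """Return (target, lags) where column i of lags is y shifted back i+1 steps."""
    y = np.asarray(y, dtype=float)
    target = y[n_lags:]
    lags = np.column_stack(
        [y[n_lags - i - 1 : len(y) - i - 1] for i in range(n_lags)]
    )
    return target, lags


def ols_rss(X, y):
    """Residual sum of squares of an OLS fit of y on X."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return float(np.sum((y - X @ coef) ** 2))


def pooled_vs_separate_f(series_by_country, n_lags):
    """F-test of H0: all countries share the same lag coefficients.

    Restricted model   = pooled regression with one dummy per country.
    Unrestricted model = a separate AR(n_lags) with intercept per country.
    """
    countries = sorted(series_by_country)
    C = len(countries)
    ys, pooled_rows, rss_separate = [], [], 0.0
    for k, country in enumerate(countries):
        target, lags = lag_matrix(series_by_country[country], n_lags)
        own = np.hstack([np.ones((len(target), 1)), lags])
        rss_separate += ols_rss(own, target)  # separate fits: RSS values add up
        dummies = np.zeros((len(target), C))
        dummies[:, k] = 1.0
        ys.append(target)
        pooled_rows.append(np.hstack([lags, dummies]))
    y = np.concatenate(ys)
    rss_pooled = ols_rss(np.vstack(pooled_rows), y)
    q = (C - 1) * n_lags  # equality restrictions on the lag coefficients
    df_resid = len(y) - C * (n_lags + 1)  # residual d.o.f. of separate models
    f_stat = ((rss_pooled - rss_separate) / q) / (rss_separate / df_resid)
    return f_stat, q, df_resid


def simulate_ar1(intercept, phi, n, rng):
    """Hypothetical stand-in for real spending data: a stationary AR(1)."""
    y = np.empty(n)
    y[0] = intercept / (1 - phi)
    for t in range(1, n):
        y[t] = intercept + phi * y[t - 1] + rng.normal()
    return y


rng = np.random.default_rng(1)
same_dynamics = {"US": simulate_ar1(10.0, 0.5, 1000, rng),
                 "Mexico": simulate_ar1(2.0, 0.5, 1000, rng)}
diff_dynamics = {"US": simulate_ar1(10.0, 0.8, 1000, rng),
                 "Mexico": simulate_ar1(2.0, 0.1, 1000, rng)}
f_same, q, df_resid = pooled_vs_separate_f(same_dynamics, n_lags=1)
f_diff, _, _ = pooled_vs_separate_f(diff_dynamics, n_lags=1)
print(f"F({q}, {df_resid}) same dynamics: {f_same:.2f}")
print(f"F({q}, {df_resid}) different dynamics: {f_diff:.2f}")
```

A small F means the data do not reject shared dynamics, so the pooled model with dummies is adequate; a large F favours separate models. A p-value can be obtained from the F distribution, for example with `scipy.stats.f.sf(f_stat, q, df_resid)`.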

Mathematically, I can't decide which approach is better. I plan to use a Dickey-Fuller statistic to check the autoregression for stationarity and an F-statistic to compare the two models, but I wanted to hear what others think of the theory, and whether the dummy variables should ever be included.
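For the stationarity check, a minimal sketch of the plain (non-augmented) Dickey-Fuller statistic with a constant, computed by hand with NumPy so the regression behind it is visible. It would be applied to each country's series separately. The helper `dickey_fuller_t` and the simulated series are hypothetical, and the critical value of about -2.86 is the approximate large-sample 5% value for the constant-only case.

```python
import numpy as np


def dickey_fuller_t(y):
    """t-statistic on gamma in  dy_t = alpha + gamma * y_{t-1} + e_t.

    Plain (non-augmented) Dickey-Fuller regression with a constant.
    H0: gamma = 0 (unit root, non-stationary). Reject H0 when the statistic
    is below the Dickey-Fuller critical value, not the usual t critical value.
    """
    y = np.asarray(y, dtype=float)
    dy = np.diff(y)
    X = np.column_stack([np.ones(len(dy)), y[:-1]])
    coef, *_ = np.linalg.lstsq(X, dy, rcond=None)
    resid = dy - X @ coef
    sigma2 = resid @ resid / (len(dy) - X.shape[1])
    cov = sigma2 * np.linalg.inv(X.T @ X)
    return coef[1] / np.sqrt(cov[1, 1])


rng = np.random.default_rng(2)
n = 500
noise = rng.normal(size=n)

stationary = np.empty(n)  # AR(1) with phi = 0.5: no unit root
stationary[0] = 0.0
for t in range(1, n):
    stationary[t] = 0.5 * stationary[t - 1] + noise[t]

random_walk = np.cumsum(rng.normal(size=n))  # unit root

CRITICAL_5PCT = -2.86  # approximate asymptotic value, constant-only case
for name, series in [("stationary", stationary), ("random walk", random_walk)]:
    t_stat = dickey_fuller_t(series)
    print(f"{name}: DF t = {t_stat:.2f}, reject unit root: {t_stat < CRITICAL_5PCT}")
```

In practice, `statsmodels.tsa.stattools.adfuller` runs the augmented version (adding lagged differences to absorb serial correlation) and reports MacKinnon critical values, which is usually preferable to a hand-rolled statistic.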

I'm looking more for an answer that includes mathematical reasoning.

Your question doesn't match your title. What you're asking is not about feature selection, since you've already decided which features to use (N lags, country dummies). For most datasets (perhaps not if yours is highly imbalanced), the two models you propose should yield very similar results. I actually think you should be thinking more about feature selection. – horaceT – 2017-02-04T20:41:11.180