How can Time Series Analysis be done with Categorical Variables?

6

3

Most of the time series analysis tutorials and textbooks I've read, whether for univariate or multivariate data, deal with continuous numerical variables.

I currently have a problem at hand that deals with multivariate time series data, but the fields are all categorical variables. Hence, I was wondering whether there is any way to apply standard time series analysis techniques (such as ARIMA, ARMA, etc.) to such data.

Specifically, my data is a stream of alerts: each alert has a timestamp and stores information such as the monitoring system that raised it and the location of the problem. These fields are all categorical variables.

Brian Yen

Posted 2019-06-20T09:06:34.683

Reputation: 61

What kind of output do you want from your model? – jonnor – 2020-04-12T09:19:21.187

Answers

3

By definition, ARIMA-type models work on a numerical series. Their autoregressive part assumes that the value of the numerical variable $X$ at time $t$ can be approximated from its values at the $p$ previous times $t-1, \dots, t-p$ as $$ X_t = \sum_{j=1}^p a_j X_{t-j} + \varepsilon_t + c $$ where $\varepsilon_t$ is a white noise error term, $c$ is a constant and the $a_j$ are parameters to be determined. The idea is that $X$ at time $t$ depends only on some of its own past values; as you can see, by construction this works for numerical variables only. The full model then adds moving-average terms and differencing, together with conditions such as stationarity, and one can show that under those conditions the coefficients can be estimated from the data.
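
To make the autoregressive formula concrete, here is a minimal sketch (Python with NumPy; my own illustration, not how ARIMA libraries actually fit models) that simulates an AR(2) series and recovers $a_1, a_2$ and $c$ by ordinary least squares on the lagged values. The coefficient values and the function names simulate_ar and fit_ar are hypothetical. Every column of the regression is a number, which is exactly why categories have to be encoded first.

```python
import numpy as np

def simulate_ar(coefs, c, n, noise_std=1.0, seed=0):
    """Simulate X_t = sum_j a_j X_{t-j} + eps_t + c with Gaussian white noise."""
    rng = np.random.default_rng(seed)
    p = len(coefs)
    x = np.zeros(n + p)  # the first p zeros act as initial values
    eps = rng.normal(0.0, noise_std, size=n + p)
    for t in range(p, n + p):
        # a_1 pairs with x[t-1], a_2 with x[t-2], ..., a_p with x[t-p]
        x[t] = sum(a * x[t - j] for j, a in enumerate(coefs, start=1)) + eps[t] + c
    return x[p:]

def fit_ar(x, p):
    """Estimate (a_1..a_p, c) by least squares regression of X_t on its lags."""
    n = len(x)
    # Column j-1 holds X_{t-j} for t = p, ..., n-1
    lags = np.column_stack([x[p - j : n - j] for j in range(1, p + 1)])
    design = np.column_stack([lags, np.ones(n - p)])  # last column estimates c
    params, *_ = np.linalg.lstsq(design, x[p:], rcond=None)
    return params[:p], params[p]

# Hypothetical example: a stationary AR(2) with a_1 = 0.6, a_2 = -0.2, c = 0.5
x = simulate_ar([0.6, -0.2], c=0.5, n=5000)
a_hat, c_hat = fit_ar(x, p=2)
print(a_hat, c_hat)
```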

The standard way to deal with categorical variables in these cases is one-hot encoding: you introduce a dummy variable for each level of each category, equal to 1 or 0 according to whether that level is present at time $t-k$, and fit the model on these dummies. A similar question was asked here and you might want to have a look.
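
A minimal sketch of that encoding for an alert stream like the one in the question, assuming pandas: every categorical field is one-hot encoded with pd.get_dummies and then shifted by k steps, so each dummy says whether a given level was present at time $t-k$. The field names, values and the helper lagged_one_hot are hypothetical. The resulting integer columns can then be fed to any model that accepts numeric regressors.

```python
import pandas as pd

# Hypothetical alert stream: one row per time stamp, all fields categorical
alerts = pd.DataFrame(
    {
        "system": ["nagios", "zabbix", "nagios", "nagios"],
        "location": ["dc1", "dc2", "dc2", "dc1"],
    },
    index=pd.date_range("2019-06-01", periods=4, freq="D"),
)

def lagged_one_hot(df, lags):
    """One-hot encode every column, then keep copies shifted by each lag k.

    For example, "system_nagios_lag1" is 1 if the alert at time t-1 came from nagios.
    """
    dummies = pd.get_dummies(df, dtype=int)
    frames = [dummies.shift(k).add_suffix(f"_lag{k}") for k in lags]
    # The first rows have no value at t-k, so drop them and restore integer dtype
    return pd.concat(frames, axis=1).dropna().astype(int)

features = lagged_one_hot(alerts, lags=[1])
print(features)
```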

Another question to ask yourself is whether time series analysis is really what you are looking for, rather than just a classification model that makes a prediction from a set of categorical features plus some seasonal variables (e.g. hour of day or day of week).
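
As an illustration of the classification route, here is a deliberately simple sketch in plain Python: a lookup-table classifier that predicts the most frequent target seen for each combination of features, here a seasonal variable (hour of day) and a categorical one (location of the previous alert). It is my own stand-in for a real classifier such as a decision tree; the data and the function name fit_lookup_classifier are hypothetical.

```python
from collections import Counter, defaultdict

def fit_lookup_classifier(rows, targets):
    """Map each feature tuple to its most frequent target.

    Ties go to the target seen first; feature tuples never seen in training
    fall back to the most frequent target overall.
    """
    counts = defaultdict(Counter)
    for features, target in zip(rows, targets):
        counts[features][target] += 1
    overall = Counter(targets).most_common(1)[0][0]
    table = {f: c.most_common(1)[0][0] for f, c in counts.items()}
    return lambda features: table.get(features, overall)

# Hypothetical training data: (hour of day, previous alert location) -> next location
rows = [(9, "dc1"), (9, "dc1"), (9, "dc1"), (22, "dc2"), (22, "dc2")]
targets = ["dc2", "dc2", "dc1", "dc1", "dc1"]
predict = fit_lookup_classifier(rows, targets)
print(predict((9, "dc1")), predict((22, "dc2")))
```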

gented

Posted 2019-06-20T09:06:34.683

Reputation: 526

2

@Brian Yen,

I'm answering instead of commenting because of the reputation points hindrance.

The answer by @couturierc does not address the issue, and it's amazing how often people reply with that instead of just... suppressing their thoughts.

Regressors are independent variables that influence the output. Your case (and mine!) is to predict categorical variables, meaning that the category itself is the output. And you are absolutely right, Brian: the overwhelming majority of the time series analysis (TSA) literature focuses on predicting continuous values, such as temperatures or stock prices.

My problem is predicting sensor data, where each sensor is either on or off. Moreover, the sensors are mutually exclusive: I have a set of sensors, and at any given time only one of them can be on. So my data is one-hot by nature. To make matters even more fun, my timestamps are not evenly spaced.
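
One possible way to lay out mutually exclusive, unevenly spaced sensor events (a sketch under my own assumptions, not something from this answer): one row per event with a one-hot indicator per sensor, plus the number of seconds since the previous event, so a model can still see the irregular spacing. The sensor names, events and the function encode_events are hypothetical.

```python
from datetime import datetime

def encode_events(events, sensors):
    """Turn (timestamp, active_sensor) events into rows of numbers.

    Each row is [gap_seconds, indicator for sensors[0], indicator for sensors[1], ...].
    Exactly one indicator is 1 because only one sensor can be on at a time; the
    gap since the previous event (0.0 for the first) keeps the uneven spacing.
    """
    rows = []
    previous = None
    for timestamp, active in sorted(events):
        if active not in sensors:
            raise ValueError(f"unknown sensor: {active!r}")
        gap = 0.0 if previous is None else (timestamp - previous).total_seconds()
        rows.append([gap] + [1 if s == active else 0 for s in sensors])
        previous = timestamp
    return rows

# Hypothetical sensors and events, deliberately unevenly spaced
SENSORS = ["door", "kitchen", "bedroom"]
events = [
    (datetime(2020, 3, 1, 8, 0, 0), "door"),
    (datetime(2020, 3, 1, 8, 0, 30), "kitchen"),
    (datetime(2020, 3, 1, 8, 5, 30), "door"),
]
rows = encode_events(events, SENSORS)
print(rows)
```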

The furthest I got was using traces and playing around with Prophet, but so far all I could come up with is predicting each category value independently. In fact, Prophet makes it very obvious that only one value is analysed at a time.
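
For contrast with predicting each category independently, one technique that models all the mutually exclusive states jointly, not mentioned in this answer, is a first-order Markov chain: estimate the probability of which sensor is on next given which one is on now. The minimal sketch below (plain Python, hypothetical names and data) counts consecutive pairs; because the probabilities for each current state sum to one, the prediction always picks exactly one sensor. It ignores the uneven timestamps entirely, which is a real limitation for this data.

```python
from collections import Counter, defaultdict

def fit_transition_probs(states):
    """Estimate P(next state | current state) by counting consecutive pairs."""
    counts = defaultdict(Counter)
    for current, nxt in zip(states, states[1:]):
        counts[current][nxt] += 1
    return {
        current: {nxt: n / sum(c.values()) for nxt, n in c.items()}
        for current, c in counts.items()
    }

def predict_next(probs, current):
    """Return the most probable next state, or None if `current` was never followed."""
    options = probs.get(current)
    if not options:
        return None
    return max(options, key=options.get)

# Hypothetical sequence of "which sensor is on", in time order
states = ["door", "kitchen", "door", "kitchen", "door", "bedroom"]
probs = fit_transition_probs(states)
print(probs["door"], predict_next(probs, "door"))
```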

Alternatives suggested to me include label encoding with scikit-learn (note: there is a newer way to do this with from sklearn.compose import ColumnTransformer), but this raises the old problem of assigning numeric values, and hence an artificial order, to the categories.
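
The ordering problem is easy to see in a small sketch, assuming pandas: label encoding maps each category to an integer code, so a model that treats the codes as numbers sees dc3 as twice as far from dc1 as dc2 is, while one-hot encoding keeps all categories equidistant. The location values are hypothetical; scikit-learn's OrdinalEncoder (optionally applied to selected columns via ColumnTransformer) produces the same kind of integer codes.

```python
import pandas as pd

# Hypothetical alert locations, in time order
locations = pd.Series(["dc2", "dc1", "dc3", "dc1"])

# Label encoding: categories are sorted (dc1, dc2, dc3) and mapped to 0, 1, 2,
# which silently implies dc1 < dc2 < dc3 and equal spacing between them
codes = locations.astype("category").cat.codes

# One-hot encoding: one 0/1 column per category, no implied order or distance
one_hot = pd.get_dummies(locations, dtype=int)
print(codes.tolist())
print(one_hot)
```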

Since your post is from 9 months ago, if you have come up with some solution, I would love to hear about it.

Ricardo

Posted 2019-06-20T09:06:34.683

Reputation: 21

Hey, have you come up with any other good solution? I am facing a similar situation. I want to build a model that not only predicts the value for the next N periods but also predicts the "categorical column" and its corresponding continuous value – Madhi – 2020-11-06T12:45:35.150

0

Have you tried Prophet?

It allows you to add regressors: https://facebook.github.io/prophet/docs/seasonality,_holiday_effects,_and_regressors.html#additional-regressors

You might need to one-hot encode (or label encode) the categorical variables, and then pass them to the model using the add_regressor() method.
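
A minimal sketch of that workflow, assuming pandas and the prophet package (formerly published as fbprophet): the helper renames the columns to the ds and y names Prophet requires and one-hot encodes the categorical columns, and each dummy is then registered with add_regressor() before fitting. The column names, data and helper functions are hypothetical. Note that Prophet still forecasts a numeric y, and regressor values must also be supplied for the future dates you want to predict.

```python
import pandas as pd

def prepare_prophet_frame(df, time_col, target_col, categorical_cols):
    """Rename columns to Prophet's ds/y and one-hot encode the categoricals.

    Returns the new frame and the list of dummy column names to register.
    """
    frame = df.rename(columns={time_col: "ds", target_col: "y"})
    frame = pd.get_dummies(frame, columns=categorical_cols, dtype=int)
    regressors = [col for col in frame.columns if col not in ("ds", "y")]
    return frame, regressors

def fit_prophet(frame, regressors):
    """Fit Prophet with every dummy column added as an extra regressor."""
    from prophet import Prophet  # imported here so the helper above works without it

    model = Prophet()
    for name in regressors:
        model.add_regressor(name)
    model.fit(frame)
    return model

# Hypothetical daily alert counts with a categorical "day_type" field
history = pd.DataFrame({
    "date": pd.date_range("2019-06-01", periods=3, freq="D"),
    "n_alerts": [5, 7, 4],
    "day_type": ["work", "work", "weekend"],
})
frame, regressors = prepare_prophet_frame(history, "date", "n_alerts", ["day_type"])
# model = fit_prophet(frame, regressors)  # needs the prophet package and more history
```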

couturierc

Posted 2019-06-20T09:06:34.683

Reputation: 11