Anomaly detection using RNN LSTM



I'm trying to detect anomalies in a univariate time series. I trained an RNN LSTM and currently I get one-step-ahead predictions.

Could someone explain whether it's possible to output a confidence interval (or perhaps a prediction interval) with an RNN LSTM instead of just a predicted value?

When I want to flag observations as anomalous in real-time data, I think I need more information than just the predicted and observed value for point x (e.g. I need a predicted confidence interval).

Please point me in the right direction. Currently I'm using an Adam optimizer, MSE as the loss function, and a Dense output layer with a linear activation function.

Mike Evers

Posted 2018-04-24T12:51:14.393

Reputation: 93

Do you need the predicted CI? Have you tried using |actual - predicted| as the measure of anomaly-ness? – kbrose – 2018-04-25T00:56:13.760

There is no static threshold of |actual - predicted| being anomalous. E.g. at night there is less noise than during the day. – Mike Evers – 2018-04-25T07:28:11.027

Please can someone help? – Mike Evers – 2018-04-25T13:55:14.807

tried to give a couple ideas in an answer. – kbrose – 2018-04-25T20:11:15.127



Disclaimer: I have not tried any of these ideas.

Predict the CI directly

Note: This method will require very large batch sizes, which may not be possible due to memory constraints.

Set the batch size to something large. You'll also need to decide what you want your confidence interval to be; 95% matches a lot of expectations but may be less numerically stable than something like 90% or 80%. We'll call this value $0 < c < 1$.

In addition to outputting a prediction $\hat{y_i}$ of the true value $y_i$, have your model also output a lower bound $l_i(c)$ and an upper bound $u_i(c)$ for each individual estimate.

Now, define $$ z = \begin{aligned}[t] & \left(\frac{1-c}{2} - \frac{\sum_{i=1}^n (1\ \text{ if }\ (y_i < l_i(c))\ \text{ else }\ 0)}{n} \right)^2 \\ + & \left(\frac{1-c}{2} - \frac{\sum_{i=1}^n (1\ \text{ if }\ (y_i > u_i(c))\ \text{ else }\ 0)}{n} \right)^2 \end{aligned} $$

In essence, $z$ is a (squared) measure of how often the predicted lower/upper bounds are exceeded vs. how often we expect them to be exceeded. In practice, some tweaks to exactly how $z$ is computed may be necessary, but the main idea of penalizing the model for being too lax or too restrictive in its estimates of the lower/upper bounds should be preserved.

Add $z$ to your MSE loss function. You'll likely need to balance the two losses, i.e. add an additional hyperparameter $\alpha$ to your model such that $\text{TOTAL LOSS} = \text{MSE} + \alpha z$.

As an example, if you set your batch size/time-step length so that there are 500 distinct estimates (i.e. $n = 500$) and set $c = 0.9$, in theory this training procedure should encourage the model to estimate $l_i(c)$ and $u_i(c)$ such that exactly 25 of the actual $y_i$ values fall below $l_i(c)$ and exactly 25 fall above $u_i(c)$.
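To make the idea concrete, here is a minimal NumPy sketch of the penalty term and the combined loss described above. The function names (`coverage_penalty`, `total_loss`) and the `alpha` default are my own illustration, not from the answer; in practice you would express this as a differentiable loss in your framework (e.g. using smooth approximations of the indicator functions), since the hard `<`/`>` comparisons have zero gradient almost everywhere.

```python
import numpy as np

def coverage_penalty(y, lower, upper, c):
    """The z term: squared deviation between the observed and expected
    exceedance rates of the predicted lower/upper bounds."""
    expected = (1.0 - c) / 2.0       # e.g. 0.05 for c = 0.9
    below = np.mean(y < lower)       # fraction of y below the lower bound
    above = np.mean(y > upper)       # fraction of y above the upper bound
    return (expected - below) ** 2 + (expected - above) ** 2

def total_loss(y, y_hat, lower, upper, c=0.9, alpha=1.0):
    """MSE on the point prediction plus the coverage penalty, weighted by alpha."""
    mse = np.mean((y - y_hat) ** 2)
    return mse + alpha * coverage_penalty(y, lower, upper, c)
```

With $n = 100$ values and $c = 0.9$, bounds that leave exactly 5 points below and 5 above give a penalty of zero, matching the worked example above.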

Predict the error directly

In addition to getting a prediction $\hat{y}$ of the true value $y$ and training with the MSE loss function, add another output to your model which attempts to predict the squared error $\epsilon = (\hat{y} - y)^2$ directly.

You can compute the MSE of the estimate $\hat{\epsilon}$ of $\epsilon$ and combine that with your existing MSE to get the total loss for training.

You can then use this value to normalize your errors to try and get an anomaly score that is usable across different sub-populations:

$$ \text{anomaly score} = \frac{(y - \hat{y})^2}{\hat{\epsilon}} $$ where higher scores indicate that the true error is larger than expected.
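A small sketch of that normalization, assuming you already have the prediction and the predicted squared error at inference time. The function name and the `floor` guard against a near-zero $\hat{\epsilon}$ are my additions for illustration:

```python
import numpy as np

def anomaly_score(y, y_hat, eps_hat, floor=1e-8):
    """Squared error normalized by the model's own predicted squared error.
    Scores well above 1 mean the error is larger than the model expected."""
    return (y - y_hat) ** 2 / np.maximum(eps_hat, floor)
```

Because the score is relative to the predicted error, the same threshold can work across sub-populations with different noise levels (e.g. the quieter nights vs. noisier days mentioned in the comments).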


Posted 2018-04-24T12:51:14.393

Reputation: 1 637

Thanks for your answer. I am going to experiment with it. – Mike Evers – 2018-04-30T08:52:16.277


Right now, instead of predicting a confidence interval, I'm using a different approach, as described in section 3.1.2 of the following paper.

Mike Evers

Posted 2018-04-24T12:51:14.393

Reputation: 93

Can you provide a full citation (title, author, etc.) so that we can still find this document even if the link stops working? Can you summarize the main idea from that paper? – D.W. – 2018-05-15T22:50:33.343