## Forecastability

You are right that this is a question of forecastability. There have been a few articles on forecastability in the IIF's practitioner-oriented journal *Foresight*. (Full disclosure: I'm an Associate Editor.)

The problem is that forecastability is already hard to assess in "simple" cases.

## A few examples

Suppose you have a time series like this but don't speak German:

How would you model the large peak in April, and how would you include this information in any forecasts?

Unless you knew that this time series is the sales of eggs in a Swiss supermarket chain, which peak right before western Easter, you would not have a chance. Plus, with Easter moving around the calendar by as much as five weeks (between late March and late April), any forecast that doesn't include the *specific* date of Easter (by assuming, say, that this was just some seasonal peak that would recur in the same week next year) would probably be very far off.
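To make the moving-holiday point concrete, here is one way to build an Easter regressor in plain Python. Easter Sunday itself can be computed exactly with Butcher's algorithm; the `lead_days` window and the dummy construction below are my own illustrative choices, not something taken from the data discussed above:

```python
from datetime import date, timedelta

def easter_date(year: int) -> date:
    """Gregorian Easter Sunday via the anonymous/Butcher algorithm."""
    a = year % 19
    b, c = divmod(year, 100)
    d, e = divmod(b, 4)
    f = (b + 8) // 25
    g = (b - f + 1) // 3
    h = (19 * a + b - d - g + 15) % 30
    i, k = divmod(c, 4)
    l = (32 + 2 * e + 2 * i - h - k) % 7
    m = (a + 11 * h + 22 * l) // 451
    month, day = divmod(h + l - 7 * m + 114, 31)
    return date(year, month, day + 1)

def pre_easter_dummy(day: date, lead_days: int = 7) -> int:
    """1 if `day` falls in the `lead_days` days before Easter, else 0."""
    delta = easter_date(day.year) - day
    return int(timedelta(0) < delta <= timedelta(days=lead_days))

print(easter_date(2024))  # 2024-03-31
```

A dummy like this, fed into a regression-type forecasting model, moves with Easter from year to year, which a fixed "week 13 peak" seasonal term cannot do.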

Similarly, assume you have the blue line below and want to model why 2010-02-28 looks so different from the "normal" pattern of, say, 2010-02-27:

Again, without knowing what happens when a whole city full of Canadians watches an Olympic ice hockey final on TV, you have no chance whatsoever of understanding what happened here, and you won't be able to predict when something like this will recur.

Finally, look at this:

This is a time series of daily sales of one item at a cash-and-carry store. (On the right is a simple frequency table: 282 days had zero sales, 42 days saw sales of 1 unit... and one day saw sales of 500.) I don't know what item it is.

To this day, I don't know what happened on that one day with sales of 500. My best guess is that some customer pre-ordered a large amount of whatever product this was and collected it. Now, without knowing this, any forecast for this particular day will be far off. Conversely, assume that this happened right before Easter, and we have a dumb-smart algorithm that believes this could be an Easter effect (maybe these are eggs?) and happily forecasts 500 units for the next Easter. Oh my, could *that* go wrong.
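To see just how far off a naive forecast can be here, consider a toy calculation. The 282 zero-sales days, 42 one-unit days, and single 500-unit day come from the table above; the remaining counts are invented to fill in the elided middle of the histogram:

```python
# Hypothetical daily sales histogram: {units_sold: number_of_days}.
# The 0, 1, and 500 entries are from the table above; 2 and 3 are
# made up for illustration.
hist = {0: 282, 1: 42, 2: 20, 3: 10, 500: 1}

n_days = sum(hist.values())
mean_all = sum(units * days for units, days in hist.items()) / n_days

hist_wo = {u: d for u, d in hist.items() if u != 500}
mean_wo = sum(u * d for u, d in hist_wo.items()) / sum(hist_wo.values())

print(f"mean daily sales with the outlier:    {mean_all:.3f}")
print(f"mean daily sales without the outlier: {mean_wo:.3f}")
```

A single unexplained day inflates the average demand by a factor of about five, so even a humble "forecast the mean" approach is badly distorted unless the outlier is understood and treated separately.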

## Summary

In all cases, we see how forecastability can only be well understood once we have a sufficiently deep understanding of the likely factors that influence our data. The problem is that unless we know these factors, we don't even know that we don't know them. As per Donald Rumsfeld:

> [T]here are known knowns; there are things we know we know. We also know there are known unknowns; that is to say we know there are some things we do not know. But there are also unknown unknowns – the ones we don't know we don't know.

If Easter or Canadians' predilection for hockey are unknown unknowns to us, we are stuck - and we don't even have a way forward, because we don't know what questions to ask.

The only way of getting a handle on these is to gather domain knowledge.

## Conclusions

I draw three conclusions from this:

- You *always* need to include domain knowledge in your modeling and prediction.
- Even with domain knowledge, you are not guaranteed to get enough information for your forecasts and predictions to be acceptable to the user. See that outlier above.
- If "your results are miserable", you may be hoping for more than you can achieve. If you are forecasting a fair coin toss, then there is no way to get above 50% accuracy in the long run. And don't trust external forecast accuracy benchmarks, either.
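The coin-toss point is easy to check by simulation. The "repeat the previous toss" rule below is an arbitrary stand-in; any prediction rule that cannot see the future fares the same in the long run:

```python
import random

random.seed(0)  # reproducible run
n = 100_000
tosses = [random.randint(0, 1) for _ in range(n)]

# Predict that each toss repeats the previous one; for a fair,
# independent coin, this hits the target half the time, like any
# other rule.
hits = sum(tosses[i] == tosses[i - 1] for i in range(1, n))
accuracy = hits / (n - 1)
print(f"accuracy: {accuracy:.3f}")  # close to 0.5
```

If the irreducible noise in your series is large, 50%-style accuracy ceilings of this kind are a property of the data, not a failure of the modeler.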

## The Bottom Line

Here is how I would recommend building models - and noticing when to stop:

1. Talk to someone with domain knowledge if you don't already have it yourself.
2. Identify the main drivers of the data you want to forecast, including likely interactions, based on step 1.
3. Build models iteratively, including drivers in decreasing order of strength as per step 2. Assess models using cross-validation or a holdout sample.
4. If your prediction accuracy does not increase any further, either go back to step 1 (e.g., by identifying blatant mis-predictions you can't explain and discussing them with the domain expert), or accept that you have reached the end of your model's capabilities. Time-boxing your analysis in advance helps.
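The iterate-and-stop loop above can be sketched as follows. The synthetic data, the candidate drivers, and the 0.01 MAE improvement threshold are all made up for illustration; the point is the structure: add drivers in decreasing order of assumed strength, and keep each one only if the holdout error actually drops:

```python
import numpy as np

rng = np.random.default_rng(42)

# Synthetic daily series: level + weekly cycle + trend + noise.
n = 400
t = np.arange(n)
y = 50 + 10 * np.sin(2 * np.pi * t / 7) + 0.05 * t + rng.normal(0, 2, n)

# Candidate drivers, in (assumed) decreasing order of strength.
drivers = {
    "weekly cycle": np.column_stack([np.sin(2 * np.pi * t / 7),
                                     np.cos(2 * np.pi * t / 7)]),
    "trend": t.reshape(-1, 1).astype(float),
    "junk driver": rng.normal(size=(n, 1)),  # deliberately useless
}

train, hold = slice(0, 300), slice(300, n)

def holdout_mae(X: np.ndarray) -> float:
    """Fit OLS on the training part, return MAE on the holdout part."""
    beta, *_ = np.linalg.lstsq(X[train], y[train], rcond=None)
    return float(np.abs(y[hold] - X[hold] @ beta).mean())

X = np.ones((n, 1))             # start with an intercept-only model
best_mae = holdout_mae(X)
for name, cols in drivers.items():
    candidate = np.hstack([X, cols])
    mae = holdout_mae(candidate)
    if mae < best_mae - 0.01:   # keep the driver only if it clearly helps
        X, best_mae = candidate, mae
        print(f"kept {name}: holdout MAE {mae:.2f}")
    else:
        print(f"dropped {name}: holdout MAE {mae:.2f}")
```

In a real project each "driver" would come out of the conversation with the domain expert in step 1, and the stopping rule would be your time box rather than a fixed threshold.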

Note that I am not advocating trying different classes of models if your original model plateaus. Typically, if you started out with a reasonable model, switching to something more sophisticated will not yield a strong benefit, and may simply amount to "overfitting on the test set". I have seen this happen often, and other people agree.

## Comments

- This problem can be answered in practical terms (as @StephanKolassa did) or in absolute terms (some sort of theorem that shows a given model can learn a problem iff certain conditions are satisfied). Which one do you want? – Superbest – 2016-07-05T19:40:39.823

- This sounds similar to the classic halting problem of computer science. Say you have some algorithm A of arbitrary complexity which searches over input data D looking for predictive models, and which halts when it finds a "good" model for the data. Without imposing significant structure on A and D, I don't see how you could tell whether A will ever halt on input D, i.e. whether A will eventually succeed or keep searching forever. – Matthew Gunn – 2016-07-05T19:43:10.633

- @Superbest It can be both. If you have something to add, feel free to answer. I have never heard of a theorem that states anything about dealing with real-life multidimensional noisy data, but if you know one that applies, I'd be interested to read your answer. – Tim – 2016-07-05T19:46:45.373

- Based on @StephanKolassa's answer, another question you could spin off this one is "At what point should I take my work so far back to the subject matter experts and discuss my results (or lack of results)?" – Robert de Graaf – 2016-07-06T13:20:19.620

- Also related thread: https://stats.stackexchange.com/questions/28057/expected-best-performance-possible-on-a-data-set – Jan Kukacka – 2018-03-02T19:07:05.637