6

1

I have a time series of hourly gas consumption. I want to use ARMA/ARIMA to forecast the consumption for the next hour, based on the previous hours. Why should I analyze/find the seasonality (with Seasonal and Trend decomposition using Loess (STL))?

7

"Because it's there".

The data has a seasonal pattern. So you model it. The data has a trend. So you model it. Maybe the data is correlated with the number of sunspots. So you model that. Eventually you hope to have nothing left to model but uncorrelated random noise.

But I think you've screwed up your STL computation here. Your residuals are clearly not serially uncorrelated. I rather suspect you've not told the function that your "seasonality" is a 24-hour cycle rather than an annual one. But hey you haven't given us any code or data so we don't really have a clue what you've done, do we? What do you think "seasonality" even means here? Do you have any idea?
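A rough way to check whether the cycle length is the problem is to compare the lag-24 autocorrelation of the residuals after removing a cycle of the right and of the wrong length. The sketch below is only an illustration: the data are synthetic (the original data and code were never posted), a naive per-position mean stands in for STL, and the function names `autocorr` and `remove_cycle` are made up for this example.

```python
import math
import random

def autocorr(x, lag):
    """Sample autocorrelation of x at the given lag."""
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x)
    cov = sum((x[t] - mean) * (x[t + lag] - mean) for t in range(n - lag))
    return cov / var

def remove_cycle(x, period):
    """Subtract the mean of each position in the cycle (naive seasonal adjustment, not STL)."""
    sums = [0.0] * period
    counts = [0] * period
    for t, v in enumerate(x):
        sums[t % period] += v
        counts[t % period] += 1
    profile = [s / c for s, c in zip(sums, counts)]
    return [v - profile[t % period] for t, v in enumerate(x)]

# Synthetic hourly series: two weeks of a 24-hour cycle plus noise (assumed data).
random.seed(0)
hours = 24 * 14
series = [10 + 3 * math.sin(2 * math.pi * t / 24) + random.gauss(0, 0.5)
          for t in range(hours)]

raw_acf24 = autocorr(series, 24)                     # strong: the cycle is still in there
acf_right = autocorr(remove_cycle(series, 24), 24)   # cycle length specified correctly
acf_wrong = autocorr(remove_cycle(series, 7), 24)    # wrong cycle length: pattern survives

print(round(raw_acf24, 2), round(acf_right, 2), round(acf_wrong, 2))
```

If the residuals still show a large autocorrelation at lag 24, as in the wrong-period case here, the decomposition has not been told about the daily cycle.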

Your data seems to have three peaks every 24 hours. Really? Is this 'gas'='gasoline'='petrol' or gas in some heating/electricity-generating system? Either way, if you know a priori there's an 8-hour cycle, or an 8-hour cycle on top of a 24-hour cycle on top of what looks like a very high-frequency one- or two-hour cycle, you **put that in your model**.
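One common way to put known cycles in a model is a harmonic regression: a sine/cosine pair per cycle length. The sketch below assumes a 24-hour and an 8-hour cycle and uses synthetic data; the helper names are illustrative, and the simple projection formula is only a valid least-squares fit because the sample here covers a whole number of both cycles.

```python
import math
import random

def harmonic_coeffs(y, period):
    """Cosine/sine amplitudes for one cycle length.

    Assumes len(y) is a whole number of periods, so the projection
    equals the least-squares estimate.
    """
    n = len(y)
    mean = sum(y) / n
    a = 2 / n * sum((v - mean) * math.cos(2 * math.pi * t / period)
                    for t, v in enumerate(y))
    b = 2 / n * sum((v - mean) * math.sin(2 * math.pi * t / period)
                    for t, v in enumerate(y))
    return a, b

def cycle_value(t, period, a, b):
    """Value of one fitted cycle at time t."""
    angle = 2 * math.pi * t / period
    return a * math.cos(angle) + b * math.sin(angle)

# Synthetic hourly data with a 24-hour and an 8-hour cycle (assumed, not the asker's data).
random.seed(1)
n = 24 * 10
y = [10 + 3 * math.sin(2 * math.pi * t / 24)
     + 1.5 * math.cos(2 * math.pi * t / 8)
     + random.gauss(0, 0.3) for t in range(n)]

level = sum(y) / n
coeffs = {period: harmonic_coeffs(y, period) for period in (24, 8)}

def model(t):
    """Mean level plus both fitted cycles."""
    return level + sum(cycle_value(t, p, a, b) for p, (a, b) in coeffs.items())

residuals = [v - model(t) for t, v in enumerate(y)]
resid_sd = math.sqrt(sum(r * r for r in residuals) / n)
next_hour = model(n)  # deterministic part of the forecast for the next hour
```

Whatever structure remains in `residuals` is the part an ARMA model would then be asked to describe.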

Actually you don't even say what your x-axis is, so maybe it's days, and then I'd fit a daily cycle, a weekly cycle, and then an annual cycle. But given how it all changes at time=85 or so, I'd not expect a model to do well on both sides of that.

With statistics (which is what this is, sorry to disappoint you but you're not a data scientist yet) you don't just robotically go "And.. Now.. I.. Fit.. An... STL model....". You look at your data, try to get some understanding, then propose a model, fit it, test it, and use the parameters to make inferences about the data. Fitting cyclic seasonal patterns is part of that.

2 This was a generic question. I did not include code because I did not want to solve a specific problem. I never stated I'm a data scientist and... IMHO it is not necessary to be so "rude" :). Thank you anyway for the explanation. – marcodena – 2014-07-23T13:56:06.313

2 I would downvote this answer if I could. Modeling a feature of the data for its own sake is absurd. Models exist to answer questions, and if seasonality does not affect your question or its answer, then it is not only allowable but desirable to ignore it. With that in mind, @marcodena, the only answer to your question is "because sometimes ignoring it will bias your predictions." I think a better question would ask which times those are. I also disagree with the implication here that data always precedes a model in statistics. – shadowtalker – 2014-08-04T23:44:04.903

2 I know, but I'm a student, so I'm learning. The question is made for this reason :) – marcodena – 2014-08-08T13:27:54.967

1

In anomaly detection (one sort of STL application), it's easier to see outliers if you can normalize the original time series to a residual series (called "remainder" in the output of R's `stl` function). That involves identifying components that you can subtract from the original series. In STL, the components are the seasonal series and the trend.

If there is a regular, recurring pattern in the data (i.e., a seasonality), then you don't want the expected ups/downs of that pattern to drive the anomaly detector. Instead you want to be able to say "are we higher or lower than expected, factoring out the regular ups/downs that happen over the course of the day (or week, or month, etc.)". Identifying and removing the seasonal series allows you to ask that question.
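As a minimal sketch of that idea, the example below removes an hour-of-day profile and flags points whose remainder is unusually large. It is not STL: a per-hour median stands in for the seasonal component, there is no trend term, the data are synthetic, and the threshold of 5 robust standard deviations is an arbitrary assumption.

```python
import math
import random
import statistics

def seasonal_profile(x, period):
    """Median of each position in the cycle; the median resists outliers."""
    return [statistics.median(x[p::period]) for p in range(period)]

def flag_anomalies(x, period, threshold=5.0):
    """Indices whose remainder is far from typical, using a MAD-based scale."""
    profile = seasonal_profile(x, period)
    remainder = [v - profile[t % period] for t, v in enumerate(x)]
    center = statistics.median(remainder)
    mad = statistics.median([abs(r - center) for r in remainder])
    scale = 1.4826 * mad  # makes the MAD comparable to a standard deviation
    return [t for t, r in enumerate(remainder)
            if abs(r - center) > threshold * scale]

# Synthetic hourly series with a daily cycle (assumed data).
random.seed(2)
hours = 24 * 14
series = [10 + 3 * math.sin(2 * math.pi * t / 24) + random.gauss(0, 0.3)
          for t in range(hours)]

# Inject a bump at a daily trough: the raw value stays inside the
# series' overall range, but it is unusual for that hour of the day.
spike_at = 18 + 24 * 4
series[spike_at] += 4

anomalies = flag_anomalies(series, period=24)
print(anomalies)
```

The injected point would be easy to miss with a threshold on the raw values, because the daily swing is larger than the bump; it only stands out once the expected hour-of-day level is subtracted.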

The decomposition you generated does not look right: the seasonal pattern doesn't seem to have captured much structure, the trend overfits the data, and the residuals have a regular structure that the seasonal series ought to have captured. I don't know whether this is because the dataset itself doesn't lend itself to an STL decomposition (the series tail appears qualitatively different than the initial part) or if it's because the parameters are wrong. Figure 6.10 on this page shows a more prototypical case.

https://www.otexts.org/fpp/6/5 – marcodena – 2014-07-20T18:02:37.283

I think your question would be better stated as, "What is the importance of seasonality for forecasting?" As it is, it seems someone told you to use "STL", but you don't say who told you so, nor why (which is probably what you're trying to find out). – Rubens – 2014-07-20T19:37:34.577