Waging a war over how time series are modeled vs. merely fitted
It wasn't until we started talking with probabilitymanagement.org's Sam Savage at the INFORMS 2015 Business Analytics conference about modifying Autobox to create "simulated forecasts" ready for their SIPmath Tools that we saw the opportunity to correct a long-standing thorn in our side. We don't think the rest of the planet has figured this out. Go ask your forecasting software vendor about this and see what they say! This should shake up your CONFIDENCE.
Here is the "Unveiling of Two Critical Assumptions" that you may never have been told about:
Assumptions:
1) The estimated parameters are the population parameters (a small sketch after this list shows why that matters)
2) Pulse outliers that get identified and cleansed may happen again in the future. Read that sentence again: we are not debating how to model outliers or their impact on the point forecast; this is strictly about the confidence limits.
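To make assumption 1 concrete, here is a minimal Python sketch with made-up numbers (an AR(1) with a true coefficient of 0.6 and 100 observations, not any particular data set): it shows how much the estimated coefficient varies from sample to sample, which is exactly the uncertainty that classical limits pretend is zero.

import numpy as np

rng = np.random.default_rng(0)
phi_true, sigma, n, reps = 0.6, 1.0, 100, 2000   # hypothetical settings

def simulate_ar1(phi, sigma, n, rng):
    # generate an AR(1) series: y[t] = phi*y[t-1] + e[t]
    y = np.zeros(n)
    for t in range(1, n):
        y[t] = phi * y[t - 1] + rng.normal(0.0, sigma)
    return y

estimates = []
for _ in range(reps):
    y = simulate_ar1(phi_true, sigma, n, rng)
    # least-squares estimate of phi from regressing y[t] on y[t-1]
    estimates.append(np.dot(y[:-1], y[1:]) / np.dot(y[:-1], y[:-1]))
estimates = np.array(estimates)

print(f"true phi = {phi_true}, mean estimate = {estimates.mean():.3f}, "
      f"spread (std) of estimates = {estimates.std():.3f}")
# Classical confidence limits plug the estimate in as if it were exact; the
# spread printed above is the parameter uncertainty they leave out.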
In the mid-1980s, AUTOBOX became the first commercially available software to incorporate four Intervention Detection schemes in the development of ARIMA and Transfer Function models. Outliers ("Pulses"), Level Shifts, Seasonal Pulses and Local Time Trends can all play an important role in discovering the basic stochastic structure, since ignoring them blocks identification of the ARIMA component. Identifying and incorporating these four empirically discoverable model components enables robust model estimation. This step is necessary to ensure a stationary, normally distributed error process, which in turn yields valid tests of statistical significance and of model sufficiency.
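As a rough illustration only (this is not AUTOBOX's actual Intervention Detection procedure), the following Python sketch flags a one-period Pulse as the largest standardized residual from a crude AR(1) fit, patches it, and re-estimates. The injected pulse and every number here are hypothetical.

import numpy as np

rng = np.random.default_rng(1)
n = 120
y = np.zeros(n)
for t in range(1, n):
    y[t] = 0.6 * y[t - 1] + rng.normal(0.0, 1.0)
y[60] += 8.0                                    # inject a one-period Pulse

def ar1_fit(series):
    # crude AR(1) fit by least squares; returns coefficient and residuals
    phi = np.dot(series[:-1], series[1:]) / np.dot(series[:-1], series[:-1])
    return phi, series[1:] - phi * series[:-1]

phi_raw, resid = ar1_fit(y)
z = np.abs((resid - resid.mean()) / resid.std())
t_pulse = int(np.argmax(z)) + 1                 # most outlying observation
y_clean = y.copy()
y_clean[t_pulse] = 0.5 * (y[t_pulse - 1] + y[t_pulse + 1])   # crude patch
phi_clean, _ = ar1_fit(y_clean)
print(f"pulse flagged at t={t_pulse}, phi raw={phi_raw:.3f}, "
      f"phi cleansed={phi_clean:.3f}")
# Leaving the pulse in distorts the coefficient; cleansing it restores a
# value close to the one used to generate the series.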
Pulses do not play a role in forecasting because they are "expected to not exist in the future," whereas the other three components do. Let's state here that of course there are exceptions, and sometimes outliers should be left in! Early on in the development of time series solutions, researchers (including us) recognized that while Pulses were important to identify, cleansing them led to an unwarrantedly rosy view of forecasting uncertainty. Forecasting uncertainty was also plagued by the fact that no consideration was given to the uncertainty in the parameter estimates: the well-known (but until now ignored) Box-Jenkins procedure tacitly assumes that the estimated parameters are identical to the unknown population parameters, so the only contributors to the computation of the confidence limits are the psi weights and the error variance.
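Here is what those classical limits look like in a minimal Python sketch for an AR(1): the psi weights are simply powers of the estimated coefficient, and only they and the error variance enter the half-width. The values of phi_hat, sigma2_hat and y_last are hypothetical plug-in numbers.

import numpy as np

phi_hat, sigma2_hat, y_last = 0.6, 1.0, 2.5     # hypothetical plug-in values
for h in range(1, 7):
    point = (phi_hat ** h) * y_last             # h-step-ahead point forecast
    psi = phi_hat ** np.arange(h)               # psi weights psi_0 .. psi_{h-1}
    var_h = sigma2_hat * np.sum(psi ** 2)       # forecast-error variance
    half = 1.96 * np.sqrt(var_h)                # 95% half-width
    print(f"h={h}: forecast {point:5.2f} +/- {half:.2f}")
# Nothing here accounts for the error in phi_hat or for pulses recurring,
# which is why these limits understate the real uncertainty.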
In the forecasting context, removing outliers can be very dangerous. Suppose you are forecasting sales of a product and there was a shortage of supply, so there are periods of time with zero sales. Recall that sales data is not demand data. The observed, flawed time series then contains a number of outliers/pulses. Good analysis detects the outliers and removes them, in effect replacing the observed values with estimates, and then proceeds to model and forecast. But in doing so you have assumed that no such supply shortage will happen in the future. In a practical sense, you have compressed your observed variance and your estimated error variance. So if you show the confidence bands for your forecast, they will be tighter/narrower than they would have been had you not removed the outliers. Of course, you could keep the outliers and proceed as usual, but this is not a good approach either, since the outliers will distort the model identification process and, of course, the resulting model coefficients.
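A tiny numerical illustration (hypothetical residuals with one injected stock-out pulse, not real sales data) of how cleansing compresses the estimated error variance and therefore the band width:

import numpy as np

rng = np.random.default_rng(2)
resid_clean = rng.normal(0.0, 1.0, 100)         # residuals after cleansing
resid_raw = resid_clean.copy()
resid_raw[40] -= 6.0                            # the stock-out period left in

sigma_clean = resid_clean.std(ddof=1)
sigma_raw = resid_raw.std(ddof=1)
print(f"95% half-width, pulse cleansed : {1.96 * sigma_clean:.2f}")
print(f"95% half-width, pulse retained : {1.96 * sigma_raw:.2f}")
# The cleansed band is narrower because it quietly assumes no future shortage.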
A better approach in this case is to continue to require a normal (no fat tails) error distribution for estimation, while allowing the forecast uncertainties to be based upon an error distribution that keeps its fat tails. This way, the outlier will not skew the coefficients much; they will be close to the coefficients obtained with the outlier removed. However, the outlier will still show up in the forecast error distribution. Essentially, you end up with wider and more realistic forecast confidence bands.
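One way to sketch this idea (an illustration, not the exact AUTOBOX computation) is to keep the coefficient estimated from the cleansed series but bootstrap future errors from the raw residuals, pulse included. All the numbers below (phi_clean, y_last, the residuals, the horizon) are made up for the example.

import numpy as np

rng = np.random.default_rng(3)
phi_clean, y_last, horizon, n_paths = 0.6, 2.5, 6, 5000   # assumed values
# raw residuals from the uncleansed fit: ordinary noise plus one big pulse
raw_resid = np.append(rng.normal(0.0, 1.0, 99), -6.0)

paths = np.empty((n_paths, horizon))
for i in range(n_paths):
    level = y_last
    for h in range(horizon):
        # coefficient from the cleansed fit, errors resampled with fat tails
        level = phi_clean * level + rng.choice(raw_resid)
        paths[i, h] = level

lower, upper = np.percentile(paths, [2.5, 97.5], axis=0)
print("lower limits h=1..6:", np.round(lower, 2))
print("upper limits h=1..6:", np.round(upper, 2))
# The occasional resampled pulse fattens the tails and widens the bands.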
AUTOBOX provides an integrated and comprehensive approach: it identifies the outliers using its rigorous Intervention Detection procedure, leading to robust ARIMA parameters and a good baseline forecast. It now also develops simulated forecasts built on that pulse-free model, more correctly reflecting reality. In this way, you get the best of both worlds: a good model with a good baseline forecast, and more realistic confidence limits for the forecast that INCLUDE OUTLIERS. These uncertainties obtained via simulation have two benefits: they do not assume zero pulses in the future, but rather reflect their random recurrence; and they do not assume that the estimated model parameters are known without error.
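Here is one possible way such a simulation could fold both neglected sources of uncertainty back in (a sketch under assumed values, not AUTOBOX's internal algorithm): draw the coefficient from its sampling distribution for each simulated path, and let pulses recur at the rate they occurred in the history. The standard error, pulse probability and pulse size are all hypothetical.

import numpy as np

rng = np.random.default_rng(4)
phi_hat, phi_se, sigma = 0.6, 0.08, 1.0         # estimate and its standard error
pulse_prob, pulse_size = 1.0 / 120, -6.0        # one pulse seen in 120 periods
y_last, horizon, n_paths = 2.5, 6, 5000

paths = np.empty((n_paths, horizon))
for i in range(n_paths):
    phi = rng.normal(phi_hat, phi_se)           # parameters are not known exactly
    level = y_last
    for h in range(horizon):
        shock = rng.normal(0.0, sigma)
        if rng.random() < pulse_prob:           # pulses are allowed to recur
            shock += pulse_size
        level = phi * level + shock
        paths[i, h] = level

lower, upper = np.percentile(paths, [2.5, 97.5], axis=0)
print("simulated 95% lower limits:", np.round(lower, 2))
print("simulated 95% upper limits:", np.round(upper, 2))
# These simulated limits are wider than the classical psi-weight limits
# because both neglected sources of uncertainty are now represented.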
So, you have robust parameter estimates and more realistic forecast uncertainties. Now it is time to go ask your software vendor about this or find a new tool!