Autobox Blog

Thoughts, ideas and detailed information on Forecasting.


Alteryx uses the free "forecast" package in R. This blog post is really more about the R forecast package than it is about Alteryx, but since this is what they are offering.....

In their forecasting example (they don't ship the data they review with Alteryx, but you can request it---we did!), they have a video tutorial on analyzing monthly housing starts.

This is only one example (we have done many!). They use over 20 years of data. It is unnecessary to use that much history, as patterns and models do change over time, but it highlights a powerful feature of Autobox that protects you from this potential issue: the Chow test, which we discuss below.

With 299 observations they fit two alternative models (i.e., ETS and ARIMA) and pick the better one using the last 12 withheld observations, for a total of 311 observations in the example. The video says they use 301 observations, but that is just a slight mistake. It should be noted that Autobox never withholds data; its adaptive techniques use ALL of the data to detect changes. It also doesn't just fit models to data, it provides "a best answer". Combinations of forecasts never consider outliers. We do.

The MAPE shown in the video was 5.17 for ARIMA and 5.65 for ETS. Running this in Autobox in automatic mode gives a 3.85 MAPE (go to the bottom), an accuracy improvement of more than 25%. Here is the model output and data file to reproduce this in Autobox.
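For readers who want to try the comparison themselves, here is a minimal sketch of the kind of forecast-package run described above. It is our reconstruction, not Alteryx's actual workflow: "housing_starts" is a placeholder for the raw data, and the 1990/1 start date is an assumption based on the breakpoint dates reported below.

library(forecast)

# placeholder input vector; assumed start date of 1990/1 for 311 monthly observations
starts <- ts(housing_starts, start = c(1990, 1), frequency = 12)

n     <- length(starts)
train <- window(starts, end   = time(starts)[n - 12])   # first 299 observations
test  <- window(starts, start = time(starts)[n - 11])   # last 12 withheld

fit_arima <- auto.arima(train)
fit_ets   <- ets(train)

accuracy(forecast(fit_arima, h = 12), test)["Test set", "MAPE"]
accuracy(forecast(fit_ets,   h = 12), test)["Test set", "MAPE"]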

Autobox is unique in that it checks whether the model changes over time using the Chow test. A break was identified at period 180, and the older data is deleted.

      DIAGNOSTIC CHECK #4: THE CHOW PARAMETER CONSTANCY TEST
             The Critical value used for this test :     .01
             The minimum group or interval size was:     119

                    F TEST TO VERIFY CONSTANCY OF PARAMETERS                    
                                                                                
           CANDIDATE BREAKPOINT       F VALUE          P VALUE                  
                                                                                
               120 1999/ 12           4.55639          .0039929423              
               132 2000/ 12           7.41461          .0000906435              
               144 2001/ 12           8.56839          .0000199732              
               156 2002/ 12           9.32945          .0000074149              
               168 2003/ 12           7.55716          .0000751465              
               180 2004/ 12           9.19764          .0000087995*             

* INDICATES THE MOST RECENT SIGNIFICANT  BREAK POINT:    1% SIGNIFICANCE LEVEL. 

  IMPLEMENTING THE BREAKPOINT AT TIME PERIOD    180: 2004/   12 

  THUS WE WILL DROP (DELETE) THE FIRST   179 OBSOLETE OBSERVATIONS
  AND ANALYZE THE MOST RECENT   120 STATISTICALLY HOMOGENOUS OBSERVATIONS
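For the curious, the mechanics of a Chow-style parameter constancy test can be sketched in a few lines of R. This is a simplified illustration using a plain trend regression, not Autobox's internal implementation (which tests the time-series model's parameters):

# Compare the residual sum of squares of one pooled fit against two fits
# split at a candidate breakpoint; a large F means the parameters changed.
chow_f <- function(y, breakpoint) {
  n <- length(y)
  x <- seq_len(n)
  pooled <- lm(y ~ x)
  left   <- lm(y[1:breakpoint] ~ x[1:breakpoint])
  right  <- lm(y[(breakpoint + 1):n] ~ x[(breakpoint + 1):n])
  k <- length(coef(pooled))                      # parameters per regime
  rss_pooled <- sum(resid(pooled)^2)
  rss_split  <- sum(resid(left)^2) + sum(resid(right)^2)
  f <- ((rss_pooled - rss_split) / k) / (rss_split / (n - 2 * k))
  c(F = f, p.value = pf(f, k, n - 2 * k, lower.tail = FALSE))
}

# Scan the candidate breakpoints from the table above (hypothetical series "y"):
sapply(seq(120, 180, by = 12), function(b) chow_f(y, b))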

 


The model built on the more recent data has seasonal and regular differencing, an AR(1) and a weak AR(12). Two outliers were found, at period 225 (9/08) and period 247 (7/10). Septembers are typically low, but not in 2008; Julys are usually high, but not in 2010. If you don't identify and adjust for these outliers, you can never achieve a better model. Here is the Autobox model:

[(1-B**1)][(1-B**12)]Y(T) =                                                                                   
         +[X1(T)][(1-B**1)][(1-B**12)][(-  831.26    )]       :PULSE          2010/  7   247
         +[X2(T)][(1-B**1)][(1-B**12)][(+  613.63    )]       :PULSE          2008/  9   225
        +     [(1+  .302B** 1)(1+  .359B** 12)]**-1  [A(T)]

Alteryx ends up with an extremely convoluted model: an ARIMA(2,0,2)(0,1,2)[12] with no outliers. That is a whopping 6 parameters versus Autobox's 2.
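If you want to see what a pulse-intervention model of this general shape looks like outside of Autobox, here is a hedged sketch using the forecast package's Arima() with dummy regressors. The coefficients will not match Autobox's output; it only illustrates the structure (two pulses plus regular and seasonal differencing with AR terms), reusing the "starts" series from the first sketch.

library(forecast)

n <- length(starts)
pulse_2008_09 <- as.integer(seq_len(n) == 225)   # pulse at period 225 (2008/9)
pulse_2010_07 <- as.integer(seq_len(n) == 247)   # pulse at period 247 (2010/7)

fit <- Arima(starts,
             order    = c(1, 1, 0),              # regular difference + AR(1)
             seasonal = c(1, 1, 0),              # seasonal difference + seasonal AR
             xreg     = cbind(pulse_2008_09, pulse_2010_07))
summary(fit)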

Let's take a look at the residuals; they tell you everything you need to know. From period 200 to 235 the model is overfitting the data, leaving a large stretch mismodeled. Remember that Autobox found a break at period 180, which is close to period 200. The large negative error (low residual) is the July 2010 outlier that Autobox identifies. If you ignore outliers, they play havoc with your model.
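One simple way to see what the residual plot is telling you (our own quick check, not part of either tool's output) is to standardize the residuals and flag anything beyond three standard deviations:

res <- residuals(fit_arima)            # from the auto.arima fit in the first sketch
z   <- (res - mean(res)) / sd(res)
which(abs(z) > 3)                      # candidate outlier periods
plot(z, type = "h", ylab = "standardized residual")
abline(h = c(-3, 3), lty = 2)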

 

 

Here is the table of forecasts for the 12 withheld periods.

The M3 Forecasting Competition Calculations were off for Monthly Data

Guess what we uncovered? The 2001 M3 Competition's monthly SMAPE calculations were off for most of the entries. How did we find this? We are very detailed.

 

14 of the 24 entries, to be exact. The reported SMAPE was understated. Some entries were completely right. ARARMA was off by almost 2%, and Theta-SM by almost 1%: Theta-SM's 1-to-18 SMAPE goes from 14.66 to 15.40. Holt and Winter were both off by about half a percent.
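For reference, here is the symmetric MAPE as it is usually defined for the M3 competition; differences in how this formula is applied and averaged over horizons are exactly the kind of thing that produces discrepancies like these. The helper below is ours, with placeholder inputs:

# symmetric MAPE, in percent
smape <- function(actual, fcst) {
  mean(200 * abs(actual - fcst) / (abs(actual) + abs(fcst)))
}

# e.g. average over horizons 1..18 for each series, then across the monthly series:
# mean(sapply(monthly_series, function(s) smape(s$actual, s$forecast)))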

 

The underlying data wasn't released for many years, so this check was impossible when the results were first published. Does it change the rankings? Of course. The 1-period-out forecast and the average over horizons 1 to 18 are the two that I look at. The averaged rankings had the most disruption: Theta went from 13.85 to 13.94, which is not much of a change.

 

The accuracies for the three other seasonalities were computed correctly.

 

If you saw our release of Autobox for R, you would know that Autobox would place 2nd for the 1-period-out forecast. You can use our spreadsheet and the forecasts from each of the competitors to prove it yourself.

 

See Autobox's performance in the NN3 competition here.  SAS sponsored the competition, but didn't submit any forecasts.

IBM recently released SPSS Modeler 18 and, with it, a 30-day trial version.

We tested it and have more questions than answers. We would be glad to hear any opinions (as always) differing from or adding to ours.

There are two sets of time series examples included with the 30-day trial.

We went through the first 5 "broadband" examples that come with the trial and are set to run by default. The 5 examples have no variability and no visible outliers, and would be categorized as "easy" to model and forecast. This makes us wonder: why is there no challenging data here to stress the system?

Series 4 and 5 are both found to have seasonality. The online tutorial section called "Examining the data" talks about how Modeler can find the best seasonal or nonseasonal models. It then tells you that the run will be faster if you know there is no seasonality. I think this is just trying to avoid bad answers under the guise of being "faster". You shouldn't need to prescreen your data; the tool should be able to identify seasonality, or determine that there is none to be found. The ACF/PACF statistics help algorithms (and people) identify seasonality. On the flip side, a user may think there is no seasonality in their data when there actually is, so let's take the humans out of the equation.
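Here is a sketch of letting the data answer the seasonality question, using the R forecast package rather than Modeler ("y" stands in for one of the broadband series):

library(forecast)

ggAcf(y, lag.max = 36)    # a spike at lag 12 (and 24) points to monthly seasonality
ggPacf(y, lag.max = 36)
nsdiffs(y)                # > 0 means a seasonal difference (i.e. seasonality) is indicated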

The broadband example includes the raw data, so we can use it to benchmark. If we treat the system as a black box and focus only on the forecast, most would visually say that it looks OK, but what happens if we dig deeper and consider the model that was built? Using simple, easy data avoids the difficult process of admitting you might not be able to handle complicated data.

The default is to forecast out 3 periods. Why? With 60 months of data, why not forecast out at least one full cycle (12)? The default is NOT to search for and adjust outliers. Why? They certainly offer many varieties of outliers, which makes me wonder whether they don't like the results. If you enable outliers, only "additive" and "level shift" are used unless you go ahead and click to enable "innovational", "transient", "seasonal additive", "local trends", and "additive patch". Why are these not part of the default outlier scheme?
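For comparison, the R tsoutliers package exposes several of the same outlier types and lets you request them all in one call. This is our reading of that package, not of SPSS, and "y" is again a placeholder series:

library(tsoutliers)

# AO = additive, LS = level shift, TC = transient change,
# IO = innovational, SLS = seasonal level shift
fit <- tso(y, types = c("AO", "LS", "TC", "IO", "SLS"))
fit$outliers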

When you execute, there is no audit trail of how the model got to its result. Why?

You have the option to click a button to report "residuals" (they call them noise residuals), but they won't generate in the output table for the broadband example. We like to take the residuals from other tools and run them through Autobox. If a mean model is found, then the signal has been extracted from the noise; but if Autobox finds a pattern, then the model was insufficient...given that Autobox is correct. :)
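A rough stand-in for that check, if you don't have Autobox handy, is to fit a model to the exported residuals and test them for leftover structure ("spss_residuals" is a placeholder for whatever the other tool exports):

library(forecast)

res_fit <- auto.arima(spss_residuals)   # ideally ARIMA(0,0,0), i.e. pure noise
res_fit
Box.test(spss_residuals, lag = 24, type = "Ljung-Box")   # small p-value = structure left behind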

There is no ability to report the ACF/PACF of the original data. This skips the first step any statistician would take to see and follow why SPSS would select a seasonal model for examples 4 and 5. Why?

There are no summary statistics showing mean or even number of observations. Most statistical tools provide these so that you can be sure the tool is in fact taking in all of the data correctly.

SPSS logs all 5 time series. You can see here how we don't like the knee-jerk move to use logs.

We don't understand why differencing isn't being used by SPSS here. Let's focus on Market 5. Here is a graph and forecast from Autobox 

 

 

Let's assume that logs are necessary (they aren't) and estimate the model using Autobox and auto.arima: both use differencing. Why does SPSS use no differencing for a non-stationary series? This approach is most unusual. Now let's walk that back and run Autobox WITHOUT logs: differencing is used, along with two outliers and a seasonal pulse in the 9th month (and only the 9th month!). So, let's review: SPSS finds seasonality while Autobox and auto.arima don't.
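These are the same questions we would put to the forecast package; a quick, hedged sketch ("market5" is a placeholder for the Market 5 series):

library(forecast)

BoxCox.lambda(market5)   # a lambda near 0 argues for logs, near 1 argues against
ndiffs(market5)          # number of regular differences indicated
nsdiffs(market5)         # number of seasonal differences indicated
auto.arima(market5)      # check whether d > 0 and whether a seasonal part appears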

How did SPSS get there? There is no audit of the model building process. Why?

We don't understand the Y scale on the plots as it has no relationship to the original data or the logged data.

The other time series example is called "catalog forecast". The data is called "men". They skip the "Expert modeler" option and choose "Exponential Smoothing". Why?

This example has some variability and will really show whether SPSS can model the data. We aren't going to spend much time with this example. The graph should say it all: Autobox vs SPSS.

The ACF/PACF shows a spike at lag 12, which should indicate seasonality. SPSS doesn't identify any seasonality. Autobox also doesn't declare the series globally seasonal, but it does identify that Octobers and Decembers have seasonality (i.e., seasonal pulses), so some months are clearly seasonal. Autobox also identifies a few outliers and a level shift signifying a change in the intercept (i.e., interpret that as a change in the average).
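A sketch of the "only some months are seasonal" idea, using month-specific dummy regressors rather than a fully seasonal model (our illustration, with "men" as the catalog series):

library(forecast)

oct <- as.integer(cycle(men) == 10)    # 1 only in Octobers
dec <- as.integer(cycle(men) == 12)    # 1 only in Decembers

fit <- auto.arima(men, seasonal = FALSE, xreg = cbind(oct, dec))
summary(fit)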

If we allow the "Expert Modeler", the model identified is a Winters' additive exponential smoothing model.

We took the SPSS residuals and plotted them. You want random residuals, and these are not it. If you mismodel, you can actually inject structure and bias into the residuals, which are supposed to be random. In this case, the residuals have more seasonality (and two separate trends?) due to the mismodeling than the original data did. Autobox found 7 months of the residuals to be seasonal, which is a red flag.

I think we know "why" now.

 
