Waging a war against how to model time series vs fitting
Everyone is asked this question.
Everyone tries to answer it.
Everyone wants the "Early Warning" and quarter end numbers for "the street". Finance or Accounting are often asked by the organization to make these projections, but the truth is (and hurts) that they don't have the time series analysis tools and awareness needed to provide a good forecast.
The question is "Are you using your modeling skills to help you answer that question" or are you using simple averaging and ratio estimates to figure this one out?
We were asked at a Forecasting Conference by a P&G employee how to do this. They had NOT identified the change in level down due to the financial crises in 2008. Why? They were using simple approaches to forecast to a not so simple problem. You need to model the data to do it correctly and not just blindly apply back of the envelope methods.
The problem is that when you use simple methods like ratio estimates you are assuming that Wednesday's behave the same as Saturdays. That assumption is the problem. Less likely, but often we see the first few days of the month stronger due to a paycheck effect. So, the first few days are an outlier and yet they are being used to project. The wrong way to forecast the month end number is like this:
We are 5 days into the month and we want to forecast the expected month end number. We take the first 5 days revenue/sales(ie $47,000). We take the number of days in a month (ie 30) and take 30 and divide it by 5 which results in a factor of 6. We multiply 6 by the 5 days of revenue and that is the forecast. $47,000*6=$282,000.
The report you should be getting from your statistical software should look like this. It provides a table of forecasts and the probability of making different numbers. This allows you to interpolate your target number to find out where you stand using statistics and not simplistic division and multiplication. Different levels of statistical confidence are set and the history plus the forecast are aggregated to provide a month end number. The expected forecast is the 50% confidence level.
PROBABILITY FORECAST
99.862335 17521950.24
99.740551 17804768.87
99.528383 18087587.5
99.172964 18370406.13
98.600522 18653224.75
97.714171 18936043.38
96.394897 19218862.01
94.507447 19501680.64
91.911957 19784499.27
88.481731 20067317.9
84.124939 20350136.53
78.806855 20632955.15
72.569247 20915773.78
65.538755 21198592.41
57.924343 21481411.04
57.924343 21481411.04
50 21764229.67
42.075657 22047048.3
34.461245 22329866.93
27.430753 22612685.56
21.193145 22895504.18
15.875062 23178322.81
11.518269 23461141.44
8.088043 23743960.07
5.492553 24026778.7
3.605103 24309597.33
2.285829 24592415.96
1.399478 24875234.59
0.827036 25158053.21
0.471617 25440871.84
0.259449 25723690.47
0.137665 26006509.1
To do this properly, you need to model the data using daily data. We recommend about 3 years of historical data. You need to consider all of these issues:
Day of the week effects, Seasonality, Trends, Holiday effects, Day of the month effects, outliers, Lead/Lags around Holidays, changes in seasonality(ie Saturdays are high and then over time becomes like the other days of the week)
Once you build a model that is sophisticated to truly model and understand the patterns within the data, you can then forecast with a reduced level of bias. Let's describe SOME of the different components in this model now
1)The average demand is 148k 2)Demand is low by 241k on Women's Day - note that this is data from South Africa 3)The day before Reconciliation Day is high by 113k 4)MONTH_EFF09 means Septembers are typically low by 11k 5)FIXED_EFF_N10107 means that the first day of the week is low by 164k Note that the data begins on 7/1/2007 which is a Sunday 6)WKINM01 means that the first week of the month is high by 64k 7)FIXED_DAY01 means that the first day of the month is low by 56k 8)SEASONAL PULSE beginning at 164/ 2 8/16/2010 means that volume AT or ABOUT on Mondays became higher by 152k starting on 8/16/2010 9)LEVEL SHIFT beginning at 7/30/2010 found overall volume to be higher by 21k. 10)PULSE outlier found on 11/1/2010 was low by 368k. The 41 outliers are reported by importance statistically.
Series __07010796RRAE
Y(T) = .14855E+06 +[X1(T)][(- .24148E+06)] G_WOMEN +[X2(T)][(- .18020E+06)] G_HERITAGE +[X3(T)][(+ .11308E+06B**-1- .25778E+06+ .18230E+06B** 1 + .23546E+06B** 2+ .19179E+06B** 3+ .18266E+06B** 5)] G_RECONCILE +[X4(T)][(+ .16671E+06B**-3+ .17821E+06B**-2+ .15050E+06B**-1 - .22621E+06- .19222E+06B** 1)] M_XMAS +[X5(T)][(+ .31591E+06B**-4+ .17758E+06B**-3+ 64947. B**-2 - .21378E+06)] M_NEWYEARS +[X6(T)][(- 72258. B**-3- .20193E+06B**-2+ 87829. B**-1 - .20174E+06B** 1)] M_EASTER +[X7(T)][(- .23159E+06+ 94924. B** 1)] G_FREEDOM +[X8(T)][(+ 90703. B**-2+ 49047. B**-1- .18390E+06)] G_WORKERS +[X9(T)][(- .23416E+06+ 51151. B** 1+ 76470. B** 2)] G_YOUTH +[X10(T)[(- 11845. )] MONTH_EFF09 +[X11(T)[(- 21861. )] MONTH_EFF10 +[X12(T)[(- 17001. )] MONTH_EFF11 +[X13(T)[(- 20620. )] MONTH_EFF01 +[X14(T)[(- 29782. )] MONTH_EFF02 +[X15(T)[(- 13982. )] MONTH_EFF03 +[X16(T)[(- 11262. )] MONTH_EFF04 +[X17(T)[(- .16496E+06)] FIXED_EFF_N10107 +[X18(T)[(+ .13591E+06)] FIXED_EFF_N10307 +[X19(T)[(+ 87981. )] FIXED_EFF_N10407 +[X20(T)[(+ 38655. )] FIXED_EFF_N10507 +[X21(T)[(+ 25474. )] FIXED_EFF_N10607 +[X22(T)[(+ 64857. )] WKINM01 +[X23(T)[(- 56641. )] FIXED_DAY01 +[X24(T)[(+ 46855. )] FIXED_DAY08 +[X25(T)[(+ 44857. )] FIXED_DAY09 +[X26(T)[(+ 22607. )] FIXED_DAY10 +[X27(T)[(+ 17102. )] FIXED_DAY17 +[X28(T)[(+ .15233E+06)] :SEASONAL PULSE 1143 164/ 2 8/16/2010 I~S01143__07010796RRAE +[X29(T)[(+ 21070. )] :LEVEL SHIFT 1126 161/ 6 7/30/2010 I~L01126__07010796RRAE +[X30(T)[(- .36860E+06)] :PULSE 1220 175/ 2 11/ 1/2010 I~P01220__07010796RRAE +[X31(T)[(- .37728E+06)] :PULSE 170 25/ 2 12/17/2007 I~P00170__07010796RRAE +[X32(T)[(- 32079. )] :LEVEL SHIFT 417 60/ 4 8/20/2008 I~L00417__07010796RRAE +[X33(T)[(- .29897E+06)] :PULSE 1067 153/ 3 6/ 1/2010 I~P01067__07010796RRAE +[X34(T)[(+ 29329. )] :LEVEL SHIFT 139 20/ 6 11/16/2007 I~L00139__07010796RRAE +[X35(T)[(- .28982E+06)] :PULSE 1129 162/ 2 8/ 2/2010 I~P01129__07010796RRAE +[X36(T)[(+ .19134E+06)] :PULSE 536 77/ 4 12/17/2008 I~P00536__07010796RRAE +[X37(T)[(- .26880E+06)] :PULSE 1038 149/ 2 5/ 3/2010 I~P01038__07010796RRAE +[X38(T)[(- .22253E+06)] :PULSE 662 95/ 4 4/22/2009 I~P00662__07010796RRAE +[X39(T)[(- .26024E+06)] :PULSE 1159 166/ 4 9/ 1/2010 I~P01159__07010796RRAE +[X40(T)[(- .29674E+06)] :PULSE 547 79/ 1 12/28/2008 I~P00547__07010796RRAE +[X41(T)[(- .30180E+06)] :PULSE 171 25/ 3 12/18/2007 I~P00171__07010796RRAE +[X42(T)[(- .26154E+06)] :PULSE 303 44/ 2 4/28/2008 I~P00303__07010796RRAE +[X43(T)[(+ .17179E+06)] :PULSE 308 44/ 7 5/ 3/2008 I~P00308__07010796RRAE +[X44(T)[(+ .21111E+06)] :PULSE 180 26/ 5 12/27/2007 I~P00180__07010796RRAE +[X45(T)[(+ .24487E+06)] :PULSE 169 25/ 1 12/16/2007 I~P00169__07010796RRAE +[X46(T)[(- .20171E+06)] :PULSE 1097 157/ 5 7/ 1/2010 I~P01097__07010796RRAE +[X47(T)[(+ .28926E+06)] :PULSE 173 25/ 5 12/20/2007 I~P00173__07010796RRAE +[X48(T)[(+ .18024E+06)] :PULSE 1139 163/ 5 8/12/2010 I~P01139__07010796RRAE +[X49(T)[(- .21759E+06)] :PULSE 772 111/ 2 8/10/2009 I~P00772__07010796RRAE +[X50(T)[(+ .22993E+06)] :PULSE 302 44/ 1 4/27/2008 I~P00302__07010796RRAE +[X51(T)[(- .26803E+06)] :PULSE 307 44/ 6 5/ 2/2008 I~P00307__07010796RRAE +[X52(T)[(- 15373. )] :SEASONAL PULSE 833 119/ 7 10/10/2009 I~S00833__07010796RRAE +[X53(T)[(- .18772E+06)] :PULSE 1189 170/ 6 10/ 1/2010 I~P01189__07010796RRAE +[X54(T)[(+ .21572E+06)] :PULSE 771 111/ 1 8/ 9/2009 I~P00771__07010796RRAE +[X55(T)[(+ 9024.5 )] :LEVEL SHIFT 901 129/ 5 12/17/2009 I~L00901__07010796RRAE +[X56(T)[(+ 97577. )] :PULSE 367 53/ 3 7/ 1/2008 I~P00367__07010796RRAE +[X57(T)[(+ .17982E+06)] :PULSE 182 26/ 7 12/29/2007 I~P00182__07010796RRAE +[X58(T)[(+ .12688E+06)] :PULSE 312 45/ 4 5/ 7/2008 I~P00312__07010796RRAE +[X59(T)[(+ .13894E+06)] :PULSE 313 45/ 5 5/ 8/2008 I~P00313__07010796RRAE +[X60(T)[(+ .14256E+06)] :PULSE 1040 149/ 4 5/ 5/2010 I~P01040__07010796RRAE +[X61(T)[(+ .13974E+06)] :PULSE 408 59/ 2 8/11/2008 I~P00408__07010796RRAE +[X62(T)[(- .19002E+06)] :PULSE 996 143/ 2 3/22/2010 I~P00996__07010796RRAE +[X63(T)[(+ 49165. )] :SEASONAL PULSE 632 91/ 2 3/23/2009 I~S00632__07010796RRAE +[X64(T)[(+ 29613. )] :SEASONAL PULSE 1088 156/ 3 6/22/2010 I~S01088__07010796RRAE +[X65(T)[(+ .14847E+06)] :PULSE 1013 145/ 5 4/ 8/2010 I~P01013__07010796RRAE +[X66(T)[(+ .12300E+06)] :PULSE 270 39/ 4 3/26/2008 I~P00270__07010796RRAE +[X67(T)[(- .16349E+06)] :PULSE 540 78/ 1 12/21/2008 I~P00540__07010796RRAE +[X68(T)[(- .19112E+06)] :PULSE 176 26/ 1 12/23/2007 I~P00176__07010796RRAE + + [A(T)]
This looks easy to do, right?