SPURIOUS CORRELATION AND ITS DETECTION

This is an example of what is called Simpson's Paradox. The apparent association arises because an important variable has been omitted. When the omitted variable is brought into the model, the association is cancelled or even reversed. In time series data it is possible to identify level shifts, which may act as a proxy for the omitted variable and so identify the point in time when the true causal variable changed "state". In the example of house fires, the size of the fire needs to be taken into account: more firefighters are sent to larger fires, and the larger the fire, the worse the damage. With cross-sectional data such as this, it is generally not possible to develop a proxy for the omitted variable, as it often is in time series analysis.

If you treat fire size as categorical (e.g. small, medium, large), the overall effect is that more firefighters (seem to) imply more damage; however, within each category of fire, more firefighters imply *less* damage. The relationship within every subgroup is the opposite of the relationship for the group taken as a whole.
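The reversal described above can be reproduced with a small sketch. The numbers below are invented purely for illustration (they are not from any real fire data), and the helper name `slope` is a hypothetical one introduced here. The sketch fits an ordinary least-squares slope of damage on firefighters within each fire-size category and then on the pooled data.

```python
from statistics import mean


def slope(xs, ys):
    """Ordinary least-squares slope of ys regressed on xs."""
    mx, my = mean(xs), mean(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den


# Hypothetical data: (firefighters sent, damage in $1000s), grouped by fire size.
# Larger fires get more firefighters AND cause more damage.
fires = {
    "small":  [(2, 12), (3, 10), (4, 8)],
    "medium": [(6, 45), (7, 42), (8, 39)],
    "large":  [(11, 110), (12, 105), (13, 100)],
}

# Slope within each category (fire size held fixed).
within = {size: slope(*zip(*rows)) for size, rows in fires.items()}

# Slope when fire size is ignored and all fires are pooled.
pooled_rows = [row for rows in fires.values() for row in rows]
pooled = slope(*zip(*pooled_rows))

for size, s in within.items():
    print(f"{size:>6}: slope = {s:+.2f}")
print(f"pooled: slope = {pooled:+.2f}")
```

With these made-up numbers every within-category slope is negative while the pooled slope is positive, which is the sign reversal the house-fire example describes.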

Another example of an overall conclusion that is not valid locally involves local trends. If you were to fit a single trend to this data series, you would conclude that there is no trend. This is true overall but not locally.

If you assume a single trend that begins at period 1 and ends at period t, you miss the boat. In our example, it is necessary to include three distinct trends.
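A minimal sketch of the same point, assuming a hypothetical 30-period series that rises, stays flat, and then falls. The series, the breakpoints at periods 10 and 20, and the helper name `slope` are all assumptions made for illustration; in practice the break points would have to be detected rather than supplied.

```python
from statistics import mean


def slope(xs, ys):
    """Ordinary least-squares slope of ys regressed on xs."""
    mx, my = mean(xs), mean(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den


# Hypothetical series: rises over periods 1-10, is flat over 11-20,
# and falls over 21-30.
periods = list(range(1, 31))
series = [t if t <= 10 else 10 if t <= 20 else 31 - t for t in periods]

# One trend fitted from period 1 to the last period.
overall = slope(periods, series)

# Three local trends, using assumed (known) break points at periods 10 and 20.
segments = [(0, 10), (10, 20), (20, 30)]
local = [slope(periods[a:b], series[a:b]) for a, b in segments]

print(f"single trend slope: {overall:+.2f}")
for (a, b), s in zip(segments, local):
    print(f"periods {a + 1}-{b}: slope = {s:+.2f}")
```

Because this series is symmetric about its midpoint, the single fitted trend has essentially zero slope even though the first and last segments trend clearly in opposite directions.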

Another example of Simpson's Paradox: "Poor people eat better food"
