SPURIOUS CORRELATION AND ITS DETECTION

STORKS BRING BABIES


              Ratios are used in many fields to adjust or normalize
         one measure for another in order to make comparisons or
         rankings.  In economics, national indices for wealth are
         formed by the ratio of wealth to population size, examples
         being per capita income and gross national product per
         capita.  In nutrition, the weight of people relative to their
         frame size is captured as body mass index (weight in kg /
         height in meters squared).

              Regression analysis is the standard way to adjust one
         measure for another.  Any observations falling on the
         regression line are thought of as being equal relative to the
         x variable.  Observations above the line are relatively large
         and ones below the line are relatively small.  In regression
         analysis, a constant is included to estimate the value of the
         response variable when the covariate is at zero.  A ratio is
         a special case of regression, equivalent to fitting without a
         constant or forcing the constant to be zero.

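If Y = B0 + B1 X, then we can write the ratio as
Z = Y/X = B0/X + B1
So analyzing Z is identical to a regression analysis in which B0 = 0 by specification, i.e., by omission. When B0 is not zero, Z still depends on X through the term B0/X.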
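As a small numeric illustration of the relation between a ratio and a regression line, the sketch below tabulates Z = Y/X for points that lie exactly on a straight line with a nonzero intercept. The coefficients B0 = 10 and B1 = 2 and the function name ratio_on_line are assumptions chosen for the example; they do not come from the text.

```python
# Hypothetical coefficients, chosen only for illustration.
B0 = 10.0  # intercept
B1 = 2.0   # slope


def ratio_on_line(x, b0=B0, b1=B1):
    """Return Z = Y/X for a point lying exactly on Y = b0 + b1*X."""
    y = b0 + b1 * x
    return y / x


if __name__ == "__main__":
    # Every point is on the line, yet the ratio changes with X
    # because Z = b0/X + b1.
    for x in (1.0, 2.0, 5.0, 10.0):
        print(f"X = {x:5.1f}   Z = Y/X = {ratio_on_line(x):5.2f}")
    # With b0 = 0 the ratio no longer depends on X: it is just the slope.
    print(ratio_on_line(1.0, b0=0.0), ratio_on_line(10.0, b0=0.0))
```

All four points sit on the same line, so a regression would treat them as equal relative to X, but their ratios differ. Only when the intercept is zero does the ratio collapse to the slope.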
In human biology, physiology, and nutrition, there is increasing awareness that the use of such ratios can lead to spurious results. A workshop at the April 1996 Experimental Biology meetings was devoted to these concerns, which stem from the implicit assumption that the relationship between the numerator and denominator of a ratio is a straight line with an intercept of zero. Recent studies have demonstrated that these linear and zero-intercept assumptions are often not met, with the consequence that proper adjustment for the denominator measure is not made. They have also shown that two ratios can appear to be related even when their numerator measures are completely independent.

The use of ratios as response variables in regression should therefore be avoided where possible, in favor of adjusting for the denominator measure by including it as a covariate in the regression. If ratios are used, one simple way to mitigate these concerns and to ensure that complete adjustment has been made is to include the denominator of each ratio as a covariate.
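The claim that two ratios can look related when their numerators are independent can be checked with a short simulation. The sketch below is only an illustration under assumed conditions: the uniform ranges, the sample size, the seed, and the names pearson and simulate are all hypothetical choices, and the two ratios are assumed to share one denominator.

```python
import math
import random


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    sab = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    saa = sum((p - mean_a) ** 2 for p in a)
    sbb = sum((q - mean_b) ** 2 for q in b)
    return sab / math.sqrt(saa * sbb)


def simulate(n=10_000, seed=1):
    """Draw independent numerators u, v and a shared denominator w.

    Returns (correlation of u and v, correlation of u/w and v/w).
    """
    rng = random.Random(seed)
    u = [rng.uniform(5, 10) for _ in range(n)]   # first numerator
    v = [rng.uniform(20, 40) for _ in range(n)]  # second numerator, independent of u
    w = [rng.uniform(1, 6) for _ in range(n)]    # common denominator
    x = [ui / wi for ui, wi in zip(u, w)]
    y = [vi / wi for vi, wi in zip(v, w)]
    return pearson(u, v), pearson(x, y)


if __name__ == "__main__":
    r_raw, r_ratio = simulate()
    print(f"correlation of numerators u, v : {r_raw:+.3f}")
    print(f"correlation of ratios u/w, v/w : {r_ratio:+.3f}")
```

Under these assumptions the numerators should be essentially uncorrelated, while the ratios should show a strong positive correlation that comes entirely from the shared denominator.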

Jerzy Neyman used the following example to "prove" that storks bring babies.

                 (u)            (v)            (w)
              Population     Number of        Women
              of Storks     Babies Born        (k)

County 1          2             10              1
County 2          2             15              1
County 3          2             20              1
County 4          3             10              1
County 5          3             15              1
County 6          3             20              1
County 7          4             10              1
County 8          4             15              1
County 9          4             20              1
County 10         4             15              2
County 11         4             20              2
County 12         4             25              2
County 13         5             15              2
County 14         5             20              2
County 15         5             25              2
County 16         6             15              2
County 17         6             20              2
County 18         6             25              2
County 19         5             20              3
County 20         5             25              3
County 21         5             30              3
County 22         6             20              3
County 23         6             25              3
County 24         6             30              3
County 25         7             22              3
County 26         7             25              3
County 27         7             30              3
County 28         6             25              4
County 29         6             30              4
County 30         6             35              4
County 31         6             25              4
County 32         6             30              4
County 33         6             35              4
County 34         8             30              4
County 35         8             35              4
County 36         8             40              4
County 37         7             30              5
County 38         7             35              5
County 39         7             40              5
County 40         8             30              5
County 41         8             35              5
County 42         8             40              5
County 43         9             30              5
County 44         9             35              5
County 45         9             40              5
County 46         8             35              6
County 47         8             40              6
County 48         8             45              6
County 49         9             35              6
County 50         9             40              6
County 51         9             45              6
County 52        10             35              6
County 53        10             40              6
County 54        10             45              6

We can now compute

     x = u/w     storks per capita
     y = v/w     babies per capita

We can now find a statistically significant relationship between x and y.

Density of storks     Number of     Average        Class
per 10,000 women      counties      birth rate     average

      1.33                3            6.67          7.12
      1.40                3            7.00
      1.50                6            7.08
      1.60                3            7.00
      1.67                6            7.50
---------------------------------------------------------
      1.75                3            7.50          9.22
      1.80                3            7.00
      2.00               12           10.21
---------------------------------------------------------
      2.33                3            8.33         11.67
      2.50                3           10.00
      3.00                6           12.50
      4.00                3           15.00

The idea here is that one should prefer to use the original data rather than the ratio.
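To make the closing recommendation concrete, the sketch below compares two analyses of the county figures. The first correlates the ratios x = u/w and y = v/w. The second works with the original counts and correlates storks and babies after the number of women has been regressed out of both, which is one simple way of treating w as a covariate. The values are typed in from the county listing above as it appears there; the names COUNTIES, pearson, residuals and compare are hypothetical helpers, not part of the original example.

```python
import math

# (u, v, w) = (storks, babies born, women) for the 54 counties,
# as transcribed from the county listing in the text.
COUNTIES = [
    (2, 10, 1), (2, 15, 1), (2, 20, 1), (3, 10, 1), (3, 15, 1), (3, 20, 1),
    (4, 10, 1), (4, 15, 1), (4, 20, 1), (4, 15, 2), (4, 20, 2), (4, 25, 2),
    (5, 15, 2), (5, 20, 2), (5, 25, 2), (6, 15, 2), (6, 20, 2), (6, 25, 2),
    (5, 20, 3), (5, 25, 3), (5, 30, 3), (6, 20, 3), (6, 25, 3), (6, 30, 3),
    (7, 22, 3), (7, 25, 3), (7, 30, 3), (6, 25, 4), (6, 30, 4), (6, 35, 4),
    (6, 25, 4), (6, 30, 4), (6, 35, 4), (8, 30, 4), (8, 35, 4), (8, 40, 4),
    (7, 30, 5), (7, 35, 5), (7, 40, 5), (8, 30, 5), (8, 35, 5), (8, 40, 5),
    (9, 30, 5), (9, 35, 5), (9, 40, 5), (8, 35, 6), (8, 40, 6), (8, 45, 6),
    (9, 35, 6), (9, 40, 6), (9, 45, 6), (10, 35, 6), (10, 40, 6), (10, 45, 6),
]


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    sab = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    saa = sum((p - mean_a) ** 2 for p in a)
    sbb = sum((q - mean_b) ** 2 for q in b)
    return sab / math.sqrt(saa * sbb)


def residuals(y, x):
    """Residuals of y after a least-squares fit y = b0 + b1*x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    return [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]


def compare(data=COUNTIES):
    """Return (correlation of the ratios, correlation adjusted for w)."""
    u = [float(row[0]) for row in data]
    v = [float(row[1]) for row in data]
    w = [float(row[2]) for row in data]
    x = [ui / wi for ui, wi in zip(u, w)]  # storks per woman unit
    y = [vi / wi for vi, wi in zip(v, w)]  # babies per woman unit
    r_ratio = pearson(x, y)
    # Partial correlation of u and v given w, via residuals on w.
    r_adjusted = pearson(residuals(u, w), residuals(v, w))
    return r_ratio, r_adjusted


if __name__ == "__main__":
    r_ratio, r_adjusted = compare()
    print(f"ratios x = u/w, y = v/w      : r = {r_ratio:+.3f}")
    print(f"storks vs babies, given women: r = {r_adjusted:+.3f}")
```

On the figures as listed, the ratio correlation should come out clearly positive, while the correlation that adjusts for the number of women should be much weaker. The adjustment here is linear in w, which is itself an assumption; a different functional form could be substituted in residuals.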
