Goodman-Kruskal Index of Predictive Association



This page will calculate lambda, the Goodman-Kruskal index of predictive association, for a contingency table containing up to 5 rows and 5 columns.

To illustrate the meaning of lambda, suppose you had a total of n=137 instances of X sorted into three categories of A, with the following counts and proportions:
        Count   Proportion
A1        53      .3869
A2        39      .2847
A3        45      .3285

If you were now to predict which of these categories some new instance of X might fall into, your best guess would be A1, since that is the category with the largest number of items in the observed sample. In this case, assuming the proportions within the sample to be an unbiased reflection of the general population of X, the estimated probability of a correct prediction would be .3869 and the estimated probability of an error would be 1-.3869=.6131.
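The arithmetic in this paragraph can be reproduced with a few lines of Python. This is only an illustrative sketch: the variable names are invented here, and the counts are the ones given for A1, A2, and A3.

```python
# Counts for the three categories of A, as given in the example.
counts = {"A1": 53, "A2": 39, "A3": 45}

n = sum(counts.values())                   # total number of instances, 137
best_guess = max(counts, key=counts.get)   # category with the largest count
p_correct = counts[best_guess] / n         # estimated probability of a correct guess
p_error = 1 - p_correct                    # estimated probability of an error

print(best_guess, round(p_correct, 4), round(p_error, 4))  # A1 0.3869 0.6131
```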

Now go back to the beginning and suppose that these same n=137 items had been concurrently sorted into three categories of B, with the following results cross-tabulated:
         B1    B2    B3    Total
A1       19    26     8      53
A2       21    13     5      39
A3        6    12    27      45
Total    46    51    40     137

In this circumstance your predictions could be more refined. Knowing a new instance of X to belong to category B1, your best bet for its membership in A would be A2; knowing it to belong to B2, your best bet would be A1; and knowing it to belong to B3, the best bet would be A3. Now the estimated probability of a correct prediction would be (21+26+27)/137=.5401, and the estimated probability of an error would be 1-.5401=.4599. Without knowledge of the relationship between A and B, the probability of prediction error is .6131. With that knowledge, it reduces to .4599. The Goodman-Kruskal index is simply a measure of the proportion by which prediction error is reduced in such situations. In the present example, for the case of predicting A on the basis of B, it is
lambda[A from B] = (.6131-.4599)/.6131 = .25
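As a check on this figure, here is a minimal Python sketch that works from the raw cell counts rather than the rounded proportions. The function name `lambda_rows_from_cols` is made up for this illustration, and the code assumes the table is laid out with the categories of A as rows and the categories of B as columns.

```python
# Rows are A1..A3, columns are B1..B3 (counts from the cross-tabulation).
table = [
    [19, 26, 8],
    [21, 13, 5],
    [6, 12, 27],
]

def lambda_rows_from_cols(table):
    """Proportionate reduction in error when predicting the row category
    from the column category. Assumes the rows are not all in one category,
    so that the baseline error count is nonzero."""
    n = sum(sum(row) for row in table)
    largest_row_total = max(sum(row) for row in table)     # best guess for A alone
    sum_col_maxima = sum(max(col) for col in zip(*table))  # best guesses within each B
    errors_without = n - largest_row_total                 # 137 - 53 = 84
    errors_with = n - sum_col_maxima                       # 137 - 74 = 63
    return (errors_without - errors_with) / errors_without

lam_a_from_b = lambda_rows_from_cols(table)
print(lam_a_from_b)  # 0.25
```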

As indicated by the [A from B] subscript in the above formulation, lambda is asymmetric. If you were to turn things around so as to make categorical predictions for B, your best bet in the absence of information about A would be B2, since that is the B category with the largest number of instances. The initial estimated probability of error in this case would be (137-51)/137=.6277. Once you factor in the relationship between A and B, you could refine your guesses by predicting B2 when X is A1; B1 when X is A2; and B3 when X is A3. The estimated probability of correct prediction would now be (26+21+27)/137=.5401, the estimated probability of error would be 1-.5401=.4599, and the proportionate reduction in prediction error would accordingly be
lambda[B from A] = (.6277-.4599)/.6277 = .2673
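Both directions of lambda, together with the four probabilities of correct prediction, can be computed in one pass over the table. The sketch below is an illustration only and is not the code behind this page. The function name and dictionary keys are hypothetical, and it assumes rows are categories of A, columns are categories of B, and neither variable is confined to a single category.

```python
def predictive_association(table):
    """Return lambda in both directions plus the four estimated
    probabilities of correct prediction for a table of counts."""
    n = sum(sum(row) for row in table)
    columns = list(zip(*table))
    max_row_total = max(sum(row) for row in table)    # modal A category
    max_col_total = max(sum(col) for col in columns)  # modal B category
    hits_a_from_b = sum(max(col) for col in columns)  # correct guesses of A given B
    hits_b_from_a = sum(max(row) for row in table)    # correct guesses of B given A
    return {
        "p_correct_A_alone": max_row_total / n,
        "p_correct_A_from_B": hits_a_from_b / n,
        "p_correct_B_alone": max_col_total / n,
        "p_correct_B_from_A": hits_b_from_a / n,
        "lambda_A_from_B": (hits_a_from_b - max_row_total) / (n - max_row_total),
        "lambda_B_from_A": (hits_b_from_a - max_col_total) / (n - max_col_total),
    }

result = predictive_association([[19, 26, 8], [21, 13, 5], [6, 12, 27]])
for name, value in result.items():
    print(name, round(value, 4))
```

Computed from the raw counts, lambda[B from A] is 23/86, or about .2674; the .2673 shown above results from dividing the proportions after they have been rounded to four places.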

This page will calculate the two versions of lambda, [A from B] and [B from A], along with the estimated probabilities of correct prediction for [A alone], [A from B], [B alone], and [B from A]. It will also perform a chi-square analysis on the contingency table.

To begin, enter the numbers of rows and columns in the designated places, then click the «Setup» button and enter your data into the appropriate cells of the data-entry matrix. After all data have been entered, click the «Calculate» button.

Chi-Square    df    P

[Note that for df=1, the calculated value of chi-square is corrected for continuity.]
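For readers who want to see what a continuity-corrected chi-square involves, the following Python sketch computes the Pearson statistic and its degrees of freedom, applying Yates' correction when df=1. It is an assumption that this matches the correction used by the calculator; the page does not give its formula. The function name is hypothetical, the P value is not computed, and every row and column total is assumed to be nonzero.

```python
def chi_square(table):
    """Pearson chi-square for a table of counts.
    Yates' continuity correction is applied only when df == 1."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    correction = 0.5 if df == 1 else 0.0
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            # Shrink each deviation by the correction, but never below zero.
            deviation = max(abs(observed - expected) - correction, 0.0)
            stat += deviation ** 2 / expected
    return stat, df

stat, df = chi_square([[19, 26, 8], [21, 13, 5], [6, 12, 27]])
print(round(stat, 2), df)
```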

Lambda for predicting    Standard Error    .95 CI Limits (Lower, Upper)
A from B:
B from A:

Estimated Probability of Correct Prediction when Predicting:
A alone
A from B
B alone
B from A




©Richard Lowry 2001
All rights reserved.