Food Composition Data: A User's Perspective (1987)


**Validation of food intake data: implications of food composition variation**

A particular consideration of the impact of food composition variation on regression and correlation analyses arises in connection with validation trials in which food intake estimated by recall or observation methods is compared with intake during the same period estimated by direct chemical measurement of duplicate meals. This is a common procedure. What may not always be recognized is that variation in food composition will inevitably yield a bias in regression slopes and attenuation of correlation coefficients even if the estimation of food intake is perfect. Further, the impact on regression depends on whether the error term lies in the dependent or independent variable.

**Table 8.** Estimate of variability (CV) of one-day intake derived from consideration of both food composition variation and random error in the estimation of intake (assuming 15 foods in the diet)^{a}

| CV1 (rows) / CV1 (columns) | 10 | 15 | 20 | 25 | 30 | 35 | 40 | 45 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 10 | 3.7 | 4.7 | 5.8 | 7.0 | 8.2 | 9.5 | 10.7 | 12.0 |
| 15 | 4.7 | 5.5 | 6.5 | 7.6 | 8.7 | 9.9 | 11.1 | 12.4 |
| 20 | 5.8 | 6.5 | 7.4 | 8.4 | 9.5 | 10.6 | 11.7 | 12.9 |
| 25 | 7.0 | 7.6 | 8.4 | 9.3 | 10.3 | 11.3 | 12.4 | 13.6 |
| 30 | 8.2 | 8.7 | 9.5 | 10.3 | 11.2 | 12.2 | 13.3 | 14.4 |
| 35 | 9.5 | 9.9 | 10.6 | 11.3 | 12.2 | 13.2 | 14.2 | 15.3 |
| 40 | 10.7 | 11.1 | 11.7 | 12.4 | 13.3 | 14.2 | 15.2 | 16.2 |
| 45 | 12.0 | 12.4 | 12.9 | 13.6 | 14.4 | 15.3 | 16.2 | 17.2 |

^{a}. Values calculated as value in table 7 divided by square root of 15, the assumed number of food items in the one-day diet.
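The calculation behind table 8 can be illustrated with a short sketch. Table 7 is not reproduced in this section, so the combination rule used below (the CV of a product of two independent factors) is an assumption; it is chosen because it agrees with the tabulated values that were checked, not because the source states it. The function name `one_day_cv` is hypothetical.

```python
import math


def one_day_cv(cv_a, cv_b, n_foods=15):
    """Approximate CV (per cent) of a one-day intake.

    cv_a, cv_b: the two component CVs in per cent (row and column of table 8).
    Assumption: the per-food CV combines the two components as the CV of a
    product of independent factors, then shrinks by sqrt(n_foods).
    """
    per_food = math.sqrt(cv_a ** 2 + cv_b ** 2 + (cv_a * cv_b / 100.0) ** 2)
    return per_food / math.sqrt(n_foods)


print(round(one_day_cv(10, 10), 1))  # 3.7
print(round(one_day_cv(20, 35), 1))  # 10.6
print(round(one_day_cv(45, 45), 1))  # 17.2
```

The division by the square root of 15 reflects the footnote's assumption that errors in the individual food items are independent and average out across the day's diet.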

This can be illustrated in a simulation analysis. Consider a model in which iron intake is computed from estimated food intake and is chemically determined from duplicate meals. Assume that for 97 individuals the intakes range from 5 to 25 mg per day and that the individuals are randomly distributed across this range. Consider also that the iron composition for one-day intakes has a CV of 15 per cent, a value consistent with the estimates presented in table 4. In the simulation analysis, 97 random values for iron intake lying in the 5 to 25 mg range were selected. For each of these mean intake values, a random value was selected from the population of possible real values described by a random distribution having mean as specified and CV = 15 per cent. The regression across the 97 individuals was then computed. This exercise was then repeated de novo 1,000 times. Finally the regression parameters for the 1,000 estimates were examined. The results are presented below:

A. With

X = calculated intake (no error term in this model)

Y = chemically determined intake (includes the "error" term)

| Regression parameter | Mean ± SD^{a} |
| --- | --- |
| Intercept | 0.0639 ± 0.5440 |
| Slope | 0.995 ± 0.0441 |
| Correlation coefficient | 0.9229 ± 0.0146 |

B. With

X = chemically determined intake (includes the "error" term)

Y = calculated intake (no error term in this model)

| Regression parameter | Mean ± SD^{a} |
| --- | --- |
| Intercept | 2.1579 ± 0.5403 |
| Slope | 0.8573 ± 0.0379 |
| Correlation coefficient | 0.9229 ± 0.0146 |

a. Mean ± SD from 1,000 iterations of model.
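The simulation described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the original program: intakes are drawn uniformly over 5 to 25 mg, the "random distribution" of real values is taken to be normal with SD equal to 15 per cent of the mean, and the seed and the names `fit_line` and `simulate` are hypothetical. Exact figures will therefore differ slightly from those reported.

```python
import math
import random
import statistics


def fit_line(x, y):
    """Least-squares regression of y on x; returns (intercept, slope, r)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope, sxy / math.sqrt(sxx * syy)


def simulate(n_subjects=97, n_runs=1000, low=5.0, high=25.0, cv=0.15, seed=0):
    """Repeat the validation trial n_runs times and summarize both models."""
    rng = random.Random(seed)
    runs_a, runs_b = [], []
    for _ in range(n_runs):
        # Calculated intake: assumed uniform over the stated range.
        calculated = [rng.uniform(low, high) for _ in range(n_subjects)]
        # Chemically determined intake: assumed normal around the calculated
        # value, with SD = cv * mean (composition variation only).
        chemical = [rng.gauss(m, cv * m) for m in calculated]
        runs_a.append(fit_line(calculated, chemical))  # model A: error in Y
        runs_b.append(fit_line(chemical, calculated))  # model B: error in X

    def summarize(runs):
        # One (mean, SD) pair each for intercept, slope and r.
        return [(statistics.mean(c), statistics.stdev(c)) for c in zip(*runs)]

    return summarize(runs_a), summarize(runs_b)


model_a, model_b = simulate()
for label, result in (("A", model_a), ("B", model_b)):
    for name, (mean, sd) in zip(("intercept", "slope", "r"), result):
        print(f"{label} {name}: {mean:.4f} +/- {sd:.4f}")
```

Under these assumptions the model A slope should come out close to 1, the model B slope roughly 0.85, and the correlation about 0.92 in both models, in line with the reported results.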

That the bias in regression slope is seen only in one variation of the model is a recognized phenomenon. It is the error term in the independent variable that biases the regression.

**Table 9.** The impact of error in the independent and dependent variables on (A) regression slope and (B) correlation coefficient^{a}

A. Impact on regression slope

Rows: dependent variable (Y) variance ratio. Columns: independent variable (X) ratio of intra/inter variances.

| Y ratio / X ratio | 0 | 0.4 | 0.8 | 1.2 | 1.6 | 2.0 | 2.4 | 2.8 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 1.0 | 0.714 | 0.556 | 0.455 | 0.385 | 0.333 | 0.294 | 0.263 |
| 0.4 | 1.0 | 0.714 | 0.556 | 0.455 | 0.385 | 0.333 | 0.294 | 0.263 |
| 0.8 | 1.0 | 0.714 | 0.556 | 0.455 | 0.385 | 0.333 | 0.294 | 0.263 |
| 1.2 | 1.0 | 0.714 | 0.556 | 0.455 | 0.385 | 0.333 | 0.294 | 0.263 |
| 1.6 | 1.0 | 0.714 | 0.556 | 0.455 | 0.385 | 0.333 | 0.294 | 0.263 |
| 2.0 | 1.0 | 0.714 | 0.556 | 0.455 | 0.385 | 0.333 | 0.294 | 0.263 |
| 2.4 | 1.0 | 0.714 | 0.556 | 0.455 | 0.385 | 0.333 | 0.294 | 0.263 |
| 2.8 | 1.0 | 0.714 | 0.556 | 0.455 | 0.385 | 0.333 | 0.294 | 0.263 |

B. Impact on correlation coefficient^{b}

Rows: dependent variable (Y) variance ratio. Columns: independent variable (X) ratio of intra/inter variances.

| Y ratio / X ratio | 0 | 0.4 | 0.8 | 1.2 | 1.6 | 2.0 | 2.4 | 2.8 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 1.0 | 0.845 | 0.745 | 0.674 | 0.620 | 0.577 | 0.542 | 0.512 |
| 0.4 | 0.845 | 0.714 | 0.630 | 0.570 | 0.524 | 0.488 | 0.458 | 0.434 |
| 0.8 | 0.745 | 0.630 | 0.555 | 0.503 | 0.462 | 0.430 | 0.404 | 0.382 |
| 1.2 | 0.674 | 0.569 | 0.503 | 0.455 | 0.418 | 0.389 | 0.365 | 0.346 |
| 1.6 | 0.620 | 0.524 | 0.462 | 0.418 | 0.385 | 0.358 | 0.336 | 0.318 |
| 2.0 | 0.577 | 0.488 | 0.430 | 0.389 | 0.358 | 0.333 | 0.313 | 0.296 |
| 2.4 | 0.542 | 0.458 | 0.404 | 0.366 | 0.336 | 0.313 | 0.294 | 0.278 |
| 2.8 | 0.513 | 0.434 | 0.382 | 0.346 | 0.318 | 0.296 | 0.278 | 0.263 |

a. The reference slope and correlation are each set at 1.0. The tables portray the bias introduced as a multiple of true values.

b. All calculations presented assume that there is no correlation between errors.

A random error in the dependent variable has no specific effect on the slope. In the case of correlation analyses, the effect is the same no matter where the error lies. Here the correlations are high since the range of observed intakes is quite large in relation to the error term stipulated. More general models of these effects on regression slopes and on correlation coefficients are presented in table 9. The bases of these calculations will be found in Beaton et al. [2] and Snedecor and Cochran [7].
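The entries in table 9 are consistent, to within rounding in the last digit, with the usual attenuation formulas for uncorrelated errors. The sketch below assumes those formulas rather than quoting them from the cited sources, and the function names `slope_factor` and `correlation_factor` are hypothetical. Each ratio is the error (intra) variance divided by the true (inter) variance of the variable concerned.

```python
import math


def slope_factor(x_ratio):
    """Multiple of the true slope that survives error in X.

    Error in Y does not enter: it adds scatter but no systematic bias.
    """
    return 1.0 / (1.0 + x_ratio)


def correlation_factor(x_ratio, y_ratio):
    """Multiple of the true correlation that survives error in X and Y.

    Symmetric in its arguments: it does not matter where the error lies.
    """
    return 1.0 / math.sqrt((1.0 + x_ratio) * (1.0 + y_ratio))


print(round(slope_factor(0.4), 3))              # 0.714
print(round(correlation_factor(0.4, 0.0), 3))   # 0.845
print(round(correlation_factor(2.0, 2.0), 3))   # 0.333
```

Applied to the iron simulation, and assuming intakes spread uniformly over 5 to 25 mg, the ratio of error variance to true variance is roughly 0.17, which gives a slope factor of about 0.85, close to the simulated slope of 0.8573 in model B.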