Food Composition Data: A User's Perspective (1987)
Consideration of food composition variability: What is the variance of the estimate of one-day intakes? Implications for setting priorities

Validation of food intake data: implications of food composition variation

A particular consideration of the impact of food composition variation on regression and correlation analyses arises in connection with validation trials in which food intake estimated by recall or observation methods is compared with intake during the same period estimated by direct chemical measurement of duplicate meals. This is a common procedure. What may not always be recognized is that variation in food composition will inevitably yield a bias in regression slopes and attenuation of correlation coefficients even if the estimation of food intake is perfect. Further, the impact on regression depends on whether the error term lies in the dependent or independent variable.

Table 8. Estimate of variability (CV) of one-day intake derived from consideration of both food composition variation and random error in the estimation of intake (assuming 15 foods in the diet) (a)

CV1

CV1

  10 15 20 25 30 35 40 45
10 3.7 4.7 5.8 7.0 8.2 9.5 10.7 12.0
15 4.7 5.5 6.5 7.6 8.7 9.9 11.1 12.4
20 5.8 6.5 7.4 8.4 9.5 10.6 11.7 12.9
25 7.0 7.6 8.4 9.3 10.3 11.3 12.4 13.6
30 8.2 8.7 9.5 10.3 11.2 12.2 13.3 14.4
35 9.5 9.9 10.6 11.3 12.2 13.2 14.2 15.3
40 10.7 11.1 11.7 12.4 13.3 14.2 15.2 16.2
45 12.0 12.4 12.9 13.6 14.4 15.3 16.2 17.2

a. Values calculated as value in table 7 divided by square root of 15, the assumed number of food items in the one-day diet.
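The calculation in footnote a can be checked numerically. Table 7 is not reproduced in this section, so the sketch below assumes that each table 7 entry is the CV of the product of two independent quantities, sqrt(a² + b² + a²b²) with the CVs expressed as fractions. That form is inferred from the tabulated values, not stated in the text, and the function name `one_day_cv` is illustrative only.

```python
import math

def one_day_cv(cv_a, cv_b, n_foods=15):
    """One-day CV (per cent) from two independent per-food CVs (per cent).

    Assumption: the per-food (table 7) value is the CV of a product of two
    independent quantities; this is inferred from the table, not from the text.
    """
    a, b = cv_a / 100.0, cv_b / 100.0
    per_food = math.sqrt(a * a + b * b + a * a * b * b)   # assumed table 7 entry
    # Footnote a: divide by the square root of the number of foods in the diet.
    return 100.0 * per_food / math.sqrt(n_foods)

# Print the diagonal of the table (equal CVs on both axes).
for cv in range(10, 50, 5):
    print(cv, round(one_day_cv(cv, cv), 1))
```

Under this assumption the function reproduces the diagonal of table 8. Off-diagonal cells may differ in the last digit, possibly because the table 7 entries were rounded before division.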

This can be illustrated by a simulation. Consider a model in which iron intake is both computed from estimated food intake and determined chemically from duplicate meals. Assume that the intakes of 97 individuals range from 5 to 25 mg per day and that the individuals are randomly distributed across this range. Assume also that the iron composition of one-day intakes has a CV of 15 per cent, a value consistent with the estimates presented in table 4. In the simulation, 97 random values for iron intake in the 5 to 25 mg range were selected. For each of these mean intake values, a random value was drawn from the population of possible real values, described by a random distribution with the specified mean and a CV of 15 per cent. The regression across the 97 individuals was then computed. The exercise was repeated de novo 1,000 times, and the regression parameters for the 1,000 estimates were examined. The results are presented below:

A. With
X = calculated intake (no error term in this model)
Y = chemically determined intake (includes the "error" term)

Regression parameters  
Intercept 0.0639 ± 0.5440 (a)
Slope 0.995 ± 0.0441
Correlation coefficient 0.9229 ± 0.0146

B. With
X = chemically determined intake (includes the "error" term)
Y = calculated intake (no error term in this model)

Regression parameters  
Intercept 2.1579 ± 0.5403
Slope 0.8573 ± 0.0379
Correlation coefficient 0.9229 ± 0.0146

a. Mean ± SD from 1,000 iterations of model.

That the bias in regression slope is seen only in one variation of the model is a recognized phenomenon. It is the error term in the independent variable that biases the regression.
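The simulation described above can be sketched with the standard library alone. The text does not name the distributions used, so this sketch assumes that calculated intakes are drawn uniformly from 5 to 25 mg and that the chemically determined value is normally distributed about each calculated value with SD equal to 15 per cent of it. The function names and the fixed seed are illustrative, and the output will not match the published figures digit for digit.

```python
import random
import statistics

def ols(x, y):
    """Return (intercept, slope, r) for the least-squares regression of y on x."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope, sxy / (sxx * syy) ** 0.5

def simulate(n_iter=1000, n_subjects=97, cv=0.15, seed=1):
    """Return per-model lists of (mean, SD) for intercept, slope and r."""
    rng = random.Random(seed)
    model_a, model_b = [], []
    for _ in range(n_iter):
        # Assumption: calculated intakes are uniform on 5-25 mg per day.
        calculated = [rng.uniform(5.0, 25.0) for _ in range(n_subjects)]
        # Assumption: normal error with SD = cv * calculated intake.
        chemical = [rng.gauss(m, cv * m) for m in calculated]
        model_a.append(ols(calculated, chemical))   # A: error term in Y
        model_b.append(ols(chemical, calculated))   # B: error term in X

    def summarize(runs):
        # Transpose the runs so each column holds one parameter across iterations.
        return [(statistics.fmean(col), statistics.stdev(col)) for col in zip(*runs)]

    return summarize(model_a), summarize(model_b)

result_a, result_b = simulate()
for label, result in (("A", result_a), ("B", result_b)):
    for name, (mean, sd) in zip(("Intercept", "Slope", "r"), result):
        print(f"{label} {name}: {mean:.4f} ± {sd:.4f}")
```

Under these assumptions, the slope in model A should average close to 1 and the slope in model B should fall noticeably below 1. The correlation coefficient is identical in the two models, because it is symmetric in X and Y.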

Table 9. The impact of error in the independent and dependent variables on (A) regression slope and (B) correlation coefficient (a)

A. Impact on regression slope

Rows: dependent variable (Y) variance ratio

Columns: independent variable (X) ratio of intra/inter variances

  0 0.4 0.8 1.2 1.6 2.0 2.4 2.8
0 1.0 0.714 0.556 0.455 0.385 0.333 0.294 0.263
0.4 1.0 0.714 0.556 0.455 0.385 0.333 0.294 0.263
0.8 1.0 0.714 0.556 0.455 0.385 0.333 0.294 0.263
1.2 1.0 0.714 0.556 0.455 0.385 0.333 0.294 0.263
1.6 1.0 0.714 0.556 0.455 0.385 0.333 0.294 0.263
2.0 1.0 0.714 0.556 0.455 0.385 0.333 0.294 0.263
2.4 1.0 0.714 0.556 0.455 0.385 0.333 0.294 0.263
2.8 1.0 0.714 0.556 0.455 0.385 0.333 0.294 0.263

B. Impact on correlation coefficient (b)

Rows: dependent variable (Y) variance ratio

Columns: independent variable (X) ratio of intra/inter variances

  0 0.4 0.8 1.2 1.6 2.0 2.4 2.8
0 1.0 0.845 0.745 0.674 0.620 0.577 0.542 0.512
0.4 0.845 0.714 0.630 0.570 0.524 0.488 0.458 0.434
0.8 0.745 0.630 0.555 0.503 0.462 0.430 0.404 0.382
1.2 0.674 0.569 0.503 0.455 0.418 0.389 0.365 0.346
1.6 0.620 0.524 0.462 0.418 0.385 0.358 0.336 0.318
2.0 0.577 0.488 0.430 0.389 0.358 0.333 0.313 0.296
2.4 0.542 0.458 0.404 0.366 0.336 0.313 0.294 0.278
2.8 0.513 0.434 0.382 0.346 0.318 0.296 0.278 0.263

a. The reference slope and correlation are each set at 1.0. The tables portray the bias introduced as a multiple of true values.
b. All calculations presented assume that there is no correlation between errors.

A random error in the dependent variable does not bias the slope, although it adds scatter about the regression line. In correlation analyses, by contrast, the attenuation is the same wherever the error lies. In the simulation the correlations remain high because the range of observed intakes is large in relation to the stipulated error term. More general models of these effects on regression slopes and on correlation coefficients are presented in table 9. The bases of these calculations will be found in Beaton et al. [2] and Snedecor and Cochran [7].
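The entries of table 9 are consistent with the usual attenuation formulas for uncorrelated errors: the slope is multiplied by 1/(1 + x) and the correlation by 1/sqrt((1 + x)(1 + y)), where x and y are the ratios of intra- to inter-individual variance in the independent and dependent variables. The sketch below assumes these formulas, which are inferred from the tabulated values and not quoted from the cited sources. The function names are illustrative.

```python
import math

def slope_multiplier(x_ratio):
    """Factor applied to the true slope; depends only on error in X."""
    return 1.0 / (1.0 + x_ratio)

def correlation_multiplier(x_ratio, y_ratio):
    """Factor applied to the true correlation; symmetric in X and Y."""
    return 1.0 / math.sqrt((1.0 + x_ratio) * (1.0 + y_ratio))

# First rows of table 9: Y ratio fixed at 0, X ratio running across the columns.
ratios = [0, 0.4, 0.8, 1.2, 1.6, 2.0, 2.4, 2.8]
print([round(slope_multiplier(x), 3) for x in ratios])
print([round(correlation_multiplier(x, 0.0), 3) for x in ratios])

# Illustrative link to the simulation, assuming intakes uniform on 5-25 mg
# and an error SD equal to 15 per cent of each individual's intake.
between_var = (25 - 5) ** 2 / 12                 # variance of a uniform distribution
error_var = 0.15 ** 2 * (between_var + 15 ** 2)  # average of (0.15 * m)^2 over the range
ratio = error_var / between_var
print(round(slope_multiplier(ratio), 3), round(correlation_multiplier(ratio, 0.0), 3))
```

With those simulation assumptions the variance ratio is about 0.17, which gives a slope multiplier of roughly 0.85 and a correlation of roughly 0.92. Both are close to the simulated values reported for model B.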