Food Composition Data: A User's Perspective (1987)

The variability of the data

By the "goodness" of the data we mean their suitability for the purpose at hand - how well the data permit us to get on with whatever task we are engaged in. For most users, finding a value in a table is sufficient. They then use it as needed, assuming that the number is the best available estimate of a specific nutrient in a specific food.

Table 1. Whole chicken egg, fresh, raw, per 100 g edible portion

Table                        Water (g)  Energy (kcal)  Protein (g)  Fat (g)  Carbohydrate (g)  Calcium (mg)  Iron (mg)
USDA (new)                   74.57      158            12.14        11.15    1.2               56            2.09
USDA (old)                   73.7       163            12.9         11.5     0.9               54            2.3
United Kingdom               74.8       147            12.3         10.9     Trace             52            2
Federal Republic of Germany  74.1       167.08         12.9         11.2     0.7               56            2.1
Sweden                       74.4       150            12.7         9.4      2.7               51            2.1
Denmark                      74.6       155.8          12.1         11.2     1.2               -             2
Finland                      74         145            12.8         11.7     0.5               -             2.5
Norway                       75         155            13           11       0.7               -             2.1
Italy                        73.9       156            13           11.1     1                 50            2.5
East Asia                    73.7       163            12.9         11.5     0.8               61            3.2
China 1                      71         170            14.7         11.6     1.6               55            2.7
China 2                      70.8       187            11.8         15       1.3               58            4.3
China 3                      73         174            13.1         13.5     3.6               -             -
China 4                      70         175            15.3         11.9     1.6               64            0
China 5                      73         160            12.7         11.3     2                 55            2.8
Republic of Korea            74         160            12.7         12.1     1.2               -             -
Japan                        70.7       199            12.2         15.2     0.9               65            1.8
Malaysia                     73.2       166            13.3         12.5     0                 57            3
India                        73.7       173            13.3         13.3     -                 60            2.1
Africa                       77         140            11.8         9.6      0.6               45            2.6
Near East                    72.8       160            12.1         11.4     1.2               55            2.9
INCAP                        75.3       148            11.3         9.8      2.7               54            2.5
Brazil                       -          163            12.9         11.5     0.8               61            3.7
Australia                    -          160            12.6         11.6     0.8               54            2.4

a. Data are taken from standard national and regional tables.

If the user were so confused that he consulted an "expert," that expert might give either or both of two answers: (a) "the values in the different tables are really measurements of different objects, often by different methods," or (b) "the differences do not matter." While the first answer is probably true, the second is often false - in general it does matter. For example:

1. A person on a specific diet will receive differing advice depending on which data base is used to analyse his/her food intake.

2. A small difference in an individual diet can become a large difference when projected to a population estimate - the level at which important decisions such as resource allocation are made (table 1).

3. In general, apparent and unexplained inconsistencies reduce the confidence in all such data, in the system which provides such data, and in the science that works with such data.
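The projection described in point 2 is simple arithmetic, and a minimal sketch in Python may make the scale clearer. The two calcium means are taken from Table 2 (Swedish NFA and McCance and Widdowson). The daily milk intake and the population size are purely hypothetical figures chosen for illustration, not values from the source, and the function names are likewise invented here.

```python
# Assumed inputs: the two means come from Table 2; the intake and the
# population size are hypothetical illustration values, not source data.
CALCIUM_MG_PER_100G_A = 113.0   # Swedish NFA average (Table 2)
CALCIUM_MG_PER_100G_B = 120.0   # McCance and Widdowson mean (Table 2)
MILK_INTAKE_G_PER_DAY = 250.0   # hypothetical intake per person
POPULATION = 10_000_000         # hypothetical population size


def daily_intake_mg(mg_per_100g: float, grams_eaten: float) -> float:
    """Nutrient intake in mg for a given amount of food eaten."""
    return mg_per_100g * grams_eaten / 100.0


def population_gap_kg_per_day(value_a: float, value_b: float,
                              grams_eaten: float, population: int) -> float:
    """Difference between two population intake estimates, in kg per day."""
    per_person_mg = abs(daily_intake_mg(value_b, grams_eaten)
                        - daily_intake_mg(value_a, grams_eaten))
    return per_person_mg * population / 1_000_000.0  # convert mg to kg


per_person_gap = abs(
    daily_intake_mg(CALCIUM_MG_PER_100G_B, MILK_INTAKE_G_PER_DAY)
    - daily_intake_mg(CALCIUM_MG_PER_100G_A, MILK_INTAKE_G_PER_DAY)
)
population_gap = population_gap_kg_per_day(
    CALCIUM_MG_PER_100G_A, CALCIUM_MG_PER_100G_B,
    MILK_INTAKE_G_PER_DAY, POPULATION,
)
print(f"Per person: {per_person_gap:.1f} mg/day")
print(f"Population: {population_gap:.1f} kg/day")
```

With these assumed figures the choice of table changes the estimate by 17.5 mg per person per day, which amounts to 175 kg of calcium per day across the hypothetical population.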

Table 2. Calcium (mg) in 100 g milk

USDA (8.1) Mean = 119 SE = 0.251; N = 1,054 (SD = 8.15)
McCance and Widdowson Mean = 120 Range = (110-130)
Souci/Fachmann/Kraut Average = 120 Variation = (107-133)
Swedish NFA Average = 113 Variation = (100-122.1)
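The USDA entry reports a standard error together with N, while the other tables give only a range. Assuming the usual relation SE = SD / sqrt(N), the USDA figures can be checked against each other, and a rough band of mean ± 2 SD can be set beside the published ranges. The following is a sketch under that assumption. The ± 2 SD band further presumes an approximately Normal distribution, which is an assumption made here and not a claim of the source. The function names are hypothetical.

```python
import math


def sd_from_se(se: float, n: int) -> float:
    """Recover the standard deviation from a standard error and sample size."""
    return se * math.sqrt(n)


def two_sd_band(mean: float, sd: float) -> tuple[float, float]:
    """Rough band of mean +/- 2 SD (assumes an approximately Normal spread)."""
    return (mean - 2.0 * sd, mean + 2.0 * sd)


# Figures as printed in the USDA row of Table 2.
USDA_MEAN, USDA_SE, USDA_N = 119.0, 0.251, 1054

usda_sd = sd_from_se(USDA_SE, USDA_N)
low, high = two_sd_band(USDA_MEAN, usda_sd)

print(f"SD recovered from SE and N: {usda_sd:.2f} mg")
print(f"Mean +/- 2 SD band: {low:.1f} to {high:.1f} mg")
```

The recovered SD agrees with the 8.15 printed in the table, and the resulting band of roughly 103 to 135 mg is of the same order as the ranges quoted by the other three tables. This suggests the four entries describe similar distributions in different vocabularies, although the tables do not say how their "range" or "variation" was defined.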

The problem hinges on the fact that it is unlikely that these data really are inconsistent - they only look inconsistent. Most data bases present a single value for a specific nutrient in a specific food. This leads back to the first answer above - that different things were measured. If we look closely at the "food" component of figure 1 we note that it should be expanded as in figure 2, to show that every sample of a food is quite likely to differ from every other sample, and this is before the chemists take over and add their own variability.

While it is important to realize that there are a number of specific sources of food composition data variability, the major point is that few tables even hint that such variabilities exist. Moreover, those tables that do, such as those shown in table 2, do not do so in a consistent fashion, nor are they very helpful about just how to use this added information.

The point to be stressed is that there is not a single food of each kind - there is no Platonic ideal "egg." Foods are not mathematical ideals, but must be considered as probabilistic or statistical objects, with statistical distributions of their nutrients. Any compilation or use of food composition data must be firmly based in this fact.

Fig. 2.

We are faced with the fact that a data base needs to contain more than just a single value for each food-nutrient combination. The description of a distribution is not straightforward; few distributions can be described adequately with just a few numbers. (The obvious counterexample is the Normal or Gaussian distribution, the familiar bell-shaped curve. However, few measurements follow this distribution precisely.) The "statistics" that can be used to describe an arbitrary distribution include the mean, mode, median, quartiles, percentiles, standard deviations, and mean deviations. Each has its adherents and rationale. The improvement of food composition data requires careful investigation of both where the data come from and what they are to be used for. Inherent in the viewing of food composition data as "data" are several implications that need to be stressed.
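As an illustration of how differently these statistics summarize the same numbers, the sketch below computes several of them for the water values listed in Table 1 (the 22 tables that report water). It uses only Python's standard `statistics` module. Treating the national tables as if they were samples from one distribution is a simplification made here for illustration, and the `describe` helper is a hypothetical name, not part of any food composition system.

```python
import statistics

# Water content of whole raw egg (per 100 g) as listed in Table 1;
# tables with no water value are omitted.
WATER = [
    74.57, 73.7, 74.8, 74.1, 74.4, 74.6, 74.0, 75.0, 73.9, 73.7, 71.0,
    70.8, 73.0, 70.0, 73.0, 74.0, 70.7, 73.2, 73.7, 77.0, 72.8, 75.3,
]


def describe(values: list[float]) -> dict:
    """Return several alternative summaries of one set of values."""
    mean = statistics.mean(values)
    # quantiles(n=4) returns the three cut points between the quartiles.
    q1, _, q3 = statistics.quantiles(values, n=4)
    return {
        "n": len(values),
        "min": min(values),
        "max": max(values),
        "mean": mean,
        "median": statistics.median(values),
        "mode": statistics.mode(values),
        "lower_quartile": q1,
        "upper_quartile": q3,
        "standard_deviation": statistics.stdev(values),
        "mean_deviation": sum(abs(v - mean) for v in values) / len(values),
    }


summary = describe(WATER)
for name, value in summary.items():
    print(f"{name:>20}: {value:.2f}")
```

No single line of this output is "the" water content of an egg. Which of them a data base should store depends on the use to which the value will be put.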

1. Each type of user is likely to require different statistics. These need to be carefully defined and justified. For example, someone wanting to estimate intake would perhaps be satisfied with a mean or median value, while someone worrying about meeting requirements would want an upper or lower limit. In order for data banks to be well designed, for them to include "good" data, each user must decide the best data representation for his/her specific application.

2. Data banks must be designed to provide information about the variabilities of their holdings. Ultimately this requires access to raw data, but, well before that, standardized and documented algorithms for data manipulation are needed.

3. Users must be made aware of the inherent variabilities of the data, of the magnitudes and implications of these variabilities, and of the procedures for handling this inescapable aspect of food composition data.

4. The sources of data variabilities need to be sorted out as a preliminary step, estimating their magnitude, exploring their importance, and reducing those that can be reduced by approaches ranging from standardizing analytic techniques to developing regional values.
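Points 1 and 2 can be sketched as a data structure: a record that stores a description of the distribution rather than a single value, and returns a different statistic depending on the user's purpose. Everything in this sketch is hypothetical, including the class name `NutrientSummary`, the purpose labels, and the choice of mean ± k SD as a stand-in for "upper or lower limit" (which presumes a roughly Normal spread). Only the milk calcium figures are taken from the USDA row of Table 2.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class NutrientSummary:
    """One food-nutrient entry stored as a distribution summary (hypothetical)."""
    food: str
    nutrient: str
    unit: str
    mean: float
    sd: float
    n: int

    def standard_error(self) -> float:
        """Standard error of the mean, SD / sqrt(N)."""
        return self.sd / math.sqrt(self.n)

    def value_for(self, purpose: str, k: float = 2.0) -> float:
        """Return the statistic suited to a purpose (labels are assumptions)."""
        if purpose == "intake_estimate":
            return self.mean                  # central value
        if purpose == "meeting_requirement":
            return self.mean - k * self.sd    # conservative lower limit
        if purpose == "avoiding_excess":
            return self.mean + k * self.sd    # conservative upper limit
        raise ValueError(f"unknown purpose: {purpose!r}")


# Figures from the USDA row of Table 2.
milk_calcium = NutrientSummary(
    food="milk", nutrient="calcium", unit="mg/100 g",
    mean=119.0, sd=8.15, n=1054,
)

for purpose in ("intake_estimate", "meeting_requirement", "avoiding_excess"):
    value = milk_calcium.value_for(purpose)
    print(f"{purpose:>20}: {value:.1f} {milk_calcium.unit}")
```

The design choice worth noting is that the record never exposes a bare "value". The caller has to state a purpose, which is one way a data bank could make users aware of the variability of its holdings.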