Food Composition Data: A User's Perspective (1987)
Experiences with food composition data: the context
Data: the user context
The variability of the data
By the "goodness" of the data we mean suitability for the purpose at hand - how well the data permit us to get on with whatever task we are engaged in. For most users, finding a value in a table is sufficient. They will then use it as they need, assuming that the number is the best available estimate of some specific nutrient in some specific food.
Table 1. Whole chicken egg, fresh, raw, per 100 g edible portion
| Federal Republic of Germany | 74.1 | 167.08 | 12.9 | 11.2 | 0.7 | 56 | 2.1 |
| Republic of Korea | 74 | 160 | 12.7 | 12.1 | 1.2 |
a. Data are taken from standard national and regional tables.
If the user were so confused that he consulted an "expert," that expert might give either or both of two answers: (a) "the values in the different tables are really measurements of different objects, often by different methods," or (b) "the differences do not matter." While the first answer is probably true, the second is often false - in general it does matter. For example:
1. A person on a specific diet will receive differing advice depending on what data base is used to analyse his/her food intake.
2. A small difference in an individual diet can become a large difference when projected to a population estimate - the level at which important decisions such as resource allocation are made (table 1).
3. In general, apparent and unexplained inconsistencies reduce the confidence in all such data, in the system which provides such data, and in the science that works with such data.
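The second point is simple arithmetic, and a minimal sketch makes the scaling visible. All three input figures below are hypothetical, chosen only for illustration; they are not taken from any table, and the function names are likewise invented for this sketch.

```python
# All three inputs are hypothetical, chosen only to illustrate the scaling.
DIFF_MG_PER_100G = 6.0     # assumed gap between two tables' values for one nutrient
INTAKE_G_PER_DAY = 300.0   # assumed daily consumption of the food per person
POPULATION = 1_000_000     # assumed size of the population being planned for


def per_person_gap_mg(diff_mg_per_100g: float, intake_g_per_day: float) -> float:
    """Daily difference in estimated intake for one person, in mg."""
    return diff_mg_per_100g * intake_g_per_day / 100.0


def population_gap_kg(diff_mg_per_100g: float, intake_g_per_day: float,
                      population: int) -> float:
    """Daily difference in estimated intake for the whole population, in kg."""
    total_mg = per_person_gap_mg(diff_mg_per_100g, intake_g_per_day) * population
    return total_mg / 1_000_000.0  # convert mg to kg


person_gap = per_person_gap_mg(DIFF_MG_PER_100G, INTAKE_G_PER_DAY)
total_gap = population_gap_kg(DIFF_MG_PER_100G, INTAKE_G_PER_DAY, POPULATION)
print(f"per person: {person_gap} mg/day; population: {total_gap} kg/day")
```

Under these assumed figures, a gap of a few milligrams per 100 g becomes kilograms per day at the population level, which is the scale at which allocation decisions are made.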
Table 2. Calcium (mg) in 100 g milk
| USDA (8.1) | Mean = 119 | SE = 0.251; N = 1,054 (SD = 8.15) |
| McCance and Widdowson | Mean = 120 | Range = (110-130) |
| Souci/Fachmann/Kraut | Average = 120 | Variation = (107-133) |
| Swedish NFA | Average = 113 | Variation = (100-122.1) |
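The USDA row of table 2 reports three related quantities, linked by the standard relation SE = SD / sqrt(N). The following Python sketch checks that relation against the printed figures and then asks whether each table's central value falls inside the spreads printed by the others. It assumes that "range" and "variation" can be read as plain lower and upper bounds, which the tables themselves do not state; the function and variable names are illustrative only.

```python
import math


def standard_error(sd: float, n: int) -> float:
    """Standard error of the mean from a sample SD and a sample size."""
    return sd / math.sqrt(n)


# USDA row of table 2 (calcium, mg per 100 g milk).
usda_sd, usda_n = 8.15, 1054
usda_se = standard_error(usda_sd, usda_n)

# Central values as printed in table 2.
central_values = {
    "USDA (8.1)": 119.0,
    "McCance and Widdowson": 120.0,
    "Souci/Fachmann/Kraut": 120.0,
    "Swedish NFA": 113.0,
}

# Printed spreads, read here (as an assumption) as simple (low, high) bounds.
reported_bounds = {
    "McCance and Widdowson": (110.0, 130.0),
    "Souci/Fachmann/Kraut": (107.0, 133.0),
    "Swedish NFA": (100.0, 122.1),
}


def inside_every_bound(value: float) -> bool:
    """True if the value lies within each of the printed (low, high) bounds."""
    return all(low <= value <= high for low, high in reported_bounds.values())


all_consistent = all(inside_every_bound(v) for v in central_values.values())
print(f"SE recomputed from SD and N: {usda_se:.3f}")
print(f"every central value inside every printed spread: {all_consistent}")
```

Read this way, the four sources do not contradict one another, even though each describes its variability with a different statistic.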
The problem hinges on the fact that it is unlikely that these data really are inconsistent - they only look inconsistent. Most data bases present a single value for a specific nutrient in a specific food. This leads back to the first answer above - that different things were measured. If we look closely at the "food" component of figure 1 we note that it should be expanded as in figure 2, to show that every sample of a food is quite likely to differ from every other sample, and this is before the chemists take over and add their own variability.
While it is important to realize that there are a number of specific sources of food composition data variability, the major point is that few tables even hint that such variabilities exist. Moreover, those tables that do, such as those shown in table 2, do not do so in a consistent fashion, nor are they very helpful about just how to use this added information.
The point to be stressed is that there is not a single food of each kind - there is no Platonic ideal "egg." Foods are not mathematical ideals, but must be considered as probabilistic or statistical objects, with statistical distributions of their nutrients. Any compilation or use of food composition data must be firmly based in this fact.
We are faced with the fact that a data base needs to contain more than just a single value for each food-nutrient combination. The description of a distribution is not straightforward; few distributions can be described adequately with just a few numbers. (The obvious counterexample is the Normal or Gaussian distribution, the familiar bell-shaped curve, which is fully described by its mean and standard deviation. However, few measurements follow this distribution precisely.) The "statistics" that can be used to describe an arbitrary distribution include the mean, mode, median, quartiles, percentiles, standard deviations, and mean deviations. Each has its adherents and rationale. The improvement of food composition data requires careful investigation of both where the data come from and what they are to be used for. Inherent in the viewing of food composition data as "data" are several implications that need to be stressed.
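To make the list of candidate statistics concrete, here is a minimal Python sketch that computes several of them for one small sample using only the standard library. The sample values are invented for illustration and do not come from any food table; the `describe` function is a hypothetical helper, and the "inclusive" quartile method is just one of several conventions.

```python
import statistics


def describe(values):
    """Summarize a sample with several of the statistics named in the text."""
    data = sorted(values)
    mean = statistics.fmean(data)
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return {
        "mean": mean,
        "median": median,
        "mode": statistics.multimode(data),   # list, since ties are possible
        "quartiles": (q1, q3),
        "sd": statistics.stdev(data),         # sample standard deviation
        "mean_deviation": statistics.fmean([abs(v - mean) for v in data]),
    }


# Invented, deliberately skewed sample: one unusually high value.
sample = [10, 11, 11, 12, 12, 12, 13, 14, 30]
summary = describe(sample)
for name, value in summary.items():
    print(name, value)
```

With a single high value in the sample, the mean is pulled above the median, so the two "typical" values disagree and no single number tells the whole story.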
1. Each type of user is likely to require different statistics. These need to be carefully defined and justified. For example, someone wanting to estimate intake would perhaps be satisfied with a mean or median value, while someone worrying about meeting requirements would want an upper or lower limit. In order for data banks to be well designed, for them to include "good" data, each user must decide the best data representation for his/her specific application.
2. Data banks must be designed to provide information about the variabilities of their holdings. Ultimately this requires access to raw data, but, well before that, standardized and documented algorithms for data manipulation are needed.
3. Users must be made aware of the inherent variabilities of the data, of the magnitudes and implications of these variabilities, and of the procedures for handling this inescapable aspect of food composition data.
4. As a preliminary step, the sources of data variability need to be sorted out: their magnitudes estimated, their importance explored, and those that can be reduced brought down by approaches ranging from standardizing analytic techniques to developing regional values.
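The first two implications can be sketched together as a record that keeps the underlying sample values and returns a different statistic depending on the user's purpose. This is an illustrative design only, not a description of any existing data bank: the class name, the purpose labels, the choice of the 10th and 90th percentiles, and the sample values are all assumptions made for the example.

```python
import statistics
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class NutrientRecord:
    """One food-nutrient entry that keeps its sample values, not a single number."""
    food: str
    nutrient: str
    unit: str
    samples: Tuple[float, ...]

    def median(self) -> float:
        return statistics.median(self.samples)

    def percentile(self, p: int) -> float:
        """p-th percentile (1..99) using the 'inclusive' interpolation method."""
        cuts = statistics.quantiles(self.samples, n=100, method="inclusive")
        return cuts[p - 1]

    def value_for(self, purpose: str) -> float:
        """Pick a statistic suited to the stated purpose (labels are hypothetical)."""
        if purpose == "estimate_intake":
            return self.median()        # a typical value
        if purpose == "meet_requirement":
            return self.percentile(10)  # cautious lower value
        if purpose == "limit_exposure":
            return self.percentile(90)  # cautious upper value
        raise ValueError(f"unknown purpose: {purpose!r}")


# Invented sample values, for illustration only.
record = NutrientRecord(
    food="example food",
    nutrient="calcium",
    unit="mg per 100 g",
    samples=(100, 108, 112, 115, 118, 119, 120, 122, 125, 128, 133),
)

for purpose in ("estimate_intake", "meet_requirement", "limit_exposure"):
    print(purpose, record.value_for(purpose), record.unit)
```

Keeping the samples in the record, rather than a precomputed mean, is what lets each kind of user request the statistic that fits the task; a real data bank would also need to document which algorithm produced each figure.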