|Nutrition Guidelines (MSF, 1995, 191 p.)|
|Part II: Rapid Nutrition Surveys|
Principles of sampling
If all individuals in a given population were surveyed, we would get a precise picture of the nutritional status of this population. An exhaustive survey of this type would be long, costly and difficult to carry out. This is why measurements are only recorded for a sub-group of the population, called a sample, which "represents" the whole population. In practice, only children aged 6 to 59 months (65-110 cm) are included in the target population, since this group best reflects the nutritional status of the population. Children in this age group are growing, so a change in the availability of food affects them first. It is from this sub-group that the sample is selected.
If the main objective of the survey is to compare two groups according to their nutritional status, two different surveys, one for each group, are required.
REPRESENTATIVENESS OF THE SAMPLE
The representativeness of a sample is essential. It is the prerequisite for extrapolation of results observed for the sample to the entire population. In order for a sample to be representative of the population, two criteria should be met: each individual should have an equal chance of being selected for the sample, and the selection of one individual should be independent of the selection of another individual.
Whenever a sample is drawn, a probability of error exists, meaning that there is a risk that the sample may not be truly representative of the population. In nutrition surveys, we accept an error risk of 5%. This means that we accept that in 5% of the surveys, results observed for the sample will not reflect the true nutritional status of the population. In other words, if an organization carries out 100 nutrition surveys, on average about 5 of them will give a result that does not reflect the true situation.
PRECISION, THE CONFIDENCE INTERVAL
By carrying out measurements on a sample of the population, we only get an estimate of what the results would be if they were carried out on the entire population. If a second sample is drawn from the same population, slightly different results may be obtained, simply because different children are selected for each sample.
The actual percentage of malnutrition in the entire population lies in a range around the observed value. The upper and lower limits of this range define the confidence interval of the estimate. For example, results will be expressed as follows: malnutrition rate = 13% ± 5%, meaning the confidence interval ranges from 8% to 18%. The size of the confidence interval is related to the error risk and the size of the sample.
The sample size is related to three factors:
The expected precision: the greater the precision desired, the more people needed in the sample.
The probability of error chosen: the smaller the probability, the more people needed in the sample. If the whole population is surveyed, the probability is zero. In nutrition surveys, an error risk of 5% is accepted.
The expected prevalence: the nearer the expected proportion of children presenting malnutrition is to 50%, the greater the size of the sample required, for the same absolute precision.
Furthermore, a fourth factor should be taken into consideration:
The available means: the ideal objective in determining the sample size is to have the highest precision for the smallest error risk. The limiting factor is the available means. How many children can reasonably be surveyed in a day? How many data collectors are available?...
In conclusion, measuring malnutrition in a sample gives values affected by a known and accepted margin of error. On the other hand, sampling reduces the workload and allows surveys to be carried out in a short period of time.
Calculation of the sample size
When calculating the size of the sample the three factors previously defined should be taken into consideration. The formula used is the following *:
$$n = \frac{t^2 \times p \times q}{d^2}$$
n = sample size
t = parameter related to the error risk, equals 1.96 or 2 for an error risk of 5%
p = expected prevalence of malnutrition in the population, expressed as a fraction of 1
q = 1 - p, expected proportion of children not presenting malnutrition, expressed as a fraction of 1.
d = absolute precision, expressed as a fraction of 1.
<<t>> is fixed at 1.96 (or 2) in this type of survey (corresponding to an error risk of 5%).
<<p>> and thus <<q>> (q = 1 - p) are estimated from previous surveys. The expected prevalence is always chosen to be closer to 0.5 (50%) than truly expected, in order to get a bigger sample size. If we have a larger sample size than needed, we are sure of getting at least the desired precision even if the measured prevalence is larger than expected. A short survey of 30 households can give an idea of the expected prevalence if no information is available prior to the survey.
<<d>> is a parameter that can be modified (<<t>> is constant, <<p>> is estimated). The factors which are considered in determining <<d>> are: the objectives of the survey, the expected prevalence and the available means.
If the main objective of the survey is to demonstrate a moderate difference in the nutritional status between two groups, or over a certain period of time, the precision will have to be high (and therefore, <<d>> very small).
Usually, in nutrition surveys, the expected prevalence ranges from 5% to 20%.
The precision should be proportional to the expected prevalence. For example, a precision of 10% for an expected prevalence of malnutrition of 10% will give a confidence interval from 0% to 20%. No conclusion can be reached from such results. Refer to the next table in order to see how the precision affects the sample size for different levels of expected malnutrition.
For example, in a survey where the expected malnutrition rate is 15% (12% from a previous survey), and with a desired precision of 3%, the sample size is:
$$n = \frac{1.96^2 \times 0.15 \times 0.85}{0.03^2} \approx 544$$
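The calculation above can be sketched in a few lines of Python. This is only an illustration of the formula as stated in the text; the function name `sample_size` and the particular precision values in the loop are assumptions chosen for the example, not part of the guidelines.

```python
def sample_size(p, d, t=1.96):
    """Return n = t^2 * p * (1 - p) / d^2, unrounded.

    p: expected prevalence (fraction of 1)
    d: absolute precision (fraction of 1)
    t: parameter for the error risk (1.96 for a 5% risk)
    """
    if not 0 < p < 1:
        raise ValueError("p must be a fraction strictly between 0 and 1")
    if d <= 0:
        raise ValueError("d must be positive")
    return t ** 2 * p * (1 - p) / d ** 2


# Worked example from the text: expected prevalence 15%, precision 3%.
print(round(sample_size(0.15, 0.03)))  # 544

# Illustrative only: how the chosen precision changes the sample size
# for several expected prevalences (values of d picked for the example).
for p in (0.05, 0.10, 0.15, 0.20):
    sizes = [round(sample_size(p, d)) for d in (0.02, 0.03, 0.04, 0.05)]
    print(f"p = {p:.2f}: {sizes}")
```

Halving `d` multiplies the required sample size by four, which is why the precision is usually the parameter negotiated against the available means.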
The size of the target population does not usually affect the required size of the sample. This is true when the size of the population is much larger than the size of the sample. However, if the sample size approaches the size of the population, a correction factor can be applied to the formula. It reduces the sample size needed to get the chosen precision. This correction factor is used whenever the sample size is more than one tenth of the total population. The revised sample size is given by the following formula, where N is the size of the total population:
Revised n = n/(1 + (n/N))
In our example, if the total population of children aged 6 to 59 months was 5000, the revised sample size would be:
Revised n = 544 / (1 + 544/5000) = 490
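As a minimal sketch, the correction and the one-tenth rule can be written as two small functions. The names `needs_correction` and `revised_sample_size` are hypothetical, and the result is left unrounded so that the rounding choice stays with the survey team.

```python
def needs_correction(n, population):
    """True when the sample is more than one tenth of the population."""
    return n > population / 10


def revised_sample_size(n, population):
    """Apply the correction from the text: n / (1 + n / N)."""
    if population <= 0:
        raise ValueError("population must be positive")
    return n / (1 + n / population)


n, N = 544, 5000
print(needs_correction(n, N))               # True: 544 is more than 5000 / 10
print(round(revised_sample_size(n, N), 1))  # 490.6, reported as 490 in the text
```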
Three main sampling methods are available: random sampling, systematic sampling and cluster sampling.
Random sampling is the best method, when it can be used, since it is the only one meeting the two criteria for representativeness as previously defined. A sampling base should be available which lists every individual in the population and allows you to locate them. The list must be kept up to date with regard to the ages and location of each individual and include all new births. Individuals are randomly drawn from the list using a random number table (see Annex 10). Most of the time, such a list is not available or reliable.
Systematic sampling is a method in which the geographical organization of the area to be surveyed is used. Every household should have the same chance of being surveyed by a team going across the whole area. Then one household out of X is visited. This technique can often be used in well organized refugee camps where houses are arranged in blocks and lines. In the same manner, if houses are numbered, it is possible to survey one household out of every X, going across the camp from one end to the other.
TWO STAGE CLUSTER SAMPLING
This method is used when the two previous ones are not possible: no register is available and the geographical organization of the area does not permit a visit to all houses. The population is grouped in smaller units for which the population sizes can be estimated. The smallest unit for which the population can be estimated should be chosen as the sampling base. These units may be villages, city blocks or sections of a camp. Thirty clusters are randomly drawn (first level of sampling); in each cluster, a certain number of children will be selected and surveyed (second level of sampling). The chance for each unit to be selected is proportional to its population size.
This sampling technique does not meet the second criterion for representativeness. The fact that several children are selected within a cluster by proximity means that the choice of a child is not independent of the choice of other children. Within each cluster, children will tend to be more similar as far as nutritional status is concerned. This phenomenon is called the <<Design effect>>. The design effect is taken into account when calculating the sample size by multiplying the result obtained through the formula by 2. It means that when cluster sampling is used, the survey should use a sample size twice as large as for the other two sampling methods to reach the same level of precision.
WHICH METHOD TO CHOOSE?
Whenever a reliable register is available, random sampling is preferred. When populations are living in small, well defined geographical areas, systematic sampling should be chosen. In other instances, a two stage cluster sampling strategy should be applied.
Realization of the sampling
Random sampling implies the existence of a sampling base, such as a register. The steps are as follows:
· calculation of the sample size:
The following information is required:
- expected prevalence of malnutrition: for example p = 0.15
- error risk: 5%, meaning t = 1.96
- precision wanted: d = 0.03 (3%)
The sample size is:
$$n = \frac{1.96^2 \times 0.15 \times 0.85}{0.03^2} \approx 544$$
· a serial number is given to each child. For example, in a population of 12,481 children, a serial number between 00001 and 12,481 is attributed to each child.
· draw numbers from the list using a random number table until the required number of children is selected. For example, the table may generate the following random numbers: 00002, 00006, 00013, 00017, 00023, ..., 11,872, ...
Children corresponding to these numbers are included in the sample.
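Where the register is held electronically, the draw can be sketched as follows. Note that this swaps the random number table for Python's pseudo-random generator, which is an assumption of this example and not the method described in the guidelines; the function name and the `seed` argument are likewise illustrative.

```python
import random


def draw_simple_random_sample(population_size, sample_size, seed=None):
    """Draw distinct serial numbers between 1 and population_size.

    A fixed seed makes the draw reproducible so it can be documented.
    """
    if sample_size > population_size:
        raise ValueError("sample cannot be larger than the population")
    rng = random.Random(seed)
    # random.sample draws without replacement, so no child is picked twice.
    return sorted(rng.sample(range(1, population_size + 1), sample_size))


# Example from the text: 544 children out of a register of 12,481.
selected = draw_simple_random_sample(12481, 544, seed=1)
print(len(selected))                         # 544
print([f"{s:05d}" for s in selected[:5]])    # first serial numbers drawn
```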
Systematic sampling is used in relatively small geographical areas. The draw is based on a register of families or on the spatial arrangement of households. The organization of the site should allow one to comprehensively cover all houses. This technique is particularly adapted to well organized refugee camps. The steps for systematic sampling are as follows:
· Determine the number of inhabitants and the number of households: For example let's consider a camp of 50,000 refugees and 11,000 households.
· Determine the number of children between 6 and 59 months of age
The proportion of children between 6 and 59 months is quite stable, usually around 20%. However, in certain situations, when high infant and child mortality is suspected, this proportion can be smaller. The proportion of children then has to be estimated from a rapid survey covering about 30 households selected at random.
In our example we have an estimate of 10,000 children (20% of 50,000).
· Calculation of the sample size
The same calculation as for random sampling is used: n = 544.
· Determine the required number of households
The first step is to calculate the average number of children per household. It is equal to the total number of children divided by the number of households: 10,000/11,000 = 0.9. Therefore, 604 households (544/0.9) will have to be visited in order to complete the sample.
· Determine the sampling interval
This is calculated by dividing the number of households by the number of households required in the sample. In our example: 11,000 / 604 = 18.2. One household every 18 households will be visited and all children (between 6 and 59 months) found in these households are included in the sample.
· Determine the first household to visit
The first household is randomly selected in the first interval, 01 to 18, using a random number, 05 for example.
· Selection of the households
One household is then selected every 18 households, starting with the fifth one, then the twenty-third (5 + 18), the forty-first, etc.
If two eligible children are found in a household, both are included in the sample. If no children are found in a household, the closest household (or the next one given by the sampling interval) is visited. If a child is not present at the time of the visit, the data collectors will have to come back to this same household in order to measure the child.
It is important not to overestimate the proportion of children aged 6 to 59 months when calculating the sampling interval. If this were the case, the sampling interval would be too large and the sample would not reach the desired size.
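The systematic sampling steps above can be sketched as follows. The function names are hypothetical, and the sketch keeps intermediate values unrounded, whereas the text rounds the average to 0.9 children per household (giving 604 households instead of about 598); the sampling interval comes out as 18 either way.

```python
import random


def systematic_plan(n_households, n_children, sample_size):
    """Return (children per household, households needed, sampling interval)."""
    children_per_household = n_children / n_households
    households_needed = sample_size / children_per_household
    # Round the interval down so the sample is not smaller than planned.
    interval = int(n_households / households_needed)
    return children_per_household, households_needed, interval


def select_households(n_households, interval, start=None, seed=None):
    """List the household numbers to visit, one every `interval` households."""
    if start is None:
        # Random start within the first interval.
        start = random.Random(seed).randint(1, interval)
    if not 1 <= start <= interval:
        raise ValueError("start must lie within the first sampling interval")
    return list(range(start, n_households + 1, interval))


# Camp from the text: 11,000 households, 10,000 children, n = 544.
per_household, needed, interval = systematic_plan(11000, 10000, 544)
print(round(per_household, 2), round(needed), interval)  # 0.91 598 18

# Random start drawn in the text: household 05.
visits = select_households(11000, interval, start=5)
print(visits[:3])  # [5, 23, 41]
```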
The steps for 2 stage cluster sampling are as follows:
· Determine the geographical units and their population
Cluster sampling requires the grouping of the population in smaller geographical units. The smallest available geographical unit is always chosen, as long as its population can be estimated. For each of these units, the population of children aged 6 to 59 months is estimated.
These units can be villages, sections of the camp, or naturally defined geographical areas (river, road,...). In the rest of our example we will refer to these geographical units as sections.
· Calculation of the sample size
The calculation of the sample size uses the same formula as for random or systematic sampling. However, the size of the calculated sample should be doubled to take into account the design effect. A minimum of 30 clusters is always required. In each of these clusters, the number of children to be selected is the sample size divided by the number of clusters. For example, in a survey where the expected prevalence of malnutrition is 20% and the required precision is 4%, the formula gives 384 children, so 2 x 384 = 768 children are required. Hence, in each cluster, 768/30 = 26 children (rounded up) will be included.
· Calculation of the cumulative population
A list of the sections is established, as well as their respective population. In a third column, the cumulative total is calculated by adding the population of each unit to the sum of the population of the preceding sections. In other words, it is as if each section was given a certain amount of points, proportional to its population size.
· Calculation of the sampling interval
The sampling interval, in cluster sampling, is the total population divided by the number of clusters, usually 30. The thirty clusters are selected using the sampling interval. In our example, the sampling interval is: 10,000 / 30 = 333.
· Determination of the location of the first cluster
The location of the first section to appear in the sample is randomly selected within the first sampling interval. If the drawing of the first cluster was done from the beginning of the list, the first section would always appear in the sample, which would not give each section the same chance of being selected. A random number is used (Annex 10), in our example, between 001 and 333. Let's say the random number drawn is 256.
· Selection of the clusters
The first cluster is located in the section whose cumulative population range includes the random number. The sampling interval is then added repeatedly to this number, and each subsequent cluster is located in the section whose cumulative range includes the resulting number. In our case, section No 1 includes 256, and so is the first section to be included, followed by section No 2: 589 (256 + 333), section No 4: 922 (589 + 333), section No 4 again: 1,255 (922 + 333), etc.
A large section may appear twice - two clusters should be drawn in section No 4. In the same way, a small section (smaller than the sampling interval) may not be selected - section No 3 in our example.
· Selection of children in the clusters
Having identified the thirty clusters, a team of data collectors goes to the centre of the selected section. A random direction is picked by spinning a bottle: the neck of the bottle indicates the direction. A surveyor goes in that direction, from the centre to the border of the section, counting the households encountered on the way. The first household to be visited is randomly selected from among these households by drawing a random number. A sketch of these households can be used.
The subsequent households are chosen by proximity. The next nearest household available is selected until the required number of children has been measured.
All eligible children are included and thus should be measured and weighed. If a child is not present when the team passes, he has to be found, or the team must come back later to measure this child. If a child has been admitted to an intensive feeding centre, the team must go to the centre and measure him there.
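The sample size and cluster selection steps for two stage cluster sampling can be sketched as follows. The function names are hypothetical. The section populations are also HYPOTHETICAL: they were chosen only so that the total is 10,000 and the first draws match the example in the text (sections 1, 2, 4 and 4 again, with section 3 skipped). The random start is passed in explicitly to reproduce the text; when omitted, a pseudo-random generator stands in for the random number table.

```python
import math
import random
from bisect import bisect_left
from itertools import accumulate


def cluster_sample_plan(p, d, t=1.96, design_effect=2, n_clusters=30):
    """Return (base sample size, doubled sample size, children per cluster)."""
    base_n = round(t ** 2 * p * (1 - p) / d ** 2)
    total_n = design_effect * base_n
    return base_n, total_n, math.ceil(total_n / n_clusters)


def select_clusters(section_populations, n_clusters=30, start=None, seed=None):
    """Select sections with a chance proportional to their population.

    Returns the sampling interval and one 1-based section number per cluster.
    """
    cumulative = list(accumulate(section_populations))
    interval = cumulative[-1] // n_clusters
    if interval < 1:
        raise ValueError("total population is smaller than the number of clusters")
    if start is None:
        start = random.Random(seed).randint(1, interval)
    if not 1 <= start <= interval:
        raise ValueError("start must lie within the first sampling interval")
    targets = [start + k * interval for k in range(n_clusters)]
    # A target falls in the first section whose cumulative total reaches it.
    sections = [bisect_left(cumulative, target) + 1 for target in targets]
    return interval, sections


# Sample size example from the text: prevalence 20%, precision 4%.
print(cluster_sample_plan(0.20, 0.04))  # (384, 768, 26)

# Hypothetical section populations (children aged 6 to 59 months), total 10,000.
populations = [400, 350, 100, 600] + [570] * 15
interval, chosen = select_clusters(populations, start=256)
print(interval)     # 333
print(chosen[:4])   # [1, 2, 4, 4]
```

A section listed twice in the result receives two clusters, and a section absent from it (section 3 here) is not surveyed.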