
| The Management of Nutrition in Major Emergencies (WHO - OMS, 2000, 250 p.) |
Introduction
This annex provides guidelines for statistical procedures, including sampling methods and determination of sample size, to be used in nutritional surveys. It fulfils the need - unmet by most handbooks, which deal more with surveys of communicable diseases - for guidance on the type of community-based survey essential for nutritional assessment.
The essential procedures for anthropometric surveys are covered in Chapter 3 and Annex 3; Chapter 2 outlines the parameters and criteria (mostly clinical and biochemical) used in assessing micronutrient deficiencies. In practice, a survey that combines clinical, anthropometric, and biochemical elements is required. Different types of nutrient are usually assessed in different age groups or among individuals of different physiological status, and few manuals provide guidance on how such assessments should be combined or integrated. Table A 4.1 shows the suggested age/sex groups to be examined - usually on the basis of the household-selection procedure described in this annex.
Involvement of a statistician right at the start of the survey design process is important, to ensure that sample sizes are appropriate (neither too large nor too small) and will produce results from which valid comparisons can be made between different populations and in the same population over time. The sample size is usually similar for anthropometry and for assessment of the different types of nutrient, but the design factor (increase in size of cluster sample required because of patchy distribution of the deficiency) is generally recommended to be larger for micronutrient surveys (3) than for anthropometric surveys (2).
The first part of the annex deals with the principles of random sampling and with sample size, and the second part presents various sampling procedures.
Principles of random sample surveys
Basic concepts
When dealing with large population groups it is not feasible to survey all individuals. However, valid conclusions can be drawn from measurements made on only a limited number of individuals within the population, provided that this "sample" is representative of the population as a whole.
The sampling techniques described in this annex are designed to ensure this essential representativeness through randomization in selection and elimination of observer bias. Data obtained only from health services, for example, are unlikely to be representative of the population as a whole; data collected only in the most accessible villages, or in camps that are reported to be in a bad state, will be similarly unrepresentative. Strict procedures must be followed in selecting individuals to be included in a sample to ensure that it is representative. Moreover, if the objective of a survey is to compare the nutritional status of two groups, representative data must be collected from the two groups separately.
Table A4.1 Examples of appropriate age/sex groups for nutritional assessments
| Age/sex group | Type of assessment |
| Children <5 years | Anthropometry |
| Children of school age (6-12 years) and adolescents | Goitre prevalence; urinary iodine; anaemia/iron deficiency |
| Women of reproductive age, or pregnant women | Anaemia/iron deficiency, beriberi, scurvy |
| Adults | Anthropometry |
The techniques, and the methods of analysing the results, recognize and allow for the fact that there may be some inaccuracy. Data gathered from a sample of a population provide only an estimate of what the results would be if measurements were made on the entire population. Whenever a sample is drawn, there is a risk that it may not be truly representative and may therefore yield data that do not reflect the true situation. Inevitably, therefore, if a second sample is drawn from the same population, slightly different results are likely to be obtained.
From a sample it is possible to calculate not only an estimate of malnutrition (or other variable of interest) but also the range of values within which the actual rate of malnutrition in the entire population almost certainly lies. The confidence interval is strictly not symmetrical, but as the sample size increases it becomes more and more symmetrical. For example, the 95% confidence limits for a 10% estimate of malnutrition based on a randomly selected sample of 30 children are 2% and 26%. However, the confidence limits for a 10% estimate based on a sample size of 2000 are 9% and 11%. See Table A4.3.
A 95% confidence level¹ is usually considered to be appropriate for nutritional surveys. The precision of the result and the size of the confidence interval depend on the sample size and the actual prevalence of malnutrition (or other variable of interest) in the population.
¹ A 95% confidence level represents an error risk of 5%, meaning that, out of 100 surveys, as many as 5 may give results that do not reflect the true situation purely by chance.
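The narrowing of the confidence interval with increasing sample size can be illustrated with a short Python sketch. It assumes the exact binomial (Clopper-Pearson) method, which yields asymmetrical limits of the kind described above; the annex does not state which method produced the figures in Table A4.3, so the function names and the choice of method are illustrative assumptions only.

```python
from math import exp, lgamma, log


def binom_cdf(x, n, p):
    """P(X <= x) for X ~ Binomial(n, p), summed in log space to avoid overflow."""
    if x < 0:
        return 0.0
    total = 0.0
    for i in range(x + 1):
        log_term = (lgamma(n + 1) - lgamma(i + 1) - lgamma(n - i + 1)
                    + i * log(p) + (n - i) * log(1 - p))
        total += exp(log_term)
    return min(total, 1.0)


def _solve(increasing_fn, target, tol=1e-9):
    """Bisection on (0, 1) for the p at which an increasing function hits target."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if increasing_fn(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def exact_ci(x, n, alpha=0.05):
    """Clopper-Pearson limits for x cases observed among n individuals."""
    # Lower limit: the p at which P(X >= x) equals alpha / 2.
    lower = 0.0 if x == 0 else _solve(
        lambda p: 1 - binom_cdf(x - 1, n, p), alpha / 2)
    # Upper limit: the p at which P(X <= x) equals alpha / 2.
    upper = 1.0 if x == n else _solve(
        lambda p: 1 - binom_cdf(x, n, p), 1 - alpha / 2)
    return lower, upper


for n in (30, 2000):
    low, high = exact_ci(n // 10, n)  # 10% observed prevalence
    print(f"n = {n}: {100 * low:.1f}% to {100 * high:.1f}%")
```

The limits printed for a sample of 30 are much wider, and less symmetrical about 10%, than those for a sample of 2000.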
Basic sampling procedure
Three main sampling methods can be used - random, systematic, and cluster. Cluster sampling is the most widely used and often the only feasible method in emergencies involving large population groups. In all cases, estimates are required of the total population and of any subgroups to be distinguished within the total. The essential steps in obtaining a sample are as follows:
1. Obtain available population data. Census data and a list of all settlements in the area might be obtained from departments of planning, statistics, or malaria control, for example. If no data are available, as may be the case for refugees or displaced persons, a rough population estimate should be made by counting the dwellings and estimating the number of people in each dwelling.
2. Divide the total population into groups relevant to the information to be collected. In the case of camp populations, it may be desirable to distinguish between different camps, different sections of camps, or between long-term residents and new arrivals. Among rural populations it is generally appropriate to distinguish pastoralists (such as nomadic herders), subsistence farmers, and others (including artisans and traders). If different groups are not distinguished, the survey findings may be difficult to interpret.
3. Choose the sampling methodology to be used. The required precision should be identified and the necessary sample size determined accordingly.
4. Select the households or individuals to be examined. The relevant sampling procedures should be followed carefully.
Defining sample size
The sample size is the number of individuals to be included in the survey to "represent" each population of interest. The sample size required depends on the following factors:
· Required precision and confidence level. The greater the precision required, the larger the sample needed.
· Expected prevalence of malnutrition (or other variable being estimated). The smaller the expected proportion of people presenting malnutrition, the greater the size of the sample required for a particular level of precision.
· Time and resources available. The time, personnel, equipment, transport, and funds available for the survey may limit the number of individuals or households that can be visited.
In practice, selection of sample size almost always involves a trade-off between the ideal and the feasible. A sample that is too small gives results of limited precision and therefore of questionable usefulness. For example, a result of 10% wasting (weight-for-height below the median minus 2 SD) in a sample of 100 children would give a confidence interval ranging from approximately 4% to 16% - a result that cannot be interpreted usefully. Beyond a certain level, however, increases in sample size produce only small improvements in precision but involve disproportionate increases in costs. The formulae for calculating sample size (n) are as follows:
· for simple random sampling: $n = \dfrac{1.96^2 \times (1 - p)}{p \times e^2}$
· for cluster sampling: $n = k \times \dfrac{1.96^2 \times (1 - p)}{p \times e^2}$
where:
n = sample size required
p = expected prevalence of malnutrition in the population; as the prevalence of malnutrition is not known before the survey is done, an estimate must be used - this is usually an experienced guess, or derived from a small pilot survey
e = relative precision required
1.96 is a statistical parameter corresponding to the confidence level of 95% (an error risk of 5%).
k = "clustering" factor, or design factor, which is a measure of the clustering of the characteristic being measured.¹
¹ According to studies analysed by CDC, the design factor k usually has a value of approximately 2 in anthropometric studies among children under 5 years of age, with 30 clusters.
The sample size for a cluster survey is likely to be larger than that for a random sample for the same precision. This is because the units within a cluster tend to be similar in their characteristics. Poor (and therefore malnourished) people, for instance, are likely to be found living together in the same areas.
Example
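Expected prevalence of malnutrition 15%: p = 0.15
Relative precision required (e) 20% of the estimated prevalence: e = 0.20
Design factor k = 2.
For random sampling: $n = \dfrac{1.96^2 \times 0.85}{0.15 \times 0.20^2} \approx 544$
For cluster sampling: $n = 2 \times \dfrac{1.96^2 \times 0.85}{0.15 \times 0.20^2} \approx 1088$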
Table A 4.2 shows the sample sizes required for particular levels of expected prevalence and required precision with a fixed error risk of 5%. To take another example, if the expected malnutrition rate is 15%, and a relative precision of 3% is required, a sample size of 24188 obtained by simple random sampling will be needed. For cluster samples, the figures in Table A 4.2 should be multiplied by the appropriate design factor for the "clustering" of the characteristic being measured within sample clusters.
Table A 4.3 shows confidence intervals at the 95% level (5% error risk) corresponding to various sample sizes and observed rates when random sampling is used. For cluster sampling, the sample sizes must be multiplied by the appropriate design factor to take into account the clustering of the characteristic being measured.
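The sample-size calculation can be sketched in Python as follows. The sketch assumes the formula n = k × 1.96² × (1 − p) / (p × e²), which reproduces the entries of Table A 4.2 when k = 1; the function name `sample_size` and its parameter names are illustrative, not part of the annex.

```python
def sample_size(p, e, k=1.0, z=1.96):
    """Sample size for estimating a prevalence p with relative precision e.

    k is the design factor (1 for simple random sampling); z = 1.96
    corresponds to a 95% confidence level.
    """
    return k * z ** 2 * (1 - p) / (p * e ** 2)


# Worked example from the text: p = 0.15, e = 0.20
print(round(sample_size(0.15, 0.20)))        # simple random sampling
print(round(sample_size(0.15, 0.20, k=2)))   # cluster sampling, k = 2
# Second example from the text: p = 0.15, e = 0.03
print(round(sample_size(0.15, 0.03)))
```

The tabulated values appear to be rounded to the nearest whole number; rounding up instead would be the more cautious choice when planning a survey.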
Table A4.2 Sample sizes for estimating a population proportion with specified relative precision (95% confidence level) (a)
| ε (c) \ P (b) | 0.05 | 0.10 | 0.15 | 0.20 | 0.25 | 0.30 | 0.35 | 0.40 | 0.45 | 0.50 | 0.55 | 0.60 | 0.65 | 0.70 | 0.75 | 0.80 | 0.85 | 0.90 | 0.95 |
| 0.01 | 729904 | 345744 | 217691 | 153664 | 115248 | 89637 | 71344 | 57624 | 46953 | 38416 | 31431 | 25611 | 20686 | 16464 | 12805 | 9604 | 6779 | 4268 | 2022 |
| 0.02 | 182476 | 86436 | 54423 | 38416 | 28812 | 22409 | 17836 | 14406 | 11738 | 9604 | 7858 | 6403 | 5171 | 4116 | 3201 | 2401 | 1695 | 1067 | 505 |
| 0.03 | 81100 | 38416 | 24188 | 17074 | 12805 | 9960 | 7927 | 6403 | 5217 | 4268 | 3492 | 2846 | 2298 | 1829 | 1423 | 1067 | 753 | 474 | 225 |
| 0.04 | 45619 | 21609 | 13606 | 9604 | 7203 | 5602 | 4459 | 3602 | 2935 | 2401 | 1964 | 1601 | 1293 | 1029 | 800 | 600 | 424 | 267 | 126 |
| 0.05 | 29196 | 13830 | 8708 | 6147 | 4610 | 3585 | 2854 | 2305 | 1878 | 1537 | 1257 | 1024 | 827 | 659 | 512 | 384 | 271 | 171 | 81 |
| 0.06 | 20275 | 9604 | 6047 | 4268 | 3201 | 2490 | 1982 | 1601 | 1304 | 1067 | 873 | 711 | 575 | 457 | 356 | 267 | 188 | 119 | 56 |
| 0.07 | 14896 | 7056 | 4443 | 3136 | 2352 | 1829 | 1456 | 1176 | 958 | 784 | 641 | 523 | 422 | 336 | 261 | 196 | 138 | 87 | 41 |
| 0.08 | 11405 | 5402 | 3401 | 2401 | 1801 | 1401 | 1115 | 900 | 734 | 600 | 491 | 400 | 323 | 257 | 200 | 150 | 106 | 67 | 32 |
| 0.09 | 9011 | 4268 | 2688 | 1897 | 1423 | 1107 | 881 | 711 | 580 | 474 | 388 | 316 | 255 | 203 | 158 | 119 | 84 | 53 | 25 |
| 0.10 | 7299 | 3457 | 2177 | 1537 | 1152 | 896 | 713 | 576 | 470 | 384 | 314 | 256 | 207 | 165 | 128 | 96 | 68 | 43 | 20 |
| 0.15 | 3244 | 1537 | 968 | 683 | 512 | 398 | 317 | 256 | 209 | 171 | 140 | 114 | 92 | 73 | 57 | 43 | 30 | 19 | 9 |
| 0.20 | 1825 | 864 | 544 | 384 | 288 | 224 | 178 | 144 | 117 | 96 | 79 | 64 | 52 | 41 | 32 | 24 | 17 | 11 | 5 |
| 0.25 | 1168 | 553 | 348 | 246 | 184 | 143 | 114 | 92 | 75 | 61 | 50 | 41 | 33 | 26 | 20 | 15 | 11 | 7 | - (d) |
| 0.30 | 811 | 384 | 242 | 171 | 128 | 100 | 79 | 64 | 52 | 43 | 35 | 28 | 23 | 18 | 14 | 11 | 8 | 5 | - (d) |
| 0.35 | 596 | 282 | 178 | 125 | 94 | 73 | 58 | 47 | 38 | 31 | 26 | 21 | 17 | 13 | 10 | 8 | 6 | - (d) | - (d) |
| 0.40 | 456 | 216 | 136 | 96 | 72 | 56 | 45 | 36 | 29 | 24 | 20 | 16 | 13 | 10 | 8 | 6 | - (d) | - (d) | - (d) |
| 0.50 | 292 | 138 | 87 | 61 | 46 | 36 | 29 | 23 | 19 | 15 | 13 | 10 | 8 | 7 | 5 | - (d) | - (d) | - (d) | - (d) |
(a) $n = \dfrac{Z_{1-\alpha}^2 \, (1 - P)}{\varepsilon^2 \, P}$, where $Z_{1-\alpha}$ represents the number of standard errors from the mean, and $\alpha$ is the significance level of a test.
(b) P = anticipated population proportion (prevalence).
(c) ε = relative precision.
(d) Sample size less than 5.
Table A4.3 Confidence intervals at 95% probability level corresponding to various sample sizes and sample percentages
| Sample size | Percentage observed in sample: 5% | 10% | 20% | 30% | 40% | 50% |
| 30 | 1-18 | 2-26 | 8-39 | 15-49 | 23-59 | 31-69 |
| 40 | 1-17 | 3-24 | 9-36 | 17-47 | 25-57 | 34-66 |
| 50 | 1-15 | 3-22 | 10-34 | 18-45 | 26-55 | 36-65 |
| 60 | 1-14 | 4-20 | 11-32 | 19-43 | 28-54 | 37-63 |
| 80 | 1-12 | 4-19 | 12-31 | 20-41 | 29-52 | 39-61 |
| 100 | 2-11 | 5-18 | 13-29 | 21-40 | 30-50 | 40-60 |
| 200 | 2-9 | 6-15 | 15-26 | 24-37 | 33-47 | 43-57 |
| 300 | 3-8 | 7-14 | 16-25 | 25-36 | 35-46 | 44-56 |
| 400 | 3-8 | 7-13 | 16-24 | 26-35 | 35-45 | 45-55 |
| 500 | 3-7 | 8-13 | 17-24 | 26-34 | 36-45 | 46-55 |
| 1000 | 4-7 | 8-12 | 18-23 | 27-33 | 37-43 | 47-53 |
| 2000 | 4-6 | 9-11 | 18-22 | 28-32 | 38-42 | 48-52 |
If, for example, the observed malnutrition rate is about 20%, a total sample size of 100 will make it possible to estimate the true rate somewhere between 13% and 29%, assuming random sampling. If greater accuracy is required, for instance 18-22%, a sample size of 2000 would be needed.
In nutrition surveys in emergencies, the expected prevalence of severe malnutrition usually ranges between 5% and 20%, and the precision must be defined accordingly; a relative precision of 20-25% is generally appropriate.
The size of the total population does not normally affect the size of the sample required. However, if the population is small and the calculated sample size turns out to be greater than 10% of the total population, a correcting factor (finite population factor) can be applied as follows:
$n_f = \dfrac{n}{1 + f}$
where
nf = adjusted sample size for small (finite) population
n = sample size for large (infinite) population (for example, as set out in Table A 4.2)
N = population size
f = n/N.
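A minimal Python sketch of this adjustment is given below. It assumes the usual form of the finite population correction, nf = n / (1 + f) with f = n/N; the function name and the example figures (a required sample of 544 in a population of 2000) are illustrative only.

```python
def finite_population_adjustment(n, population):
    """Adjust a sample size n for a small (finite) population of size N.

    Assumes the standard correction nf = n / (1 + n / N).
    """
    f = n / population          # sampling fraction
    return n / (1 + f)


n = 544          # sample size for a large population
N = 2000         # hypothetical small population
if n > 0.10 * N:                 # apply the correction only above 10% of N
    n = finite_population_adjustment(n, N)
print(round(n))
```

The correction always reduces the required sample size, and the reduction becomes negligible as the population grows large relative to the sample.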
Calculating results and confidence intervals
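When results have been calculated, the corresponding confidence interval, d, should be calculated as follows and reported:
· for random sampling: $d = 1.96 \sqrt{\dfrac{p(1 - p)}{n}}$
· for cluster sampling the following formula can be used to give an approximate result: $d = 1.96 \sqrt{\dfrac{k \times p(1 - p)}{n}}$
where p is the proportion observed in the sample, n is the sample size, and k is the design factor.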
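As a hedged illustration, the calculation can be sketched in Python using the normal approximation d = 1.96 × √(k × p(1 − p)/n), with k = 1 for random sampling. This form is an assumption consistent with the definitions of p, n, and k used earlier in this annex; for small samples it gives symmetrical limits that differ somewhat from the values in Table A4.3.

```python
from math import sqrt


def confidence_half_width(p, n, k=1.0, z=1.96):
    """Approximate half-width d of the confidence interval around p.

    p: proportion observed in the sample, n: sample size,
    k: design factor (1 for random sampling), z: 1.96 for 95% confidence.
    """
    return z * sqrt(k * p * (1 - p) / n)


p, n = 0.20, 900
for label, k in (("random", 1), ("cluster, k = 2", 2)):
    d = confidence_half_width(p, n, k)
    print(f"{label}: {100 * (p - d):.1f}% to {100 * (p + d):.1f}%")
```

For the same sample size, the interval for a cluster sample is wider than that for a random sample by a factor of √k.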
Using a random number table
A set of random numbers is presented in Table A4.4. Numbers can be read in any direction - from left to right, right to left, top to bottom, or bottom to top.
Table A4.4 Random numbers
| 13 118 | 50 901 | 57 493 | 96 647 | 46 146 | 65 512 | 97 571 | 49 679 | 92 251 | 36 599 |
| 81 111 | 33 653 | 61 544 | 90 072 | 61 635 | 94 254 | 98 222 | 49 594 | 99 403 | 56 952 |
| 07 124 | 56 894 | 00 475 | 09 815 | 05 299 | 17 082 | 80 775 | 11 320 | 98 562 | 68 957 |
| 55 155 | 23 168 | 83 063 | 80 324 | 51 450 | 68 094 | 71 844 | 68 302 | 49 552 | 12 682 |
| 46 406 | 44 641 | 45 461 | 75 174 | 33 268 | 86 032 | 40 355 | 58 288 | 05 532 | 29 419 |
| 10 616 | 17 092 | 76 614 | 04 950 | 67 982 | 28 515 | 16 782 | 86 129 | 44 391 | 64 449 |
| 38 497 | 57 435 | 46 124 | 37 302 | 10 783 | 93 043 | 06 903 | 77 158 | 49 638 | 26 211 |
| 83 203 | 45 840 | 75 843 | 75 843 | 74 567 | 75 971 | 97 779 | 98 047 | 68 916 | 35 038 |
| 19 236 | 62 703 | 12 863 | 14 452 | 72 228 | 55 022 | 07 024 | 43 615 | 74 802 | 02 110 |
| 79 024 | 60 592 | 93 692 | 29 737 | 09 314 | 26 191 | 52 484 | 11 588 | 14 078 | 85 947 |
| 76 073 | 57 252 | 52 795 | 67 673 | 62 267 | 29 552 | 68 244 | 49 280 | 58 583 | 42 190 |
| 50 568 | 66 590 | 38 807 | 30 061 | 26 336 | 46 147 | 04 554 | 44 562 | 72 604 | 63 031 |
| 11 838 | 73 906 | 55 981 | 23 668 | 22 627 | 88 438 | 96 686 | 73 645 | 81 410 | 10 942 |
| 57 618 | 30 523 | 16 757 | 11 956 | 58 411 | 41 647 | 67 884 | 30 084 | 14 500 | 66 958 |
| 61 846 | 47 265 | 09 508 | 11 030 | 10 462 | 93 922 | 17 022 | 71 031 | 07 827 | 94 722 |
| 60 935 | 25 351 | 11 687 | 07 679 | 73 455 | 58 617 | 24 415 | 56 921 | 88 450 | 50 471 |
| 63 328 | 21 749 | 74 262 | 77 143 | 55 995 | 50 707 | 91 516 | 38 002 | 60 552 | 00 634 |
| 75 937 | 07 127 | 11 014 | 00 738 | 46 159 | 09 866 | 87 587 | 41 648 | 36 538 | 24 398 |
| 11 981 | 89 485 | 54 965 | 08 300 | 67 724 | 24 919 | 65 682 | 50 101 | 45 470 | 07 232 |
| 12 311 | 17 067 | 42 758 | 64 557 | 46 297 | 28 414 | 93 801 | 81 180 | 12 176 | 08 536 |
| 45 160 | 76 932 | 00 433 | 42 228 | 73 696 | 27 478 | 65 321 | 22 979 | 30 198 | 86 708 |
| 26 427 | 48 280 | 53 441 | 44 543 | 95 231 | 39 939 | 09 251 | 09 755 | 26 671 | 89 392 |
| 54 568 | 17 774 | 95 705 | 28 018 | 26 507 | 63 504 | 98 872 | 22 449 | 56 423 | 59 133 |
| 80 855 | 94 883 | 08 969 | 16 949 | 86 045 | 68 398 | 46 164 | 57 147 | 35 104 | 37 262 |
| 96 203 | 73 918 | 77 875 | 48 444 | 08 167 | 58 460 | 87 945 | 52 145 | 20 330 | 77 172 |
| 91 210 | 89 152 | 93 904 | 27 666 | 51 080 | 00 487 | 12 073 | 41 639 | 28 717 | 33 909 |
| 37 808 | 11 431 | 03 351 | 82 979 | 96 677 | 41 588 | 17 592 | 51 11x | 84 657 | 25 427 |
| 47 738 | 40 686 | 00 948 | 46 598 | 99 095 | 67 011 | 05 786 | 05 642 | 26 282 | 97 486 |
| 03 255 | 71 561 | 78 549 | 15 611 | 49 097 | 58 375 | 70 087 | 10 066 | 83 530 | 26 684 |
| 92 658 | 11 755 | 39 005 | 72 386 | 20 601 | 49 630 | 85 266 | 78 939 | 89 931 | 99 674 |
| 86 040 | 48 908 | 88 153 | 05 616 | 91 381 | 88 378 | 28 263 | 34 725 | 80 739 | 15 251 |
| 87 806 | 60 615 | 14 520 | 04 557 | 72 939 | 71 060 | 10 650 | 58 769 | 07 497 | 00 808 |
| 46 138 | 03 111 | 47 053 | 89 391 | 83 636 | 05 877 | 17 980 | 63 940 | 23 003 | 23 737 |
| 81 514 | 46 994 | 77 869 | 72 054 | 22 819 | 89 316 | 77 195 | 20 194 | 65 043 | 27 706 |
| 28 419 | 60 216 | 07 640 | 80 670 | 84 427 | 98 368 | 99 656 | 10 214 | 04 023 | 39 899 |
| 99 109 | 64 711 | 06 962 | 56 790 | 96 313 | 54 470 | 18 568 | 04 319 | 31 680 | 39 507 |
| 15 045 | 85 129 | 03 531 | 06 107 | 93 785 | 38 290 | 00 911 | 68 388 | 68 686 | 53 357 |
| 61 398 | 94 861 | 90 462 | 09 438 | 53 920 | 59 996 | 91 957 | 39 255 | 86 563 | 20 781 |
| 58 455 | 18 205 | 39 389 | 18 286 | 22 994 | 78 421 | 22 241 | 04 228 | 86 679 | 47 840 |
| 81 025 | 70 374 | 79 493 | 39 386 | 41 707 | 57 491 | 35 647 | 43 409 | 37 182 | 73 435 |
Numbers can be read off with any required total number of digits. The steps involved in using this, or any other, set of random numbers are:
1. Decide on the direction in which numbers will be read; e.g. left to right going down the page.
2. Specify the required number of digits. If a random number is required in the interval 0001 to 1342, 4 digits are needed (any of which may be zero).
3. Close your eyes and stick a pin (or other sharply pointed object) in the table. Read off the required number of digits in the direction chosen in step 1, starting with the first digit to the left of the point. If the resulting number falls within the required interval, use this number. If not, repeat the process until an eligible number is drawn or move to the next number.
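The steps above can be mimicked in Python as a rough sketch. The digit string below is the first five groups of Table A4.4 read from left to right; the function name `draw_from_table` and the fixed starting positions are illustrative assumptions (in the field the starting point is chosen blindly with a pin), and the sketch follows the "move to the next number" option when a number is not eligible.

```python
def draw_from_table(digits, start, low, high):
    """Read fixed-width numbers from a digit string until one is eligible.

    digits: random digits read in the chosen direction (step 1)
    start:  position of the first digit (the "pin" position, step 3)
    low, high: required interval, e.g. 1 to 1342
    """
    width = len(str(high))           # number of digits required (step 2)
    pos = start
    while pos + width <= len(digits):
        value = int(digits[pos:pos + width])
        if low <= value <= high:     # eligible: use this number
            return value
        pos += width                 # not eligible: move to the next number
    raise ValueError("ran out of digits; choose a new starting point")


# First five groups of Table A4.4: 13 118, 50 901, 57 493, 96 647, 46 146
row = "1311850901574939664746146"
print(draw_from_table(row, 0, 1, 1342))   # reads 1311, which is eligible
print(draw_from_table(row, 4, 1, 1342))   # reads 8509 (rejected), then 0157
```

Where a computer is available, a pseudo-random number generator such as Python's `random.randint(1, 1342)` serves the same purpose as the printed table.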
Sampling methods
All sampling methods involve a highly ordered form of selection designed to eliminate observer bias; each can be adapted in various ways depending on the situation. The paragraphs that follow provide a general description of each method and how it can be applied.
In all cases, each selected individual, or every child under 5 years old belonging to each selected household, must be seen and (for an anthropometric survey) measured. The survey team, with the help of the community, must find the individuals concerned, wherever they are. If necessary, the team must return later to see and measure an individual missed on the first visit. No substitutions can be allowed and no one can be missed (unless they have died or left the community being surveyed).
Random sampling
Random sampling is the best method - when it can be used - since it is the only one that ensures representativeness. An up-to-date list of all individuals in the population is needed, with enough information to allow them to be located. Individuals are randomly drawn from the list using a random number table (see above and Table A 4.4). For a nutritional survey the sample would be restricted to children aged 6-59 months or 65-110 cm in length or height.
In practice, a reliable population list is rarely available, and it is sometimes practical to use the following alternative procedure:
1. Go to the area and make a list of all households included in the area of interest.
2. Assign each household on the list an identification number.
3. Select the required number of households using a random number table. Otherwise, pick household identification numbers out of a hat or a large box. (If this type of selection is done in public, the community can see how households are selected.) A number corresponding to each household is written on a small piece of paper, which is placed in the hat or box. The pieces of paper are shuffled and the required number of papers are then picked out (blindly). The households selected in this way become the sample for the survey.
4. Visit all of these (and only these) households. No households may be excluded or substituted for any reason. In a nutritional survey, all children in the specified age group belonging to each selected household must be measured.
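Where the household list is held on a computer, drawing numbers from a hat can be imitated with Python's standard `random.sample`, which selects without replacement. This is only a sketch: the function name, the figure of 500 households, and the sample of 40 are hypothetical.

```python
import random


def select_households(household_ids, n_required, seed=None):
    """Draw n_required distinct household numbers, like papers from a hat."""
    rng = random.Random(seed)        # a seed makes the draw reproducible
    return sorted(rng.sample(list(household_ids), n_required))


# Hypothetical example: 500 listed households, 40 to be visited
selected = select_households(range(1, 501), 40, seed=2000)
print(selected)
```

Recording the seed allows the draw to be repeated and checked later, which plays a similar role to conducting the selection in public.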
Systematic sampling
Systematic sampling eliminates the need for complete, up-to-date population registers, but requires:
· a reasonably accurate plan or map showing all households; and
· an orderly layout, or site plan, which makes it possible to go systematically through the whole site.
This technique has been used in well-organized refugee camps, where households are arranged in blocks and lines. The procedure is as follows:
1. Either list all households and assign each one an identification number, or trace a continuous route on the map, which passes in front of every household.
2. Calculate the number of households to be visited in order to obtain the required sample. If the required sample size is 544 and there are, on average, 15 children (aged 6-59 months) per 10 households, the number of households to be visited is 544/1.5 = 362.6, or 363 (round up to the nearest whole number in this calculation).
3. Calculate the "sampling interval" by dividing the total number of households by the number that must be visited. If the total number of households is 5000, and 363 are to be visited, the sampling interval is 5000/363 = 13.8, or 13 (round down to the nearest whole number in this calculation).
4. Select the first household to be visited within the first sampling interval at the beginning of the list (or route) by drawing a random number which is smaller than the sampling interval. If the number drawn is 7, start with the seventh house.
5. Select the next household by adding the sampling interval to the first household identification number (or counting that number of households along the prescribed route), e.g. 7 + 13 = 20.
6. Continue in this way (e.g. 7, 20, 33, 46, etc.) until the number of households required for the survey has been systematically selected.
7. Visit all of these (and only these) households. No selected household may be excluded or substituted for any reason.
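The arithmetic of the procedure above can be sketched in Python using the figures from the text (5000 households, a required sample of 544 children, 1.5 children per household). The function name is illustrative, and drawing the starting number from 1 up to and including the sampling interval is an assumption; the text asks only for a number smaller than the interval.

```python
import math
import random


def systematic_sample(total_households, sample_children,
                      children_per_household, start=None):
    """Return the household numbers to visit and the sampling interval."""
    # Step 2: households needed, rounded up
    n_households = math.ceil(sample_children / children_per_household)
    # Step 3: sampling interval, rounded down
    interval = total_households // n_households
    # Step 4: random start within the first interval (assumed 1..interval)
    if start is None:
        start = random.randint(1, interval)
    # Steps 5-6: add the interval repeatedly
    households = [start + i * interval for i in range(n_households)]
    return households, interval


households, interval = systematic_sample(5000, 544, 1.5, start=7)
print(interval, len(households), households[:4])
```

Rounding the interval down guarantees that the last selected household still lies within the list, at the cost of leaving a few households at the end of the route with no chance of selection.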
Two-stage cluster sampling
Two-stage cluster sampling is used in large populations, when no register is available and households cannot be visited systematically. Sampling is done in two stages:
1. Clusters, or sampling sites, within the total population are selected randomly. (Clusters may be natural groupings such as villages or, in a camp, blocks of a few houses. Where natural groupings do not exist, artificial clusters may be defined by imposing a grid on a map of the area.)
2. Within each selected cluster, an appropriate number of individuals or households are randomly selected.
This process is applied separately to each population of interest. For instance, if a comparison is to be made between two separate, large refugee camps, the same number of clusters must be surveyed in each camp.
The larger the number of clusters, the higher is the probability of good representativeness of the population under study. In practice, physical constraints will limit the number of subjects who can be conveniently studied in a cluster; 30 subjects may often be the maximum to which easy access is possible in a community. The number of clusters to be examined is then derived by dividing the desired sample size, as determined below, by 30. It should be remembered that the sample size for clusters is larger than that for simple random samples.
Stage 1: selecting the clusters
Where feasible, the population is divided into a large number of clusters (at least 100) containing similar numbers of people using administrative, physical, or geographical boundaries. For this purpose, a map and a list of all separate identifiable units will be needed. Well defined villages of similar size are examples of possible clusters. Larger villages can be divided into two or more clusters. In a refugee camp, existing or imposed "sections" can be used. These clusters are numbered and then, using a random number table or systematic sampling, 30 are selected.
Alternatively, and more usually, the following procedure can be used:
1. Prepare a list of all existing units or zones with their estimated populations. (A unit or zone may comprise a village, camp, defined neighbourhood, or "section" within a camp.)
2. Add two more columns. In the first, record the cumulative population figures obtained by adding the population of each unit or zone to the combined population of all the preceding units or zones on the list, as shown in Table A 4.5. In the second, record the range of numbers attributed to each unit or zone, running from the preceding cumulative figure plus 1 up to the unit's own cumulative figure.
3. Calculate the sampling interval by dividing the total population by the number of clusters required (30). For example, if the population is 18 600, the interval will be 18 600/30 = 620.
4. Using a random number table, obtain a number between 1 and the sampling interval to define the unit or zone where the first cluster will be drawn. In the example in Table A 4.5, a random number of 510 places the first cluster in unit 1.
5. Add the sampling interval repeatedly to the original random number (e.g. 510, 1130, 1750, 2370...) to locate additional clusters up to the required total of 30, as shown in Table A 4.5. Note that large population units are likely to be assigned more than one cluster; small units (with populations less than the sampling interval) may have none.
6. Within each unit to which more than one cluster is assigned (e.g. unit 3 in Table A 4.5) further sampling is undertaken to locate the required number of clusters within the unit. Make a sketch map of the unit or zone and subdivide the whole into subunits of roughly equal population (or numbers of households), as illustrated in Fig. A 4.1. Randomly select from these the required number of clusters using a random number table or by drawing numbers out of a hat.
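Steps 2-5 can be sketched in Python with the figures from Table A 4.5. The first four population values come from the table; the fifth entry, which lumps together all remaining units so that the total is 18 600, is a hypothetical placeholder, and the function name is illustrative.

```python
from bisect import bisect_left
from itertools import accumulate


def assign_clusters(populations, n_clusters, start):
    """Count the clusters falling in each unit (selection proportional to size).

    populations: estimated population of each unit, in list order
    start: random number between 1 and the sampling interval
    """
    cumulative = list(accumulate(populations))       # step 2
    interval = cumulative[-1] // n_clusters          # step 3
    counts = [0] * len(populations)
    for i in range(n_clusters):                      # steps 4 and 5
        point = start + i * interval
        # First unit whose cumulative population reaches the point
        unit = bisect_left(cumulative, point)
        counts[unit] += 1
    return interval, counts


# Units 1-4 from Table A 4.5; the last value stands in for all other units
populations = [800, 310, 1220, 550, 15720]
interval, counts = assign_clusters(populations, n_clusters=30, start=510)
print(interval, counts)
```

With a start of 510, unit 1 receives one cluster, unit 2 none, unit 3 two, and unit 4 one, in agreement with the "Location of clusters" column of the table.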
Table A4.5 Example of first stage of cluster sampling
| Geographical units/zones | Estimated population | Cumulative population | Attributed numbers | Location of clusters |
| Unit 1 | 800 | 800 | 1-800 | 1 |
| Unit 2 | 310 | 1 110 | 801-1110 | |
| Unit 3 | 1 220 | 2 330 | 1111-2330 | 2, 3 |
| Unit 4 | 550 | 2 880 | 2331-2880 | 4 |
| etc... | ... | ... | ... | ... |
| ... | ... | ... | ... | ... |
| Total | 18 600 | 18 600 | 18 600 | (30) |
Note: See fig. A4.1 for an explanation.
Never change a sampling site because it is too remote or is close to a bigger and "worse affected" place that someone feels should be surveyed in preference to the randomly selected "unimportant" site.
Strictly speaking, clusters for nutritional surveys should be defined on the basis of the numbers of children aged 6-59 months. In most situations, the proportion of children is relatively uniform, and figures for the population as a whole can be used, as indicated above. However, if there are known to be wide variations in the proportion of children in the populations of different areas, the numbers of children aged 6-59 months should be estimated and used as a basis for defining clusters. On the other hand, where reliable population figures are not available, clusters may have to be defined on the basis of estimates of the numbers of households in different units or zones.
Stage 2: selecting individuals within each cluster
Once the survey team is on site, the required number of children (usually 30) can be selected by systematic sampling, as described above, if the site layout permits. Alternatively, a sketch map of the area should be drawn, the houses numbered, and households selected using a random number table. In many situations, neither of these methods is feasible and the following procedure is adopted:
1. Go to the centre of the selected unit or cluster.
2. Randomly choose a direction by spinning a pencil (pen, bottle) on the ground (or a flat surface) and noting the direction in which it points when it stops.
3. Walk in that direction from the centre to the outer perimeter of the unit or cluster, counting the number of households along this line.
4. Using a random number table, obtain a number between 1 and the number of households counted.
5. Go to the household indicated and examine all children belonging to that household (e.g. if the number is 5, go to the fifth household along the randomly chosen line).
6. Go to the next nearest house, the one with the door nearest to the last house surveyed.
7. Continue the process until the required number of children (probably 30) has been completed.

Note: In most cases a population will be divided into at least 100 clusters, of which 30 will be selected.
The method to be used must be decided in advance and used consistently throughout the survey. It is important that there be no element of deliberate choice by the survey team in selecting the sample houses.
All children belonging to each selected household should be surveyed, including those in the last household (even if this means exceeding the number "required"). No substitutions can be made.
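The stopping rule, including the requirement to finish the last household, can be illustrated with a small Python sketch. The list of child counts is hypothetical and is assumed to be ordered as the team would reach the households, starting from the randomly chosen first household and moving each time to the next nearest house.

```python
def households_until_quota(children_per_household, quota=30):
    """Visit households in order until the quota of children is reached.

    Every child in the last household is included, so the total may
    exceed the quota.
    """
    visited = []
    total = 0
    for index, n_children in enumerate(children_per_household):
        if total >= quota:
            break                     # quota already met: stop visiting
        visited.append(index)
        total += n_children           # all children in the household count
    return visited, total


# Hypothetical cluster: children aged 6-59 months in successive households
counts = [4, 4, 4, 4, 4, 4, 4, 3, 2, 2, 2]
visited, total = households_until_quota(counts)
print(len(visited), total)
```

In this example the eighth household takes the total from 28 to 31 children, and all 31 are measured rather than stopping at exactly 30.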
Thirty separate clusters should be surveyed if at all possible. If the number of clusters is reduced, the reliability of the estimate obtained may be poor and provide an inaccurate picture of the true nutritional status of the population being surveyed. A greater number of children per cluster does not compensate for a reduced number of clusters.¹
¹ More than 30 clusters may be surveyed, but this will not significantly increase the accuracy or reliability of the results.