|Multiple frame agricultural surveys. v. 1: Current surveys based on area and list sampling methods. (FAO Statistical Development Series - 7) (1996)|
|PART I. CURRENT AGRICULTURAL SURVEYS|
This first part covers the initial considerations and definitions required for the establishment of a Current Agricultural Survey Programme, that is, a national (or large-scale) and periodic (annual or seasonal) agricultural data collection programme based on probability sampling methods and conducted in order to obtain timely and reliable basic data for the agricultural sector.
It also includes the concepts and definitions required to describe the main types of Current Agricultural Survey designs, and compares the alternative methods, in particular with the multiple frame survey design described in detail in Parts II and III of this manual.
For the establishment of a current agricultural survey programme it is indispensable first to review existing estimates and the state of knowledge of, at least, the following characteristics of the agricultural sector:
* Total production for the most important agricultural commodities; areas under crops and livestock and poultry inventories.
* Number of agricultural holdings and their geographic distribution.
* Geographic distribution of crops and livestock.
* Number and percentages of holdings producing individual commodities of interest.
* Average size of holding and size ranges.
* Average number of parcels per holding and average distances between parcels.
* Extent of holders living on or close to their holdings, or in villages.
* Average distances that holders must travel to reach their holdings.
* Number of holders grazing livestock on public or communal land without any lease or way to relate them to the land. Number of holders with no land.
* Cultivation methods: multiple cropping, mixed or associated cropping, single crop per field, etc.
* Crop calendar: crops grown, planting and harvesting dates per year.
* Weather conditions and communications.
* Marketing channels for important crops and livestock.
* Sources of agricultural data (surveys and censuses, administrative records, export records, tax lists, purchase records of controlled crops, farmers' associations, etc.).
* Administrative and political subdivisions of the country.
* Types of cartographic materials (maps, satellite images and aerial photography) that exist for the country and source from which they can be obtained.
* Agricultural zones with different proportions of cultivated land; and zones where the land is difficult to divide into small areas with stable and recognizable physical boundaries.
In addition to obtaining the best possible information and estimates of the above items, the following factors should also be taken into account in order to plan the survey:
* The holders' capacity to provide accurate data.
* The capacity of staff to organize and conduct the survey programme.
* Prioritization of the needs of data users.
Once the data requirements are identified and a decision is made that existing data are inadequate, the first step in planning an agricultural survey is to lay out the objectives of the investigation. The survey objectives should be as specific, clear-cut and unambiguous as possible and should include the required level of accuracy of the data, since this has a direct bearing on the overall survey design.
In discussing the specification of objectives, questions of the type illustrated below should be explored in detail.
* What is expected from the agricultural survey?
* What is an agricultural holding? What are the definitions of a holder and of other respondents responsible for providing information on the holdings?
* Which agricultural variables should be investigated in the periodic agricultural survey and what is the required periodicity for the estimates of such variables over time?
* Which of the existing main estimates of agricultural commodities are considered inadequate and should be investigated in a new agricultural survey?
* What are the required levels of accuracy for the estimates of the main crops, livestock and other main variables to be included in the agricultural survey?
* At what level will the data be summarized, i.e., country, state, region, watershed, etc.?
In setting out the objectives of the agricultural survey, it is necessary, in particular, to define precisely some common terms used in describing the agricultural sector. For instance, the terms agricultural holding and agricultural holder must be carefully defined in such a way that they are practical to use.
An Agricultural Holding (or Farm) is an economic unit of agricultural production under single management comprising all livestock kept and all land used wholly or partly for agricultural production purposes, without regard to title, legal form, or size. It includes land rented, land owned, and land being effectively used by the management under whatever type of other arrangement. Single management may be exercised by an individual, jointly by two or more individuals, or by a household, clan or tribe, or by a juridical person such as a corporation, religious organization, cooperative or government agency. The holding's land may consist of one or more separated parcels (simple compact blocks of land), located in one or more separate areas or in one or more territorial or administrative divisions, providing the parcels share the same production means utilized by the holding, such as labour, farm buildings, machinery or draught animals; cf. FAO (1995).
A holding parcel is any piece of land entirely surrounded by other land, water, road, forest, etc. not forming part of the holding. A parcel may consist of one or more fields adjacent to each other.
A field is a piece of land in a parcel separated from the rest of the parcel by easily recognizable demarcation lines, such as paths, fences, cadastral boundaries and/or hedges, and on which a specific variety of crop and planting date, or specific crop mixture is cultivated [1].
[1] A field as defined here corresponds to a plot in FAO (1995).
The following additional points relate to identifying a holding:
(a) holdings may have no significant land area, e.g. poultry hatcheries or holdings keeping livestock for which land is not an indispensable input for production;
(b) holdings consisting exclusively of the harvest of tree crops may be operated by persons who do not have any rights to agricultural use of the land on which the trees are grown.
The holder (farmer or operator) is a civil or juridical person who makes major decisions regarding resource use and exercises management control over the agricultural holding. The holder decides which crops to plant, when to plant, when to harvest, when to sell livestock, where to sell livestock, how many to sell, etc. The holder has technical and economic responsibility for the holding and may undertake all responsibilities directly, or delegate responsibilities related to day-to-day work management to a hired manager; FAO (1995).
A hired manager is a civil or juridical person who takes technical and administrative responsibility to manage a holding on a holder's behalf. Responsibilities are limited to making day-to-day decisions to operate the holding, including managing and supervising hired labour. Where the hired manager shares economic and financial responsibilities in addition to managing the holding, the hired manager is usually considered a holder or a joint holder; FAO (1995).
The respondent is the person from whom data are collected about the holding or part of a holding. The respondent for an agricultural survey should be the holder or the manager.
The agricultural survey objective is to obtain estimates of several variables of interest for the total survey area. The estimate of each variable for the survey area is a number obtained through an inference procedure based on the values of the variable in all or a sample of reporting units, which are usually the holdings or land areas called tracts. The estimates are numerical characteristics of the population of reporting units.
The survey area of interest will be the total of the country or a given Province or State, or other primary level of the administrative or political subdivisions of the territory.
For the national, periodic and multiple-purpose agricultural surveys considered, the most common survey variables are the following:
* Planted and harvested area, area intended for harvest, potential and actual crop yield of each crop or variety of crop, crop production and number of trees;
* Livestock and poultry inventories (e.g. number, type, age, sex, breed and use);
* Production of milk, eggs, honey and seeds;
* Number and types of farming methods and agricultural inputs including labour, type and quantity of seeds, fertilizers and pesticides, source of irrigation water, drainage, extent of shifting cultivation, stocks; machinery, equipment and agricultural buildings;
* Number and types of agricultural holdings (e.g. number, location, legal status, land tenure);
* Costs of production and value of sales;
* Population involved in agriculture (e.g. basic demographic characteristics of the holder, holder's household members working in the holding, hired workers on the holding, days of work, etc.).
The reporting unit of a survey variable is the unit to which the information collected on the variable refers. Each survey variable should be well defined on the set of reporting units. The most common type of reporting unit is the agricultural holding, since agricultural characteristics are naturally defined for the holdings. But the holding is not the only possible reporting unit of a variable. For a given survey variable, the reporting unit may be a land area called a tract, as defined below.
For agricultural surveys, it is often convenient to consider the total survey area (a Province, for instance), completely subdivided into non-overlapping land areas called segments, and that each segment is subdivided into non-overlapping tracts. A tract is defined as the land area of a holding inside a segment, or the land area of a segment that does not belong to any holding. In this case, the total survey area is completely subdivided into non-overlapping tracts, a tract being part of a holding or a non-holding area.
All of the above survey variables are defined on the holdings, that is, a value of the variable can be known for each holding. However, some of those survey variables can also be defined on tracts, as is the case of the planted area of a crop, for example.
For the agricultural surveys considered, a value of a survey variable must be defined for each holding of the survey area, or it must be defined for each tract of the survey area.
Therefore, the reporting unit of a variable can be a holding or a tract of the survey area.
Some agricultural surveys consider variables with both types of reporting units.
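The relation between segments, tracts and holdings can be illustrated with a small data sketch. The example below is only illustrative: the `Tract` record, the variable `maize_area_ha` and all figures are hypothetical and do not come from this manual. It shows that a variable such as planted area, recorded tract by tract, can be totalled either by holding or by segment, and that non-holding tracts contribute to the segment totals only.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Tract:
    segment_id: str
    holding_id: Optional[str]  # None marks land that belongs to no holding
    maize_area_ha: float       # hypothetical survey variable


# Hypothetical survey area: two segments, three holdings, one non-holding tract.
TRACTS = [
    Tract("S1", "H1", 2.0),
    Tract("S1", "H2", 1.5),
    Tract("S1", None, 0.0),
    Tract("S2", "H1", 3.0),
    Tract("S2", "H3", 0.5),
]


def total_by(tracts, key):
    """Sum the tract values by 'holding_id' or by 'segment_id'."""
    totals = defaultdict(float)
    for tract in tracts:
        unit = getattr(tract, key)
        if unit is not None:  # non-holding tracts have no holding to report to
            totals[unit] += tract.maize_area_ha
    return dict(totals)


by_holding = total_by(TRACTS, "holding_id")
by_segment = total_by(TRACTS, "segment_id")
```

Because the tracts subdivide the survey area without overlap, both groupings add up to the same total for the survey area.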
As already mentioned, for the agricultural surveys considered, an inference procedure should be established to estimate the value of each variable for the survey area from the values of the variable in all or a sample of reporting units (holdings or tracts). Consequently, the agricultural surveys can be classified into censuses or sample surveys.
An agricultural census is a survey in which the value of each variable for the survey area is obtained from the values of the variable in all reporting units, which are usually the holdings. The primary objective of agricultural censuses is to provide a detailed classification of the agricultural structure of the country [2].
[2] The term agricultural census is used by FAO in a broader sense to designate an agricultural survey with the above primary objective, but conducted on a complete or sample enumeration basis; cf. FAO (1995). However, for the purpose of classifying surveys in this manual the above standard definition of an agricultural census is used.
An agricultural sample survey is an agricultural survey for which the inference procedure to estimate each survey variable for the total survey area is based on the values of the variable obtained from a sample of reporting units. Questionnaires are completed for each of a sample of reporting units. An agricultural sample survey is usually conducted to measure the performance of the agricultural structure.
An agricultural sample survey for which the inference procedure to obtain the estimates of the survey variables is based on probability sampling and estimation methods is called a probability sample survey, a term due to Deming. For a probability sample survey the statistical precision of the estimates can be established.
On the other hand, an agricultural sample survey for which the inference procedure to obtain the estimates of the survey variables is not based on probability sampling and estimation methods is called a nonprobability or subjective sample survey.
Therefore, the agricultural data collection programmes considered can be classified as follows:
Agricultural surveys
* Censuses (complete enumeration surveys)
* Agricultural sample surveys
  * Subjective (nonprobability) sample surveys
  * Probability sample surveys
This manual will describe in detail only probability sample surveys, since these are the only surveys for which the required levels of accuracy for the estimates can be established. However, some brief considerations of censuses and nonprobability sample surveys are also provided.
Complete Enumeration Surveys (Censuses)
An agricultural census provides estimates for each holding and, therefore, aggregate data for the smallest administrative, political or statistical subdivisions of the country and for classifications of holdings by size or other subgroups of interest. On the other hand, by definition, an agricultural sample survey does not provide data from each holding and it can only provide reliable estimates for the relatively small number of political or administrative subdivisions (or other subgroups of holdings) that it was designed to cover, usually at a significant level of aggregation.
With a large population of holdings, a complete enumeration of all holdings is an expensive and time-consuming operation subject to considerable errors, including errors of omission, duplication, recording errors, data processing errors, etc. These errors derive mainly from the difficulty of properly supervising such a large operation and controlling the quality of the work. A census is always incomplete and never completely accurate, but, if efficiently conducted, it can also serve as a reference for planning and judging results of sample surveys. A sample survey often produces more accurate results than a census for a large population of holdings, because it makes possible a more careful supervision of the field work and processing of results.
An agricultural census is not a practical means of providing current (annual or seasonal) agricultural data for a large population of holdings. Even if the resources to collect the data were available, the large quantity of data to be processed usually implies that census results are released to the public two to five years after the census data collection period.
Nonprobability (Subjective) Sample Surveys
As already defined, a sample survey in which the statistical precision of the estimates cannot be calculated is called a subjective (or nonprobability) sample survey. Subjective surveys are used in cases when statistically accurate data is not required or when there are no resources for its production.
Most estimates in most countries are still based on subjective agricultural sample surveys. In fact, when all else fails, and it sometimes does, estimates are commonly based on the opinions of experts (rural agricultural agents), windshield surveys, administrative records, rapid rural appraisals and subjective samples of holdings or holders' addresses. The accuracy of such estimates depends on the knowledge and ability of the person who puts them together, and there is no way to estimate the possible error that may be contained therein. Nonprobability surveys can be useful if the objective of the survey is to produce only simple ratio estimates. They are also used when good check data exist to establish a regression procedure.
Subjective surveys based on administrative registers are quite common. Estimates for some items in many countries are derived from summing area and production of crops, livestock and other items that are reported to a government agency by each holder in order to obtain permission for marketing, or to receive a subsidy, or simply because the report is required by law. This source of information should be examined closely. First of all, it may be significantly incomplete because a great part of the production may be marketed informally as contraband with only a minimum reported to maintain eligibility. Or, on the contrary, exaggerated production may be reported to obtain a larger subsidy. There are many reasons why holders may not be inclined to report accurately on their holdings.
It should also be mentioned that in most countries that formerly had a centralized economic system, national agricultural statistics were based on complete censuses or complete registers, and probability sample surveys were seldom utilized. But in practice, in many cases, such supposedly complete censuses and registers are more properly described as being subjective sample surveys.
Probability Sample Surveys
Agricultural sample surveys which use probability sampling methods that allow calculation of the statistical precision of the estimates are called probability sample surveys. In such surveys the statistical precision of the estimates has a precise mathematical meaning.
The planning of an agricultural probability sample survey requires the following additional specifications in order to define the probability model on which the estimates of the variables for the total survey area (usually a Province or the total of the country) from the selected sample are based:
* The survey population (or sampled population), which is the set of units to be sampled, called sampling units or enumeration units. The sampling units may be different from the reporting units of the survey variables, which are the holdings or the tracts of the survey area. The most commonly used sampling units are holdings or holders' addresses, or segments, which are land areas.
* A probability selection procedure such that each possible sample of sampling units of a given size has a non-zero probability of selection.
* Rules of association between the sampling units and the reporting units. A sampling unit must either be associated with one and only one reporting unit or there must be a known rule to associate sampling units with a group of reporting units.
* The survey variables should be defined in each sampling unit as a function of its values in the group of associated reporting units.
* The estimator for each survey variable. An estimator is a random variable: a numerical function defined for each possible sample of sampling units. The value of the estimator for the selected sample provides the estimate of the survey variable. The estimate of each variable for the survey area is based on the values of the variable in a sample of sampling units associated with the reporting units of the variables (the holdings or the tracts of the survey area).
* The variances of the above estimators, which provide the precision of the survey estimates. The variance estimators are also random variables: for a given variable, the value of the variance estimator for the selected sample provides the estimated variance of the estimate, and its square root is the sampling (standard) error of the estimate. The level of accuracy or desired degree of precision of the estimates should be established.
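As a minimal sketch of the last two specifications, consider the simplest possible design, which is an assumption made here for illustration and not a design prescribed by this manual: a simple random sample of n sampling units drawn without replacement from a frame of N units, with each sampling unit associated with exactly one reporting unit. The function name and the figures are hypothetical.

```python
import math
import statistics


def srs_total_estimate(sample_values, population_size):
    """Expansion estimate of a total under simple random sampling
    without replacement, with its estimated variance and standard error."""
    n = len(sample_values)
    big_n = population_size
    mean = statistics.fmean(sample_values)
    total = big_n * mean                      # estimate of the total
    s2 = statistics.variance(sample_values)   # sample variance, divisor n - 1
    # Estimated variance of the total, with finite population correction.
    var_total = big_n ** 2 * (1 - n / big_n) * s2 / n
    return total, var_total, math.sqrt(var_total)


# Hypothetical sample: crop area (ha) on 4 holdings out of a frame of 100.
total, var_total, std_error = srs_total_estimate([2.0, 4.0, 6.0, 8.0], 100)
```

Here `total` is the value taken by the estimator for the selected sample, and `std_error` is the corresponding measure of precision. Other designs (stratified, clustered, unequal probabilities) require different formulas.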
The target survey population is the population it is desired to survey: because of practical constraints, the survey population, which is the population actually sampled (the set of sampling units), and for which inferences are valid for obtaining the survey estimates, may be different from the target population.
The sample design of a sample survey refers to the techniques for selecting a probability sample and the methods to obtain the estimates of the survey variables from the selected sample.
Sample designs involve, therefore, different types of sampling units, rules to assign probabilities of selection to sampling units, sampling fractions, possible stratification and clustering procedures and different types of estimation methods. Some designs involve several sampling selection stages, in which case the sampling units, probabilities of selection and sampling method have to be established for each sampling stage in order to obtain the final survey estimates.
For a given sampling stage of a probability sampling design, a sampling frame is the total set of sampling units with their probabilities of selection, that is, the list of sampling units from which the sample is selected together with their probabilities of selection. A frame is needed and has to be constructed for each sampling selection stage and a non-zero probability of selection has to be assigned to each sampling unit of the frame.
A few of the most common terms used for the description of agricultural probability sample designs are listed below for easy reference.
In stratified sampling, the survey population is subdivided into non-overlapping sets called strata. Each stratum is treated as a separate population.
Primary Sampling Unit (PSU) is the term used for designating the sampling unit at the first stage of selection in multiple-stage sampling.
Cluster sampling is the term used for sampling plans in which the sampling units are groups (clusters) of population units.
Equal Probability Selection Method (EPSEM) refers to a sample selection in which every sampling unit has the same probability of being selected for the sample.
Sample Selection with Probability Proportional to Size Measure (PPS) is a sampling procedure in which the probability of selection of a sampling unit is proportional to its assigned size called measure of size.
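A sketch of one common way to carry out a PPS selection, the systematic method applied to cumulated measures of size, is given below. It is an illustration only: the function names, the measures of size and the seed are hypothetical, and the sketch assumes that no unit has a measure of size larger than the sampling interval, so that no unit can be selected twice. With equal measures of size the same procedure gives an EPSEM sample.

```python
import random


def pps_inclusion_probabilities(sizes, n):
    """Probability of selection of each unit: n * size / total size."""
    total = sum(sizes)
    return [n * size / total for size in sizes]


def pps_systematic(sizes, n, rng):
    """Select n units by systematic sampling on the cumulated sizes.

    Returns the indices of the selected units.
    """
    interval = sum(sizes) / n
    start = rng.random() * interval            # random start in [0, interval)
    points = [start + k * interval for k in range(n)]
    selected = []
    cumulative = 0.0
    next_point = 0
    for index, size in enumerate(sizes):
        cumulative += size
        # A unit is selected once for every selection point falling in its range.
        while next_point < n and points[next_point] < cumulative:
            selected.append(index)
            next_point += 1
    return selected


# Hypothetical measures of size (e.g. number of households) for five villages.
SIZES = [120, 80, 200, 50, 150]
probabilities = pps_inclusion_probabilities(SIZES, 2)
sample = pps_systematic(SIZES, 2, random.Random(1))
```

The probabilities of selection add up to the sample size, and a village with twice the measure of size has twice the chance of being selected.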
In replicated sampling the total sample is made up of a set of replicate subsamples, called replicates, with the identical sample design.
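One practical use of replicated sampling is that the variability among the replicate estimates gives a simple estimate of the sampling variance. The sketch below assumes independent replicates of identical design and takes the overall estimate as the mean of the replicate estimates. These assumptions, the function name and the figures are illustrative and are not taken from this manual.

```python
import statistics


def replicated_estimate(replicate_estimates):
    """Overall estimate and its estimated variance from k replicate estimates."""
    k = len(replicate_estimates)
    overall = statistics.fmean(replicate_estimates)
    # Variance of the mean of k independent, identically designed replicates.
    variance = statistics.variance(replicate_estimates) / k
    return overall, variance


# Hypothetical estimates of a crop total (thousand ha) from four replicates.
overall, variance = replicated_estimate([98.0, 102.0, 100.0, 100.0])
```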
The Coefficient of Variation (CV) of an estimate is obtained by dividing the standard error of the estimate by the estimate itself, expressed as a percentage. It is an indication of the precision of a sample estimate.
The Design Effect (Deff) of an estimator, a term due to Kish, is the ratio of the variance of the estimator for the given sample design to the variance of a corresponding simple random sample of the same size.
The Mean Square Error (MSE) of an estimate is equal to the variance of the estimate plus the square of the bias inherent in the survey procedures.
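The three measures just defined translate directly into short formulas. The sketch below simply restates the definitions. The function names and the numerical values are hypothetical.

```python
def coefficient_of_variation(estimate, standard_error):
    """CV in percent: standard error divided by the estimate."""
    return 100.0 * standard_error / estimate


def design_effect(variance_design, variance_srs):
    """Deff: variance under the actual design over the variance of a
    simple random sample of the same size."""
    return variance_design / variance_srs


def mean_square_error(variance, bias):
    """MSE: variance plus the square of the bias."""
    return variance + bias ** 2


cv = coefficient_of_variation(estimate=500.0, standard_error=25.0)
deff = design_effect(variance_design=3.0, variance_srs=2.0)
mse = mean_square_error(variance=16.0, bias=3.0)
```

A design effect above 1 indicates that the design is less precise than a simple random sample of the same size, as is often the case with cluster sampling. In practice the bias is rarely known, so the MSE can seldom be computed directly.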
The difference between a sample result and the result from a complete count taken under the same conditions is measured by the precision or the reliability of the sample result.
The difference between the sample result and the true value is called the accuracy of the sample survey.
It is the accuracy of the survey that is most important; but it is the precision that is possible to measure in most instances.
Coverage error refers to omission and duplication of reporting units, including incorrect determination of the land area corresponding to the reporting units.
The overall survey design of a probability sample survey refers to the definitions and the established methods and procedures concerning all phases needed for conducting the survey: the sample design, the selection and training of personnel, the logistics involved in the management of the field force and the distribution and receipt of survey questionnaires and forms, and the procedures for data collection, processing and analysis.
For the general theory of survey sampling the reader can refer to the classical books by Hansen, Hurwitz and Madow (1953), Deming (1960), Kish (1965) and Cochran (1977).
There are two basic types of sampling frames used for the last-stage of selection of an agricultural probability sample survey: area frames and list frames.
The sampling units of an area frame are land areas called segments, and the selection probability of each segment is proportional to its area measure. The sampling units of a list frame are usually the holdings or holders' addresses.
Therefore, there are two basic types of sample designs in terms of the last-stage sampling unit and the rules to assign their probabilities of selection, namely area sample designs and list sample designs, also known as area frame sample surveys and list frame sample surveys.
Multiple frame surveys are those agricultural probability sample designs that combine more than one sample design to obtain the survey estimates, combining area frame designs with list frame designs. A multiple frame survey usually consists of an area sample component and a list sample component.
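To make the idea of combining the two components concrete, the sketch below uses one common approach, often called a screening estimator: the list sample estimates the total for holdings on the list, and the area sample estimates the total only for holdings that are not on the list. This is an assumption made for illustration. The estimators actually used in this manual are described in Parts II and III and may differ. The function name, the `on_list` flag and all figures are hypothetical.

```python
def multiple_frame_total(list_sample, area_sample):
    """Screening-type multiple frame estimate of a total.

    list_sample: (value, weight) pairs for holdings sampled from the list.
    area_sample: (value, weight, on_list) triples for holdings found in
                 the sampled segments; on_list is True if the holding
                 also appears in the list frame.
    """
    list_part = sum(value * weight for value, weight in list_sample)
    # Holdings already covered by the list frame are screened out of the
    # area component so that they are not counted twice.
    area_part = sum(
        value * weight
        for value, weight, on_list in area_sample
        if not on_list
    )
    return list_part + area_part


# Hypothetical data: two large holdings enumerated completely (weight 1)
# and three holdings found in area segments with expansion factor 50.
estimate = multiple_frame_total(
    list_sample=[(1000.0, 1.0), (800.0, 1.0)],
    area_sample=[(10.0, 50.0, False), (1000.0, 50.0, True), (20.0, 50.0, False)],
)
```

The approach depends on correctly identifying, for every holding found in the area sample, whether it is on the list.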
Consequently, agricultural probability sample surveys can be classified as follows:
Agricultural probability sample surveys
* Area sample surveys
* List sample surveys
* Multiple frame surveys
For the current agricultural probability sample surveys considered, it will be assumed that most data will be obtained by enumerators through personal interviews in the field. Indeed, other means of periodic data collection, for example the use of self-administered questionnaires, mail, telephone inquiries, or extraction of information from registers are usually not feasible in developing countries.
The field data collection of an agricultural survey may or may not include, in the selected sampling units, expert observation or identification and objective measurement of crop areas or other agricultural characteristics of special interest.
List sample designs are the most commonly used sampling procedures for agricultural surveys.
As defined above, a probability sample that is not an area sample is called a list sample. The usual reporting unit is the holding and the commonly used last-stage sampling unit is the holding or the holder's address.
A list frame is a frame used for the last selection stage of a list sample design. Therefore, list frames are often formed by holdings or holders' addresses.
Frame Construction and Selection Procedures of List Sample Designs
In many countries, country-wide surveys are conducted using samples from large list frames of holdings or holders' addresses. These list sampling frames come from previous agricultural, housing or population censuses, from an accumulation of lists prepared by political or administrative subdivisions, from farmers' associations, land tax records, cadasters or from other sources. There is usually information on holding size, crops grown, livestock held and other characteristics for each holding. Some of this information allows the frames to be stratified, which greatly improves sampling efficiency.
Large list frames are often incomplete or inaccurate. They contain an unknown amount of duplication, and their accuracy degenerates rapidly over time. If the list is a few years old, many of the names will no longer represent a holding due to sales, deaths, and abandonment, and new holdings will not be represented. It is also common in list building to unintentionally include more than one name that could be associated with a specific holding. Joint holders are a problem in list building. In short, it is difficult for a list of holding addresses to fulfil the requirements for an accurate probability sampling frame, particularly if it is a large list. To be truly effective, a large list of holdings must be continuously and systematically updated, which is expensive, time-consuming and requires a large staff. When the list frame is inaccurate the sample cannot be assumed to be a probability sample and the survey should be considered a subjective survey. This is the case in many countries that use list sample surveys.
Some European countries are able to utilize large, country-wide list frames effectively because of administrative procedures that result in the frequent registration of all holdings. These countries have the advantage of being able to use mail and telephone for data gathering from the list. Otherwise, holdings selected at random, or even systematically, from such a list will be widely spaced over the countryside, greatly adding to the cost of field enumeration.
List sample designs often include at least one stratum of special holdings that is completely enumerated, or sampled with a high sampling fraction. Such special holdings are defined as those which contribute a significant proportion of the total estimated value of important survey variables. Examples are the largest holdings, holdings with the largest area for a given crop or with the largest livestock herd, highly specialized holdings or those corresponding to a localized production. Such a list of special holdings is fairly easy to update since the holdings involved are usually visible and well known. Information can be obtained from extension agents, producers' associations, banks, tax records, agricultural censuses and from government agencies that control and/or purchase production of certain crops and other agricultural commodities. The preparation of such a list of special holdings should include the accumulation of data on each holding, such as holding size, crops grown, type of livestock held and inventory, for stratification purposes if the list is to be sampled.
In most countries, a reliable country-wide list frame of holdings or holders' addresses does not exist, and so a rough approximation is used on which to base a multiple-stage list sample survey. In such common cases, a sample of holders is selected by first selecting a sample of villages (clusters) with probability proportional to their total population (or housing units), since such information is usually available in most countries and approximates the number of holders. Some additional information about the villages, such as farm population at some point in time and primary agricultural activity of the holdings, may be available, allowing for some rudimentary stratification. Other small administrative subdivisions, such as districts or subdivisions of census enumeration areas (EAs) from the latest population census or from the latest agricultural census, can be used instead of villages as Primary Sampling Units (PSUs); this is a common form of cluster sampling. All holdings are listed within the selected PSUs and a sample of holdings (as represented by holders) is chosen in the second and final stage.
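The two-stage procedure just described determines the probability of selection of each sampled holding, and hence the weight (expansion factor) applied to its data. The sketch below assumes that villages are selected with probability proportional to size without replacement, and that holdings are selected with equal probability from the listing made in each selected village. The function name, its parameters and the figures are hypothetical.

```python
def two_stage_weight(villages_sampled, village_size, total_size,
                     holdings_listed, holdings_sampled):
    """Expansion factor of a holding in a two-stage list sample.

    First stage: villages selected with probability proportional to size.
    Second stage: holdings selected with equal probability from the listing.
    """
    p_village = villages_sampled * village_size / total_size
    p_holding_given_village = holdings_sampled / holdings_listed
    return 1.0 / (p_village * p_holding_given_village)


# Hypothetical case: 10 villages selected out of a total population of
# 100 000; the village has population 500 and 80 listed holdings, 8 sampled.
weight = two_stage_weight(
    villages_sampled=10, village_size=500, total_size=100_000,
    holdings_listed=80, holdings_sampled=8,
)
```

In this example each sampled holding represents about 200 holdings. If the number of holdings sampled per village is fixed and the number of listed holdings is roughly proportional to the measure of size, the weights are approximately equal across villages.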
Data Collection Procedures for List Sample Designs
During data collection, usually the enumerator completes a questionnaire for each selected holding by conducting an interview with the holder. Often, the enumerators also measure the area of the holding and its fields for a subsample of holdings. The objective measurement of agricultural areas is often required since in many countries such basic data is not known (or an accurate answer cannot be obtained) from the holders.
The above types of list sample designs are those frequently used in developing countries.
An area sample survey is a probability sample survey in which the final stage sampling units are land areas called segments, and the selection probabilities are proportional to their area measures.
The measures of size used to select the segments (sampling units) are defined as a function of their area measurements. The usual type of measure of size is the total area.
The segments should not overlap and must cover the entire survey area. The term segment is also used to refer to the piece of land associated with the sampling unit or to the group of reporting units associated with the piece of land.
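As an illustration of how selection probabilities proportional to area enter the estimation, the sketch below estimates a crop total from the crop area observed inside each sampled segment. It assumes that the n segments are selected with probability proportional to area (with replacement, or with inclusion probability n times the segment area over the total area), and it ignores stratification. The function name and the figures are hypothetical, and the estimators used in this manual may differ.

```python
def area_sample_total(segments, total_area):
    """Estimate a total from segments selected with probability
    proportional to area.

    segments: (segment_area, value_in_segment) pairs for sampled segments.
    total_area: area of the whole frame, in the same unit as segment_area.
    """
    n = len(segments)
    # Each segment contributes its value per unit of area; the average
    # density is then expanded to the total area of the frame.
    density_sum = sum(value / area for area, value in segments)
    return total_area / n * density_sum


# Hypothetical sample of three segments (area in ha, maize area in ha)
# from a frame of 10 000 ha.
maize_total = area_sample_total(
    [(100.0, 20.0), (50.0, 5.0), (200.0, 80.0)], total_area=10_000.0
)
```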
Segments Considered for Area Sample Designs of Agricultural Surveys
Types of Segments Used for Agricultural Area Sample Surveys
I. segments with identifiable physical boundaries, terrain features such as roads, rivers, canals, railroads, etc. that are readily found and provide an unambiguous identification of the segment.
II. square segments, that is, segments defined by straight lines forming squares whose end points are established by map coordinates. In this case, grid sampling procedures are used.
III. segments that coincide with the land of agricultural holdings. In this case, point sampling procedures are used.
The land of each segment is subdivided into non-overlapping tracts. A tract is the part of a holding which lies within the boundaries of the segment, or the land of the segment not part of any agricultural holding. A tract is determined by the boundaries of a segment and by the holdings with land in the segment. The definition of tract is not necessary for area sample designs in which the segments coincide with the land of agricultural holdings (type III above).
A tract within a segment is often divided into a number of fields which have recognizable boundaries and in which the land use is different.
A holding is composed of one or more tracts. All land under the operating arrangement, non-agricultural land, farmstead, barns, corrals, pasture, ponds, etc., are included in the tract. A questionnaire should be completed for each tract, except for those tracts that consist of wasteland, water, and non-holding land, that will merely be listed on a control sheet.
Stratification of Area Sample Designs
Most area sample surveys consider a subdivision of the frame into land-use strata. Stratum boundaries must consist of physical terrain features (roads, paths, rivers, etc.) that can be located on the ground.
The land-use strata are defined by proportion of cultivated land, predominance of certain crops, special agricultural practices, average size of cultivated fields, agro-urban areas, or other land-use characteristics. Unless otherwise stated, when referring to an area sample the word strata will be used to denote land-use strata.
Most area sample designs consider replicated sampling selection methods within substrata (zones) of the land-use strata, each with an equal number of segments. This provides a further stratification which is applied in order to improve the efficiency of the design.
Sample Selection Methods
A sample design of the classical type, with segments that have identifiable physical boundaries (type I above), considers segments of nearly equal size in each land-use stratum. The area sample design usually consists of a stratified two-stage sample design using a random replicated sampling selection procedure. Stratified single-stage designs using a systematic replicated sampling selection procedure have also been used. Three sampling stages are also used for special strata in the above designs.
Each stratum is completely subdivided into non-overlapping Primary Sampling Units (PSUs), also called Counting Units or Frame Units, which are areas with recognizable physical boundaries formed by segments. The measure of size assigned to each PSU is equal to the number of segments it contains. Within each stratum, PSUs are ordered by similarity and then selected with probability proportional to size measure (PPS sampling). Then, within a selected PSU, segments are selected with equal probability (EPSEM).
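A short check shows why this two-stage scheme gives every segment in a stratum the same overall chance of selection. The sketch below assumes, for illustration only, that one segment is drawn for each PSU selection and that the first-stage probability is taken as the expected number of selections n * M_i / M, where M_i is the number of segments in PSU i and M the stratum total. The function name is hypothetical.

```python
from fractions import Fraction

def segment_selection_probabilities(psu_sizes, n_psus):
    """Overall selection probability of any one segment, by PSU.

    psu_sizes: number of segments assigned to each PSU (its measure of size).
    n_psus:    number of PSU selections made in the stratum.
    """
    total = sum(psu_sizes)
    probabilities = []
    for m in psu_sizes:
        p_first = Fraction(n_psus * m, total)   # PPS stage: proportional to size
        p_second = Fraction(1, m)               # one segment, equal probability
        probabilities.append(p_first * p_second)
    return probabilities

# Every PSU yields the same value n / M, so the expansion factor is M / n.
print(segment_selection_probabilities([4, 10, 6], 2))
```

The size of the PSU cancels out, which is what makes the design self-weighting within a stratum under these assumptions.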
The PSUs defined in such a way can be formed quickly and provide a means of identifying and counting all segments in each stratum and in the total frame without actually mapping each segment.
Once the PSUs are defined and their areas are measured, the exact size (number of target segments) of each stratum and of the total frame will be known. The PSUs are ordered by similarity of agricultural characteristics, an ordering that also ensures a geographic distribution of the sample. In each stratum the PSUs facilitate the location of the sample of segments selected using a systematic or random selection of segments with equal probabilities (EPSEM). To identify a selected segment it is only necessary to partition the corresponding PSU into a number of segments equal to its assigned number of segments.
For one-stage area sample designs, the PSUs as defined above are also known as counting units, and are also constructed to calculate the size of each stratum and of the total frame, introduce a further stratification given by their order, avoid mapping each segment, and facilitate the probability sampling of segments. For simplicity of notation, counting units (CUs) or frame units will in all cases be called PSUs.
In area sampling, for practical purposes, the ordered list of PSUs within each stratum, with their assigned measures of size (the number of segments in each PSU), is called the area sampling frame, even for one-stage area sample designs. As mentioned, the frame of PSUs makes it possible to select the probability sample without actually listing all segments.
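The way an ordered list of PSUs lets segments be selected without mapping them can be sketched as below. The sketch assumes that segments are numbered consecutively across the ordered PSUs, that the stratum is cut into zones with an equal number of segments, and that each replicate takes one random segment number per zone. The names `locate_segment` and `replicated_sample` are illustrative only.

```python
import random
from bisect import bisect_right
from itertools import accumulate

def locate_segment(psu_sizes, segment_number):
    """Map a stratum-wide segment number (0-based) to
    (PSU index, segment index within that PSU)."""
    bounds = list(accumulate(psu_sizes))          # cumulative segment counts
    psu = bisect_right(bounds, segment_number)
    first_in_psu = bounds[psu] - psu_sizes[psu]
    return psu, segment_number - first_in_psu

def replicated_sample(psu_sizes, n_zones, n_replicates, seed=0):
    """One random segment per zone in each replicate.

    Replicates are drawn independently, so the same segment may
    appear in more than one replicate.
    """
    rng = random.Random(seed)
    total = sum(psu_sizes)
    if total % n_zones:
        raise ValueError("zones must contain an equal number of segments")
    zone_size = total // n_zones
    replicates = []
    for _ in range(n_replicates):
        picks = [z * zone_size + rng.randrange(zone_size) for z in range(n_zones)]
        replicates.append([locate_segment(psu_sizes, s) for s in picks])
    return replicates
```

Only the PSUs returned by `locate_segment` need to be partitioned into segments on the map; all other PSUs are left undivided.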
Preparing an area frame and selecting a sample of segments for this classical type of area sample designs with segments that have identifiable physical boundaries is a demanding job but some of the work can be effectively done by applying the procedures to be described in this manual.
In an area sample design of this type, it is indispensable to distinguish between the sampling unit (segment), and the reporting unit (holding or tract) for each variable in order to define the corresponding estimator.
Since a given survey variable is defined in the reporting units (holdings or tracts), and the sampling units are the segments, in order to define the estimator, rules should be established to define a value of the variable in each segment as a function of its values in a group of the associated holdings or tracts.
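As an illustration of such rules of association, the sketch below computes a segment value in three ways often described in the area-sampling literature: from tract data only (closed segment), from whole-holding data for holders residing in the segment (open segment), and from whole-holding data weighted by the share of the holding's land inside the segment (weighted segment). The field names and the simple expansion by the number of segments in the stratum are assumptions made for the example, not definitions taken from this manual.

```python
def closed_segment_value(tracts):
    """Sum of the variable as observed inside the segment (tract data)."""
    return sum(t["tract_value"] for t in tracts)

def open_segment_value(tracts):
    """Sum of whole-holding values for holders who reside in the segment."""
    return sum(t["holding_value"] for t in tracts if t["headquarters_in_segment"])

def weighted_segment_value(tracts):
    """Whole-holding values weighted by tract area / holding area."""
    return sum(t["holding_value"] * t["tract_area"] / t["holding_area"]
               for t in tracts)

def expanded_total(segment_values, n_segments_in_stratum):
    """Stratum total under equal-probability selection of segments."""
    n = len(segment_values)
    return n_segments_in_stratum / n * sum(segment_values)

tracts = [
    {"tract_value": 2.0, "holding_value": 10.0, "tract_area": 1.0,
     "holding_area": 4.0, "headquarters_in_segment": True},
    {"tract_value": 3.0, "holding_value": 6.0, "tract_area": 3.0,
     "holding_area": 3.0, "headquarters_in_segment": False},
]
print(closed_segment_value(tracts), open_segment_value(tracts),
      weighted_segment_value(tracts))
```

The three rules generally give different segment values for the same sample, which is why the rule must be fixed for each variable before the estimator is defined.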
Data Collection Procedures
As described in the Introduction, the field data collection for each tract of a segment is carried out by enumerators that complete a questionnaire through personal interviews with the agricultural holder. The data collection often involves also identification and measurement of agricultural areas undertaken with the help of an aerial photo enlargement (or a map or scale drawing) of the segment that is used to measure the identified agricultural areas.
Sample Selection and Estimation Methods
The usual sample design considered is a stratified sample of square segments selected with equal probability within each stratum. Square segments of equal area (the cells of a grid overlaid on the strata) are selected within each stratum.
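A minimal sketch of this grid selection is given below. It assumes a rectangular grid defined by map coordinates and a hypothetical predicate `in_stratum` that says whether the centre of a cell belongs to the stratum. Both are illustrative simplifications.

```python
import random

def grid_cells(x_min, y_min, n_cols, n_rows, side, in_stratum):
    """List the square cells (x0, y0, x1, y1) whose centre lies in the stratum."""
    cells = []
    for r in range(n_rows):
        for c in range(n_cols):
            x0, y0 = x_min + c * side, y_min + r * side
            if in_stratum(x0 + side / 2, y0 + side / 2):
                cells.append((x0, y0, x0 + side, y0 + side))
    return cells

def select_square_segments(cells, n, seed=0):
    """Equal-probability sample of n square segments, without replacement."""
    return random.Random(seed).sample(cells, n)

# Example: 1 km cells; the stratum is everything west of x = 2000 m.
cells = grid_cells(0, 0, 4, 3, 1000, lambda x, y: x < 2000)
segments = select_square_segments(cells, 3, seed=1)
```

Assigning a cell to a stratum by its centre is one possible convention; cells cut by a stratum boundary may need a different rule in practice.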
As discussed in the description of the area samples of the Czech Republic and Spain, provided in volume II, the area frame construction and sample selection is basically simpler than the area frame construction and sample selection for an area sample design with segments that have recognizable physical boundaries.
The estimation methods for an area sample design with square segments are basically analogous to those described above for an area sample design with segments that have identifiable physical boundaries.
Data Collection Procedures
The field data collection for an area sample design with square segments is analogous to the data collection described above of an area sample design with segments that have permanent physical boundaries (Type I). However, reliable data collection for multiple-purpose area sample design with square segments is more difficult (and may be even impossible in practice) because of the difficulty of obtaining from holders reliable data for the tracts of a segment that cannot be observed on the ground.
For an area sample design with segments that coincide with the land of agricultural holdings (type III above), the sample design can be considered a stratified sample of holdings selected with probability proportional to their areas. In this case, a grid is overlaid on the strata and a sample of points is selected. Then, the selected points are identified on the ground and the corresponding holding is selected for the area sample. The area frame construction and sample selection is simpler than the area frame construction and sample selection for an area sample design with segments that have recognizable physical boundaries.
The estimation methods for this type of area sample designs are simpler than those needed for the other two types of area sample designs. In fact, a one-to-one correspondence between reporting units (the holdings) and last-stage sampling units (the land of the holdings) is considered.
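Because each point selects a holding with probability proportional to its area, a stratum total can be estimated by averaging the ratio of the holding's value to its area over the sample points and multiplying by the stratum area. The sketch below illustrates this standard probability-proportional-to-size estimator. It assumes that points are spread uniformly over the stratum and that a point falling on land outside any holding contributes zero. The function name and data layout are hypothetical.

```python
def point_sample_total(stratum_area, points):
    """Estimate a stratum total from a point sample.

    points: one entry per sample point, either None (point fell on
            non-holding land) or a pair (holding_value, holding_area),
            with holding_area in the same unit as stratum_area.
    """
    ratios = [0.0 if p is None else p[0] / p[1] for p in points]
    return stratum_area * sum(ratios) / len(points)

# Four points in a 1000 ha stratum; one fell on non-holding land.
estimate = point_sample_total(1000, [(50, 10), None, (20, 5), (30, 10)])
```

A holding hit by several points enters the sum once per point, which is consistent with selection with probability proportional to area.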
Data Collection Procedures
If objective measurement of areas is required (as is often the case in many countries), it can be accomplished by the enumerators by measuring the area of the holding and its fields during data collection, for a subsample of holdings, as is done for list sample surveys. This procedure is more cumbersome and less precise than the area measurements undertaken in the office with the help of aerial photos, as commonly used with the other types of area sample designs.
A data collection procedure for the area sample design of Nicaragua, which does not involve objective measurement of areas, is described in volume II.
A distinguishing characteristic of area sample survey methods is that they have incorporated important technological advances in computer data processing to a larger extent than list sample methods. In fact, area sample methods can utilize satellite imagery or even digital satellite data as part of Geographic Information Systems (GIS), hand-held Global Positioning Systems (GPS) and generally a variety of automated procedures and techniques for sample selection and data analysis.
Geographic Information Systems are used in many countries for a variety of purposes related to agricultural statistics. A GIS called CASS (Computer-Aided Stratification and Sampling) is currently used in the United States for area frame construction and sample selection. These procedures are highly automated, and the material requirements, mainly mapping requirements, are difficult to meet in most countries. The methods are important because they are adopted at present in the biggest and oldest area frame project in the world and because they may represent the trend of the evolution of area frame construction and sample selection from manual to computerized procedures. In other words, these computerized methods may be partially or totally adapted to different conditions and requirements in other countries. As already mentioned, this manual does not present the use of GIS for area frame construction and sample selection, describing instead methods that require a minimum of resources and specialized staff, taking into account the constraints faced by developing countries. However, in order to illustrate the use of GIS for area frame construction and sample selection, summaries of the area sample designs of the Czech Republic, Spain and the United States will be included in volume II.
National or large-scale agricultural area sample surveys have been applied and developed for many years in a large number of countries involving a diversity of conditions and requirements. The first nation-wide area sample was implemented some fifty years ago (cf. King and Jessen, 1945), and since then most important advances in procedures have taken place in connection with the development of the periodic national area frame agricultural survey in the United States and other countries.
General, classical presentations of area sampling methods as originated and used in the United States can be found in Houseman (1975), Vogel (1986), Fecso, Tortora and Vogel (1986), Cotter and Nealon (1987) and Cotter and Tomczak (1994).
An agricultural multiple frame sample survey is a probability sample survey that combines the two basic types of probability sampling methods, an area sample component with a list sample component.
Multiple frame estimators combine area sample with list sample estimators for each survey variable. If the list frame is sampled, the results obtained from the selected holdings are expanded and added to the area sample estimate. If the list frame is completely enumerated, the results obtained from the list are added to the area sample estimate with no additional contribution to the overall variance.
For multiple frame surveys, all holdings in the list frame must be removed from the area frame, an operation that requires special attention and resources. In other words, all tracts in the selected sample of segments, that correspond to holdings in the list frame, should not be considered to obtain the area sample estimates.
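The combination of the two components can be sketched as follows. The example assumes a single stratum with a common expansion factor for the area sample and identifies list holdings by a hypothetical `holding_id`. With `list_expansion` left at 1.0 the list is treated as completely enumerated.

```python
def multiple_frame_total(segments, list_ids, area_expansion,
                         list_values, list_expansion=1.0):
    """Multiple frame estimate of a total for one survey variable.

    segments:       list of segments, each a list of tract records
                    {"holding_id": ..., "value": ...}.
    list_ids:       identifiers of all holdings in the list frame.
    area_expansion: expansion factor of the area sample.
    list_values:    values reported by the enumerated (or sampled) list holdings.
    list_expansion: 1.0 for complete enumeration, otherwise the list
                    sample expansion factor.
    """
    area_sum = 0.0
    for tracts in segments:
        for t in tracts:
            # tracts of list-frame holdings are excluded from the area component
            if t["holding_id"] not in list_ids:
                area_sum += t["value"]
    return area_expansion * area_sum + list_expansion * sum(list_values)

segments = [
    [{"holding_id": "A", "value": 5.0}, {"holding_id": "BIG1", "value": 400.0}],
    [{"holding_id": "B", "value": 7.0}],
]
total = multiple_frame_total(segments, {"BIG1", "BIG2"}, 50, [4000.0, 2500.0])
```

Leaving the tract of "BIG1" in the area component would count that holding twice, which is why the matching of list names against sampled tracts needs careful control.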
A list frame of special holdings is a necessary addition to an area sample in order to provide adequate estimates for important agricultural variables that have a highly skewed frequency distribution. As it is known, a number of important agricultural variables concentrate a significant proportion of the total estimate in a small proportion of the holdings. For each of these variables, the list sample should account for the skewness of its distribution. As a result, the corresponding multiple frame estimates will be more precise than the area sample estimates.
The list of special holdings, which should be updated annually prior to the survey field data collection, can be formed (as already mentioned for the case of list sample designs) by those holdings with the largest area for a given crop, those with the largest number of livestock, or those that may not be duly represented in an area sample. The annual or seasonal data collection for the list of special holdings can be done by direct interview with holders or managers, by leaving the questionnaires with the holders and collecting them afterwards, or even by mail if the list sample is relatively short and therefore easy to control.
The list frame of a multiple frame survey can be a large, nationwide list of holdings. The preparation and updating of such a frame requires a heavy investment in computer hardware and software and a very controlled field operation for its use in combination with the area sample. In the USA and Canada, for example, the multiple frame survey designs combine a large, nationwide list sample with an area sample. However, this type of multiple frame design, although the most efficient, is not feasible in most developing countries.
On the other hand, the most practical multiple frame methods for developing countries are those with a relatively short list of special holdings, to be completely enumerated, used as a complement to the area sample. This is the case, for example, of the multiple frame survey design described in the following chapters of this manual.
The choice of an appropriate current agricultural survey design can best be made by considering the comparative advantages, disadvantages and requirements among multiple frame survey designs, area sample designs and list sample designs.
A distinction is made between surveys that require objective measurement of agricultural areas and those that do not. It should be mentioned that in many countries, or large areas of developing countries, the areas reported by farmers are not considered reliable, and therefore objective measurements of areas are required to obtain reliable estimates. This need arises partly on account of all kinds of arbitrary local units of measurement that are in use in different parts of the same country, and partly because of the general tendency among farmers to under-report their areas and production.
It will be assumed that the list sample design is of the common type already described in section 2.2.
Multiple Frame Designs vis-a-vis Area Sample Designs
* A multiple frame design that combines an area sample with a relatively short list of special holdings which make a significant contribution to the total estimate of some important survey variables is preferable to the area sample design alone, since it can provide more accurate estimates of such variables, and because the extra work involved for its design and implementation will generally not be significant.
Multiple Frame Designs vis-a-vis List Sample Designs
When referring to an area sample it will be assumed here to be the area sample component of a multiple frame design that includes also a list frame of special holdings; and also that, unless otherwise stated, the segments have identifiable physical boundaries.
Some of the advantages, disadvantages and requirements of such multiple frame designs of current agricultural survey programmes as compared with list sample designs are the following:
* Coverage Errors Related to Complete Coverage. The area frame provides complete coverage of the population of reporting units (holdings or tracts). The target population coincides with the population of interest. Coverage errors are a major problem in list sampling, but not in area sampling provided the rules of association linking reporting units with selected segments are performed correctly. Therefore, probability area sample estimates are not biased concerning coverage errors. This cannot be achieved with a list sample survey since, in practice, a complete list of holdings, valid as of the date of the survey data collection period, cannot be established.
* Coverage Errors Related to Repeated Use of the Frame. If an annual or seasonal agricultural survey is to be implemented, it is worth noticing that an area frame is generally far more durable than a list frame of holdings. An area frame does not become outdated for many years (say 5 to 15 years) unless the population extends into areas not covered by the frame. Arrangements must be made to keep the frame up-to-date in areas where the proportion of cultivated land (the stratification criteria used) is changing rapidly as in areas of new urbanization, or to cover new agricultural areas. Changes in land use, or in the number and location of holdings may reduce the precision of the area sample estimates but they do not introduce bias.
* Precision of the Estimates. An area sample design with segments that have recognizable physical boundaries, or one with square segments, obtains more precise estimates of agricultural areas (a key variable studied in all agricultural surveys) than a list sample. In fact, by definition, in area sampling the probabilities of selection and the sampling expansion factors are proportional to agricultural areas.
* Non-sampling Errors and Objective Measurement of Areas. If objective measurement of agricultural areas is required, for the area sample design non-sampling errors associated with area measurements are reduced by the use of aerial photographs of the selected segments that clearly indicate the holdings and the fields and are used to check the reported area of fields and the total area of the holding itself. The main crop areas are identified and delineated on aerial photographs or detailed maps during the field data collection and then measured in the office. The holder is more inclined to be truthful when confronted with questions about specific portions of his holding that are also being observed by the enumerator at that moment. The area sample design involves a more convenient and accurate procedure for the objective measurements of agricultural areas, which is an important advantage for providing accurate area estimates. For list sample designs the measurement of areas is usually undertaken only in a subsample of holdings during data collection. This procedure is generally slow and cumbersome to apply, and impractical when dealing with holdings formed by parcels a long distance from one another.
* Basis for a Crop Cutting Yield Survey. An area sample design with segments that have recognizable physical boundaries or one with square segments, provides the means for selecting a statistical sample of fields needed to conduct crop cutting yield surveys in order to estimate crop production when crops mature and make crop forecasting yield estimates by measuring plant characteristics at certain stages of growth during the crop year. Such a sample of fields, obtained from the area sample, can also be used by trained enumerators to make eye-estimates of crop yields. In developing countries, holders are often not able to report reliable estimates of crop yield and production and there is the problem of local measures which often vary from village to village or even from holder to holder. Crop-cutting methods are recommended to collect objective estimates of yield although they are time consuming, expensive and require strict training and supervision of enumerators for their adequate implementation.
* Sample Size. When the sample size of a multiple frame design is compared with that of a list sample design, the list sample design may require a larger sample because of its between-cluster and within-cluster variances. In addition, the probabilities of selection of the PSUs, and of the initial stages of the sampling design, often cannot be properly established for an efficient sample design.
* Boundaries of PSUs. For a list sample design, it is often not easy (sometimes impossible) to establish the boundaries of the PSUs whether they are villages, enumeration areas or administrative subdivisions.
* Changing System of Holdings. The area frame construction is independent of agricultural holdings. This makes area sampling an appropriate solution for example, for countries or areas where the system of holdings and other agricultural infrastructure has been altered drastically and will be in a state of flux well into the future, and for which the knowledge of agricultural production is more important than obtaining results by categories of holdings (e.g. countries of Central and Eastern Europe).
* Data Collection Costs. Area sample surveys are cost effective on a per holding basis since each segment contains a group or cluster of tracts (reporting units). If a list sample is not a cluster sample of holdings, the holdings to be surveyed by an enumerator could be separated by a significant distance consequently increasing survey costs.
Disadvantages and Requirements
* Cartographic Requirements. Constructing an area frame requires cartographic material upon which accurate identification of areas and land measurements can be made. Accurate maps and preferably satellite images or a mosaic of aerial photography, as well as instruments for area measurement and for transferring maps to different scales (planimeters and preferably computerized equipment), are needed for preparing a stratified area frame and selecting an area sample. Aerial photographic enlargements, or accurate drawings of a known scale for each selected segment, are a great advantage if objective measurement of areas is required. In this case, a tremendous advantage, but not a requirement, is the ability to take aerial photographs for selected land areas. The increasing availability in recent years of satellite images, computerized area measurement and scale-transfer instruments and Geographic Information Systems greatly facilitates the application of area sample methods. In addition, hand-held Global Positioning Systems (GPS) provide a practical tool to locate and/or define segments. In particular, the availability of satellite images has effectively removed one of the previous problems for area frame construction, which was lack of aerial photography and maps for stratification purposes. Such a variety of cartographic materials and instruments is not required for list sample surveys.
* Lack of permanent physical boundaries. For an area sample design with segments that have recognizable physical boundaries or with square segments, the lack of permanent boundaries in the maps, satellite images and aerial photos constitutes a serious problem. In tropical areas such as West Africa, for example, because of the climatic conditions and shifting cultivation systems, boundaries change more frequently or get covered by bush and are not visible on the cartographic materials.
* Proximity of the Holder or Respondent to the Holding. It may not be feasible or even possible to use an area sample in some countries due to difficult terrain, lack of funding or due to certain social mores of the rural population. In a number of countries, or large regions of countries, holders live in villages often some distance from their holding. In this case, a survey based on an area sample of segments is difficult to implement. For a given selected segment far from a village, it will be problematic to identify and interview the holders with holdings partially or totally included in the segment. And even when they can be located, the holders will tend to report for their entire holding and not for the part of the holding included in the segment. For these situations, surveys without objective measurements of agricultural areas are used. If an area sample design with segments that have recognizable physical boundaries, or one with square segments is used, then the stratification can be modified to place emphasis on the villages, and the area of the segment is not taken into account except to define the holders who reside within the segment (open segment estimation procedure).
* Sample Selection Costs. There are higher costs involved at the start of the survey programme, for selection of an area sample of segments with recognizable physical boundaries, than those needed for a list sample. However, for a current agricultural survey programme, a list frame of holdings needs to be updated frequently in order to ensure proper coverage and obtain reliable estimates; and therefore the resources needed for an alternative area sample may be easily justified.
* Technical Expertise. The implementation of a sustainable agricultural survey programme based on area sampling methods requires a dedicated, highly qualified office staff that is willing to accept the tedium and precise attention to detail indispensable in the construction and maintenance of the area frame. It requires highly trained statisticians for data analysis, for the interpretation of results and corresponding necessary adjustments and refinement of survey procedures. The frame and the sample will be no more durable than the staff. This type of staff is not easy to hire and keep motivated in civil service conditions and low agricultural budgets of a developing country. An area frame can be built and an area sample can be selected with technical assistance from various sources, but its continued existence and usefulness will depend on strong government support. The implementation of an area sample design requires more technical expertise than the implementation of a list sample design. Where it is possible to construct an accurate list frame (a list frame is often incomplete, biased and outdated), the advantage of a list sample design over an area sample design concerning simplicity of implementation is partly due to the fact that the estimation methods are simpler, since there is usually a one-to-one correspondence between sampling and reporting units. However, for an area sample with segments that coincide with the area of the holdings, the technical expertise required is approximately similar and there is also a one-to-one correspondence between sampling and reporting units.