Multiple frame agricultural surveys. v. 1: Current surveys based on area and list sampling methods. (FAO Statistical Development Series - 7) (1996)
PART II: AREA SAMPLE DESIGNS WITH SEGMENTS THAT HAVE IDENTIFIABLE PHYSICAL BOUNDARIES
CHAPTER 9 - DATA PROCESSING AND ANALYSIS OF SURVEY RESULTS
9.3 Summary of Results. Data Analysis
An outlier is a data value that differs considerably from the average of the other data values in the set (perhaps by more than 3 standard errors above or below the mean) and that often contributes a disproportionate amount to the direct expansion estimate and its variance. Lists of large holdings and of specialized and localized producers must be developed prior to any large-scale area frame survey, precisely so that units that would be outliers if encountered in the area sample are properly accounted for. Holders' names from the area sample are checked against these lists, and any overlap with the lists is removed.
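The definition above can be sketched as a simple screening rule. The snippet below is a minimal illustration, not a prescribed procedure: it compares each value with the mean of the other values and uses their standard deviation as the measure of spread, which is one reading of the "3 standard errors" threshold. The function name `flag_outliers`, the threshold `k`, and the sample data are hypothetical.

```python
from statistics import mean, stdev

def flag_outliers(values, k=3.0):
    """Return indices of values lying more than k standard deviations
    from the mean of the *other* values (leave-one-out comparison)."""
    flagged = []
    for i, v in enumerate(values):
        others = values[:i] + values[i + 1:]
        if len(others) < 2:
            continue  # spread cannot be estimated from fewer than 2 values
        m, s = mean(others), stdev(others)
        if abs(v - m) > k * s:
            flagged.append(i)
    return flagged

# Hypothetical reported crop areas for the segments of one stratum.
areas = [12.0, 15.0, 9.0, 14.0, 11.0, 13.0, 10.0, 400.0]
print(flag_outliers(areas))  # [7]
```

Comparing each value with the other values, rather than with the overall mean, keeps a very large report from inflating the mean and spread it is being tested against.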
Even small lists can be incomplete, and list checking is a tedious task, so some area sample questionnaires will contain outlier data even after checking against the lists. Proper checking and editing prior to data entry should identify many of the possible outliers before they get into the summary process. Once in the summary, outliers may be best detected through frequency distributions or by examining stratum totals and variances, replicate totals, and geographic totals and variances. An unusually high variance or expansion indicates the data grouping that should be examined further, either by computer or manually, to find the outliers.
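As a rough illustration of screening by stratum totals and variances, the sketch below computes an expanded total and its variance for each stratum and points to the stratum with the largest variance. It assumes simple random sampling of segments within each stratum, which is a simplification of a replicated design; the function `stratum_summary`, the frame sizes, and the data are hypothetical.

```python
from collections import defaultdict
from statistics import variance

def stratum_summary(records, frame_sizes):
    """records: iterable of (stratum, segment_value) pairs.
    frame_sizes: {stratum: N_h}, number of segments in the frame.
    Returns {stratum: (expanded_total, variance_of_total)}."""
    by_stratum = defaultdict(list)
    for stratum, value in records:
        by_stratum[stratum].append(value)
    summary = {}
    for h, ys in by_stratum.items():
        n_h, N_h = len(ys), frame_sizes[h]
        total = N_h / n_h * sum(ys)  # direct expansion
        # Variance of the expanded total under simple random sampling
        # within the stratum (assumed design; needs n_h >= 2).
        var = N_h ** 2 * (1 - n_h / N_h) * variance(ys) / n_h
        summary[h] = (total, var)
    return summary

records = [("A", 10), ("A", 12), ("A", 11), ("B", 5), ("B", 6), ("B", 95)]
frame_sizes = {"A": 30, "B": 60}
summary = stratum_summary(records, frame_sizes)
# The stratum with the largest variance is the first one to examine.
suspect = max(summary, key=lambda h: summary[h][1])
print(suspect)  # B
```

The same grouping could be applied to replicates or geographic areas by changing the key used in `records`.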
One way of dealing with outliers is to remove them and compute the direct expansion without them, then add them back into the final expansion at the level of representation they would have had if they had been on a special list. This approach applies where the outlier is an extremely large report and it has been possible to get most of the potential outliers on a list.
Where a unit is an outlier because of a large expansion factor, it is removed from the direct expansion and then put back in with the weight it should have received if it had been classified correctly.
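Both adjustments follow the same pattern: drop the outlier from the direct expansion, then add it back with a different weight. The sketch below is one possible way to express this, with assumptions stated plainly: the record layout and the function `adjusted_expansion` are hypothetical, the weights of the remaining units are left unchanged, and a re-entry weight of 1 is used to stand for a unit that would have represented only itself on a special list.

```python
def adjusted_expansion(records, readd_weight):
    """records: list of dicts with 'id', 'value' and 'weight'
    (the expansion factor). readd_weight: {id: new_weight} for the
    units treated as outliers. Outliers are left out of the direct
    expansion and added back at their new weight."""
    direct = sum(r["weight"] * r["value"]
                 for r in records if r["id"] not in readd_weight)
    added_back = sum(readd_weight[r["id"]] * r["value"]
                     for r in records if r["id"] in readd_weight)
    return direct + added_back

# Hypothetical sample: unit 3 is an extremely large report.
records = [
    {"id": 1, "value": 10, "weight": 50},
    {"id": 2, "value": 12, "weight": 50},
    {"id": 3, "value": 900, "weight": 50},
]
unadjusted = sum(r["weight"] * r["value"] for r in records)
adjusted = adjusted_expansion(records, {3: 1})  # unit 3 represents only itself
print(unadjusted, adjusted)  # 46100 2000
```

For an outlier caused by a large expansion factor, the same call would pass the weight of the stratum the unit should have been classified into instead of 1.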
In all cases, the estimate and the sampling error with outliers present should be computed. After adjustments for the outliers have been made, new estimates and sampling errors should be computed. These two estimates, unadjusted and adjusted, need to be evaluated together when determining the final estimate. If the ranges defined by the sampling errors around the adjusted and unadjusted estimates overlap, the region of overlap represents a compromise between the two. Remember that even though the outlier will probably cause the initial estimate to be too large, the adjustment procedure may cause an underestimate; cf. Vogel (1986). This possibility of an underestimate is particularly important in cases where knowledge about the number of large holdings is incomplete.
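The comparison of the two estimates can be illustrated with a small helper that returns the overlap of the two ranges, if any. This is only a sketch: the text does not say how wide the ranges should be, so the multiplier `k` (the number of sampling errors on each side of the estimate) is an assumption, and the function name `overlap_range` and the figures are hypothetical.

```python
def overlap_range(est_a, se_a, est_b, se_b, k=1.0):
    """Return (low, high) of the overlap between the ranges
    est_a +/- k*se_a and est_b +/- k*se_b, or None if they do not overlap."""
    low = max(est_a - k * se_a, est_b - k * se_b)
    high = min(est_a + k * se_a, est_b + k * se_b)
    return (low, high) if low <= high else None

# Hypothetical unadjusted and adjusted estimates with their sampling errors.
print(overlap_range(5200, 900, 4100, 400))     # (4300.0, 4500.0)
print(overlap_range(46100, 1000, 2000, 300))   # None
```

When the function returns `None`, the two estimates do not offer a compromise region, and the choice between them has to rest on other evidence, such as how complete the list of large holdings is.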
In areas of heterogeneous agricultural activity, the detection of outliers is extremely important to the accuracy and precision of the estimates. Statisticians should take care to analyze the data for potential outliers.