| The archival appraisal of records containing personal information: A RAMP study with guidelines |
|4. Appraisal methodologies, criteria, and options|
1. This chapter will outline methods and criteria, as well as practical approaches, that an archivist should use in appraising records containing personal information. Such appraisal operates on two levels:
-- for the large majority of case file series which have value primarily for their collective or evidential character (and thus have been identified for further appraisal using the macro-appraisal model in Chapter 3); and
-- for those series which only contain certain informational value about specific persons, places, and things.
The chapter begins by recommending a comprehensive approach as the only logical way to make sound appraisal decisions. This is followed by general and specific appraisal criteria, the latter both for special files in a series (informational value) and for series as a whole (evidential value, leading to the broader sharpening of the societal image). Various practical and preservation issues are next addressed, and then various appraisal options, including a brief summary of sampling, are presented.
2. Archivists should appraise series of records containing personal information as part of a larger information universe.1 Not to do so is to start at the bottom of the records pyramid with the most voluminous and repetitive records having the least value, rather than at the top with the policy files and then in the middle with subject files. The archivist must consider the value of records created by the formulation of policy and then those resulting from its general operations, interpretation, and modifications (as revealed in policy and subject records) before being able to understand and appraise correctly the records generated by the daily implementation of the policy (as revealed by the case files). When dealing with the case files themselves, the archivist should consider first the societal image model, and then later issues of informational value separate from the collective and evidential value of the series.
3. As indicated in Chapter 3, moreover, records containing personal information must by definition be appraised against a wider background, especially where such records have their principal significance because for a given function they enrich the image of the citizen-state interaction. That image will be sharpest, and thus most worthy of documentation by the archivist, where there is evidence of significant changes, variations, and distortions between targets and results of the given programme and where the agency allows the citizen sufficient latitude to express his or her opinions.2 Thus, ipso facto, the archivist must determine the operating culture of that programme and agency by looking first at the sources which reveal it:
-- policy and subject files whose importance to the image was explained in the last chapter;
-- electronic records which aggregate much more precisely the more amorphous information from the case files and make clear the relevant demographic and statistical patterns;
-- central government sources (budgets, audits, inquiries, reports, and so on);
-- procedural and forms manuals;
-- legislation; and
-- related published and near-published information.
After determining which policy and subject records and which electronic data bases will be preserved by the archives, as well as the availability of other relevant (but non-archival) information, only then will the archivist be able to assess both the sharpness of the image in the citizen-state interaction and the value of the connected case records containing personal information. Finally, as noted, if the case file series do not have value in sharpening the collective societal image, they should still be appraised, as a last step, for the informational value they may have about specific individuals and events and places. In this comprehensive approach, therefore, actually looking at the records containing personal information is, ironically, the last rather than the first step in appraising such records. To look at personal information case files in isolation from these other factors and these other records is a prescription for poor archival appraisal.
4. The implementation of this comprehensive appraisal framework may conflict, however, with the priorities of government agencies in scheduling their own records. The records schedule is a timetable created by a records manager indicating how long files or groups of files should be retained, where they should be retained (agency or records centre), and their ultimate disposition (transfer to an archives or destruction). If archivists can record their appraisal decisions onto records schedules as part of the process whereby those schedules are approved, the result is a more efficient and economical disposal process, reducing considerably the work of archivists and ensuring in all likelihood that valuable records are not lost or inadvertently destroyed. In this process, it has traditionally been the case that all relevant records in all media for a particular programme or administrative sub-unit are not scheduled comprehensively by the agency, although that is desirable and should be encouraged. Usually, however, the bulky records containing personal information, which often have the shortest retention periods, will be scheduled first, simply because the agency does not want the high storage costs of maintaining them for long periods of time. But that does not mean that these records must be appraised first in isolation: the schedule is merely a tool to record an appraisal decision, among its other functions. The archivist should thus appraise in the comprehensive context (as outlined above) all the series and media of records created in a particular office relevant to assessing the image model, even if only a small portion of the records are being formally scheduled at any one time.
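As an illustration only, the three elements of a records schedule described above (how long, where, and ultimate disposition) can be sketched as a small data structure. The class and field names, the series title, and the seven-year retention period are hypothetical and are not drawn from any actual scheduling system; the sketch simply shows the disposition being left open until the archivist records an appraisal decision.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Location(Enum):
    AGENCY = "agency"
    RECORDS_CENTRE = "records centre"


class Disposition(Enum):
    TRANSFER_TO_ARCHIVES = "transfer to archives"
    DESTROY = "destroy"


@dataclass
class ScheduleEntry:
    """One line of a records schedule (hypothetical layout)."""
    series: str                                 # title of the record series
    retention_years: int                        # how long the files are retained
    location: Location                          # where they are retained
    disposition: Optional[Disposition] = None   # stays None until appraised

    def record_appraisal(self, keep: bool) -> None:
        """Write the archivist's appraisal decision onto the schedule."""
        self.disposition = (
            Disposition.TRANSFER_TO_ARCHIVES if keep else Disposition.DESTROY
        )


# Hypothetical series scheduled early because of its bulk.
entry = ScheduleEntry("Benefit claim case files", 7, Location.RECORDS_CENTRE)
entry.record_appraisal(keep=False)
print(entry.disposition.value)
```

Keeping the disposition empty until the appraisal is recorded reflects the point made above: the schedule is a tool for recording the decision, not a substitute for making it in its comprehensive context.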
5. In the same comprehensive approach to their work, archivists should consider adopting the "cluster concept" when they appraise records. If there are several interrelated series of personal information records -- military records involving individuals might include the personnel file, court martial files, burial files, and so on -- these should be appraised together so that overlapping information may be more readily identified and thus a better appraisal made. The same clustering occurs in immigration and naturalization files and in certain court records.3
6. The timing of the appraisal of series of case files or similar personal information records will vary. For the essential records category, the decision can be made immediately. For more routine and homogeneous series, in a manner similar to appraising electronic records at the system design stage, the records may be appraised as (or even before) they are first created. Here a diplomatic analysis is necessary to understand the form and process and structure behind the records per se, which in turn will reveal much about the informational content before the records are even created. In such cases, archivists must monitor the situation periodically to decide if changes in the programme, agency, or records structures over time require a revised appraisal. But for most series containing personal information case files, the issue of retention will revolve around whether the records reflect a societal image which distorts, alters, or negates an articulated intent of the programme or agency. Almost by definition, such cases will involve public issues or government functions which are controversial, hotly debated in public forums, and emotion-laden for many citizens (including archivists) at the time of their occurrence. In such cases, distance adds needed perspective to the appraisal decision. That time can be gained by storing records for periods of infrequent use in records centres, but archivists must guard against excessive use of this strategy in order to avoid filling centres with useless records with ever-mounting storage costs. Records centre storage is not justified simply because records creators, records managers, and archivists refuse to make difficult decisions. Good archival research and analysis, however, will shorten the needed "cooling-off" period for records.
Even if there is delay in making the actual appraisal until this perspective has been gained, the archivist should still gather relevant documentation and interview responsible departmental officers as soon as possible, before both disappear and important experience and impressions are lost.
7. The comprehensive approach to appraisal, as well as the reorientation of the archivist from passive receptor to active selector and the ideal timing of the appraisal, may sometimes be in conflict with the aims of records managers with whom the archivist must cooperate. All archivists have had the experience of roomsful of records dumped on them without warning, thus undermining any chance to treat such records comprehensively with the others in their information universe or actively in terms of isolating (according to Chapter 3) the key records worth preserving in order to retain the most faithful image of society. There is no easy solution to this dilemma, but since the volume of records ever increases, and as space and other resources diminish for both records managers in departments and archivists, it is mandatory that archivists break this vicious circle and regain control of the archival agenda. That may be done by implementing a planned, strategic approach to records scheduling with agencies, that is, a plan based on archival priorities derived from research into all the complex variables mentioned in Chapters 2 and 3, but which also recognizes the need of agencies to have authority (usually received from the national archivist in most countries) to destroy records without archival value in a timely fashion.4
8. The main working rule for archivists in appraising all records is to destroy them.5 On the standard 5 per cent: 95 per cent ratio of keep versus destroy -- which in many countries is closer to 2:98 or 1:99 -- "it is neither scientifically desirable nor economically defensible to spend most of our time and energy on minutely culling the larger mass of records".6 If this is true for all records, then it is especially so for large series of case files containing personal information. The focus for the archivist should not be to explain what is being destroyed, but rather to advance "definite and compelling justifications" for what is being kept. Of that 5 per cent of paper records so selected, many will be policy or significant subject files, but a large volume may be textual case files in paper format (especially where there are strong pressures to save records for their informational values).7
9. Essential records containing personal information, as outlined in Chapter 2, sections 22 and 23, are not covered by these rules or by the theoretical considerations of Chapter 3. Such essential categories of records are not acquired by archives to sharpen the societal image (although they obviously contribute to it), but rather to provide a demographic profile of the nation, to protect citizens' rights, and to underpin certain judicial processes.
10. Personnel records similarly are acquired both in their own right for the evidential and informational value they have in isolation (see Chapter 2, sections 24 to 28), as well as for how they may contribute to the overall societal image.
11. Where the principal value of the important personal information records coming from the "macro-analysis" of the societal image is determined by the archivist to be collective and quantitative rather than personal and qualitative, the record should usually be kept in electronic rather than paper format, where both exist. The advantages of the machine-readable version of the record are numerous, in addition to obvious savings in space and storage costs: manipulability of the information, ease of anonymization permitting public access in light of tougher privacy laws, linkage to other data to create "new" information bases and potential for aggregation and statistical analysis.8 As noted before, this rule may apply to essential personal information records as well, and it does apply to personnel records series.
12. As outlined in the last chapter, records containing personal information should not be kept to document the historical significance of a programme or an agency per se (as opposed to the concept of sharpening the societal image). The only exception is that a small example of case files may sometimes be kept to demonstrate the forms used where the programme was of particular importance. Keeping large examples or more formal samples merely to show the processes of the agency or the nature of its daily operations is rarely justifiable. Information on processes and operations, as noted, is readily available elsewhere in the information universe and records hierarchy. This rule also includes attachments to or associated artifacts connected with case files, such as X-rays, fingerprints, weapons, blood samples, and so on.
13. The primary use of records must not be confused with their secondary, archival uses, although the nature of the primary use is clearly important to understanding the records' context during the appraisal process. Simply because a department has a long-term and sometimes even a permanent use for case files does not render them archival records, unless they also have significance in terms of the societal model in Chapter 3 or of informational value for research. Political pressure, as noted before, may in some countries require such records to be stored in the national archives, but that is a pragmatic decision, not one based on archival significance.
14. In addition to researching and understanding all the factors and variables outlined in the last two chapters, archivists must ensure that they do not give undue weight to various types of records. They cannot appraise a large series of case files by "spot-checking" or by accepting the word of the agency's officials that various records are duplicated in other series and/or in other levels of the administrative hierarchy. Archivists must approach the task more comprehensively and scientifically. In appraising 135,000 cubic feet of Department of Justice litigation case files in the United States, for example, archivists followed the department's own classification system to break the cases into 194 distinct categories (kidnapping to insurance fraud) and then used a consistent sampling methodology to select a balanced number of files from each category for study during the appraisal process. This is sampling for appraisal rather than for acquisition and transfer. As the number of cases ranged from over 10,000 in each of anti-trust, land, and taxation categories to under 10 for those relating to misuse of insignia, census violations, or farm loans, such scientific categorization and sampling is necessary in order to understand the nature of the records involved and to ensure that cases with few instances are not overlooked and those with many are not overemphasized. The Department of Justice methodology is not only directly relevant to the personal case file series of other judicial, court, police, and intelligence agencies, but also to any series which on the surface appears to be homogeneous, but which in reality has various internal categories or functions.9
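The idea of sampling for appraisal across categories can be sketched as follows. This is a minimal illustration, not the methodology actually used for the Department of Justice files: the fixed quota per category, the function name, and the file identifiers are all assumptions made for the example. It shows only the principle stated above, that very large categories are not overemphasized and very small ones are not overlooked.

```python
import random
from collections import defaultdict


def appraisal_sample(files, per_category=25, seed=0):
    """Pick up to `per_category` files from every category for study.

    `files` is an iterable of (file_id, category) pairs. Categories
    holding no more files than the quota are taken whole, so that
    rare case types are always examined.
    """
    rng = random.Random(seed)  # fixed seed keeps the selection repeatable
    by_category = defaultdict(list)
    for file_id, category in files:
        by_category[category].append(file_id)

    sample = {}
    for category, ids in by_category.items():
        if len(ids) <= per_category:
            sample[category] = sorted(ids)
        else:
            sample[category] = sorted(rng.sample(ids, per_category))
    return sample


# Hypothetical identifiers: one very large and one very small category.
files = [(f"AT-{i}", "anti-trust") for i in range(10_000)]
files += [(f"IN-{i}", "misuse of insignia") for i in range(8)]

sample = appraisal_sample(files, per_category=25)
print({category: len(ids) for category, ids in sample.items()})
```

A fixed quota is only one possible design; a quota that grows slowly with the size of the category would be another. Either way the files selected are examined to inform the appraisal, not necessarily acquired.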
15. There is no attempt in this section to write guidelines, complete with full argument and examples, concerning archival appraisal criteria in general.10 Rather, only those factors affecting the appraisal of records containing personal information are summarized. If a series of case files following the "macro-appraisal" model is determined to have potential permanent value, then a secondary series of appraisal factors must thereafter be considered. As well, series which do not have such collective or evidential value in sharpening the image of society may still have informational value. The following paragraphs outline appraisal factors for both cases: identifying within a series the individual files that have particular significance (informational value) and dealing with all cases in the series as a collective reflection of the citizen-state interaction (the macro-appraisal model). The first involves pulling special cases away from the whole; the second involves sampling (where all files need not -- or cannot -- be kept) to ensure that the part retained in an archives is a valid representation or reflection of the whole.
16. Series as a Whole. Without denying the importance of the exceptional and controversial cases within a case file series, or the general informational value of the series, it is the series as a whole, as an aggregate of the citizen-state relationship, that should first draw the archivist's attention. In such cases, after determining that the series does indeed qualify as sharpening the image of society as outlined in Chapter 3, the archivist evaluating the series of records containing personal information must address a number of additional factors, which are common to all appraisal:
a. Completeness of the series. The more complete a series is, including both successful and unsuccessful cases, regional and headquarters input, the greater its value.
b. Authenticity. There must be assurance the records are genuine, created in the normal course of business under established procedures, and clearly linked by provenance to their creator.
c. Uniqueness. Is the record physically duplicated in whole or large part in electronic, micrographic, or published form? Is significant information from it tabulated, summarized, or abstracted in policy and subject files, data bases, or publications? If so, as noted before, the paper version of the case file should rarely be acquired by an archives. If the record or information is unique, does it merely confirm impressions already recorded elsewhere, does it supplement what is known, or does it provide a fresh, untapped body of data?
d. Relationship to other records. If the records complement or extend the understanding or significance of other records in the archives' custody, their value increases. Similarly, the potential to link records or data between these and other records must be considered.
e. Dates and time-span. The earlier the date of the series, especially for pre-1945 case records when other personal information sources were less available, the more value the series may have. Similarly, for comparative and longitudinal studies, the longer period of time covered by a series of records containing personal information, the greater their value.
f. Extent. Obviously the overall existing volume of the series, and the annual rate of accumulation, must be considered.
g. Usability. The records must be legible, coherent, accompanied by relevant supporting documentation, and arranged or indexed in a manner rendering them usable by researchers, or have the potential to be made so.
h. Rigidity/Flexibility. As noted at length in Chapter 3, the series has greatest value if its structure (and the programme and agency behind it) allows information from citizens to be recorded directly rather than indirectly, in free prose rather than set forms, and reflecting views and opinions rather than merely the rigid application of fixed procedures.
17. It may be useful for archivists to consider drafting an appraisal checklist of questions when appraising records containing personal information. As noted before, this must be used after extensive research by the archivist into the history and character of the records and their creator and after undertaking the "macro-appraisal" outlined in Chapter 3. Naturally, the questions asked will vary from agency to agency in light of their particular mandates and functions. One such checklist, which includes most if not all of the above points, was produced from the National Archives and Records Administration's celebrated appraisal of the FBI case files, and it is reprinted in the Appendix to this study.
18. Finally, following from the insights of the documentation strategy, archivists faced with appraising large volumes of records containing personal information should develop national networks to ensure that the records being appraised in their own jurisdictions, using the foregoing criteria, are indeed unique, and that the information in them is not reflected or even duplicated in similar records being retained in other archival repositories.11 Extending the argument beyond the contents of other archives, archivists should remember that the societal image is reflected and preserved as well by librarians, museum and art gallery curators, historic site interpreters, recorders of oral history, and many other heritage professionals. The results of their work may well allow the archivist to destroy certain types of information rendered thereby less essential to the overall image. Such national networks will not be easy to establish nor will they always work harmoniously, but the need for them is apparent and the effort should be made.
19. Special Cases within a Series. There is no better way to introduce the appraisal of series for informational value than by quoting the admirable guidelines set forward in a National Archives and Records Administration handbook for appraising selected case files for permanent retention:
Those chosen normally fall under one or more of the following categories. The case:
a. Established a precedent and therefore resulted in a major policy or procedural change;
b. Was involved in extensive litigation;
c. Received wide-spread attention from the news media;
d. Was widely recognized for its uniqueness by established authorities outside the Government;
e. Was reviewed at length in the agency's annual report to the Congress; or
f. Was selected to document agency procedures rather than to capture information relating to the subject of the individual file.
Categories a. through e. establish the exceptional nature of a particular case file while category f. relates to routine files chosen because they exemplify the policies and procedures of the creating agency. The types of case files selected for permanent retention under the criteria established above include, but are not limited to, research grants awarded for studies; research and development projects; investigative, enforcement, and litigation case files; social service and welfare case files; labour relations case files; case files related to the development of natural resources and the preservation of historic studies [sites?]; public works case files; and Federal court case files.12
As noted earlier, category f. (evidential value) should only be used very sparingly for records containing personal information, and should never exceed a small sample. However, unless the creating agency is willing to code the files physically (numbering variation, colour tabs, a cover stamp or annotation) to indicate that any particular case file was indeed exceptional and falls into categories a. to e. above, there is little chance that the archivist will be able to isolate such files using these categories -- especially if there are hundreds of thousands of file units.
20. There are three alternatives. One is to isolate important instances by date: military records during wartime years, immigration records during years of special migrations or forced evacuations, whether globally or by particular countries; all files created during the pioneering, early, or controversial periods of a particular programme. A second alternative, and one following the extended example of personnel case files cited in Chapter 2, is focusing on certain levels or categories of individuals, where such hierarchical organization exists and is easily evident in the filing system used for the records. A third and more assured alternative is concentrating on the "fat file" -- or the multi-section or multi-volume file.13 As exceptional, unusual, or controversial cases almost by definition generate more correspondence than their routine counterparts, such files will be thick and thus easily identifiable even in vast series to be pulled for archival retention. Of course, not all thick files necessarily follow this pattern: it may be that someone was routinely repaying a loan in monthly payments over thirty years (thus generating a fat file of 360 receipts). The archivist will have to assess the reasons for the thickness of particular files in series where they occur to ensure that such files are indeed exceptional. It is also logical that such exceptional files may well contain all which the archivist feels is necessary to document the "hot spots" in the citizen-state dialectic. After all, such controversial and precedent-setting files by their nature represent the "image" forcing changes on the programme and agency intentions and targets (see systematic sampling in section 32 below). In certain situations, the archivist may also want to select for preservation a "normal" base of information against which these special and exceptional cases may be compared and contrasted by researchers.
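The "fat file" alternative amounts to a two-stage filter, which can be sketched as below. The sketch is illustrative only: the record layout, the two-volume threshold, and the `routine_bulk` flag (standing in for the archivist's own judgment about why a file is thick) are assumptions, not features of any actual filing system.

```python
from dataclasses import dataclass


@dataclass
class CaseFile:
    file_id: str
    volumes: int         # number of sections or volumes making up the file
    routine_bulk: bool   # set after review: thick for routine reasons only


def fat_file_candidates(files, min_volumes=2):
    """Stage 1: every multi-section or multi-volume file is a candidate."""
    return [f for f in files if f.volumes >= min_volumes]


def exceptional_files(files, min_volumes=2):
    """Stage 2: drop candidates whose thickness proved to be routine."""
    return [
        f for f in fat_file_candidates(files, min_volumes)
        if not f.routine_bulk
    ]


# Hypothetical series of three case files.
series = [
    CaseFile("A-001", 1, False),  # ordinary single-volume case
    CaseFile("A-002", 4, False),  # contested case with long correspondence
    CaseFile("A-003", 3, True),   # thirty years of monthly loan receipts
]
print([f.file_id for f in exceptional_files(series)])
```

The first stage can be done mechanically from a file list or even by eye on the shelf; the second stage cannot, which is why the flag is set by review rather than computed.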
21. There are certain practical and preservation factors which may affect the ultimate decision to acquire a series of records containing personal information, or to acquire only part rather than all of it. The archivist must consider these factors, but only after going through the four steps of researching and applying the "macro-appraisal," the comprehensive analysis, the general rules, and the specific criteria noted above. If after that process, the archivist has made a positive decision that the records containing personal information indeed have value and that some or all of them should be acquired, then the following issues must also be considered. These are not appraisal issues, however, but preservation ones. The distinction is subtle, and important. While these practical and preservation factors clearly affect the nature (and sometimes even the possibility) of the actual acquisition of the records, they do not affect per se the intellectual decision of whether or not the records have permanent value. For example, the billions of bytes of climatic data received daily from hundreds of satellites and earth sensor stations have permanent value for long-term ecological study, but no archives is equipped to acquire them directly. Thus, such data are appraised as being permanently valuable, but practical or preservation issues prevent their actual transfer to an archives.
22. The most obvious practical factor is the cost of retaining the records. There are the obvious costs of the space needed to store the records and the containers and shelving to hold them. Less visible but equally pertinent costs concern the salary time and materials needed to arrange and describe the records, to preserve (and possibly copy) them, and to make them available to the public. In preserving the best possible record within budgetary limitations, many archives will find that they cannot keep all desirable series of case files and some hard choices will have to be made. If the foregoing analysis has been carefully researched and documented, and if it is sufficiently comprehensive across an agency (as defined in sections 2 to 7 in this chapter), then the priorities between competing series will be easier to identify for making that final decision.
23. There are sometimes legislative or statutory prohibitions which prevent archivists from viewing certain series of records in order to appraise them or which legally bar the transfer of certain categories of records to the archives, or both. In such cases, archivists (and their outside supporting communities) must lobby for legislative amendments or administrative arrangements to overcome these prohibitions. Ideally, archival legislation itself grants archivists the right to appraise and acquire even sensitive records.
24. Where the original paper records are too extensive, it is possible to convert the information, or the key portions of it, to electronic data, microfilm, or optical disk. Unless this conversion has been done by the creating agency, most archives will find that the conversion costs outweigh the storage ones, and only a small portion of their holdings will be so treated. Even where microfilm or computer versions of the record are available, however, archivists are cautioned that such miniaturization is no substitute for sound appraisal. Keeping useless records, even those with modest space requirements, complicates description and research unnecessarily, and clutters the desired total image of society.
25. Certain practical considerations may affect the timing of the transfer of records containing personal information, although again this has little to do with appraisal as contrasted to acquisition. If records are still actively used in an agency or likely to be subject to many freedom of information or privacy requests, it may be desirable to extend their formal retention period and thus delay their transfer to the archives in order to avoid excessive reference workloads. Conversely, if the archivist fears that records are physically threatened with either outright destruction or rearrangement that would obliterate their original order, then the retention periods should be shortened and the records safeguarded in the archives as soon as possible. This second scenario occurs when records are highly sensitive or embarrassing to the government in power or where the agency is transitory in nature (an investigatory commission, a small bureau) and about to pass out of existence.
26. Another preservation strategy may be to share between several archives in a formal network (or even informally across national borders) a large series of records which is beyond the capacity of any one of them to retain as a whole. In Great Britain, for example, over 300,000 feet of shipping and seamen's records (as of 1954 only) were handled as follows: the Public Record Office kept all crew lists up to 1860 and a 10 per cent sample thereafter, together with the crew lists for certain well-known ships; a sample of the remaining lists for every tenth year was then preserved by the National Maritime Museum; certain crew lists were handed over to local archives (for ships registered at ports within the area); and the very large residue was transferred to the Atlantic Canada Shipping Project of Memorial University of Newfoundland.14 While purists might argue about original order, or that this approach simply evades a tougher appraisal decision, researchers are pleased with this more generous solution which was beyond the capability of the Public Record Office itself.
27. Finally, as mentioned in Chapter 1, more restrictive privacy acts coming into force around the world place very tight restrictions on the release of records containing personal information to third parties. These acts can dictate that certain types of personal information not be collected at all or can require that records containing it be destroyed after a short period of primary use, or can preclude transfer of such records to the archives (see legislative restraints above), or can make it impossible for researchers to use the records even if they are transferred. Archives must learn to live in this new environment by demonstrating to their sponsoring governments that they will follow the acts and not release personal information records in ways that will harm an individual's right to privacy. Archivists should also try to convince their sponsors and legislators that the sensitivity of personal information will eventually expire, and in such circumstances the personal information records can then be made available for public consultation. Therefore, such records should not now be destroyed merely to protect personal privacy. In such a manner, it is possible, as in the Canadian Privacy Act, to get specific exemptions for archival use of records containing personal information.15 In a related vein, by keeping a citizen's sensitive file as part of a sample, especially depending on the political climate of the country, archives may actually disadvantage that citizen and leave him or her open to prosecution, public embarrassment, or worse. It is essential that records containing such highly sensitive personal information kept solely for their collective significance in a series not be made available through descriptive tools which allow retrieval by a personal identifier for any purpose, including genealogical research, during the person's lifetime.
As well, care must be taken to store (and destroy, if relevant) such sensitive personal information in a secure manner.
28. After all the above analytical steps, the archivist is faced with making one of the following decisions:
a. Retain all records permanently. Very few personal information records aside from the "essential" categories defined in Chapter 2 will be kept in their entirety. Perhaps a small series of case files in a programme where cases were appealed to and settled in the minister's office would be an example, or the examples cited before (see Chapter 2.30) of national gallery artist files or senior scientists' research grants. As a working rule in such cases, for interrelated series of records, it is preferable to keep all of a small series rather than samples from a much larger one.
b. Remove and keep key documents only from the files. Immigration landing record forms or medical and employment history charts once removed from the case files render what remains behind unarchival. It is, however, a labour-intensive job to remove such documents for large series if this work is not already performed by the creating department in the course of its normal business. It is good records management practice (in which archivists have an obvious stake) to ensure that key forms can be readily separated from ephemeral material.
c. Sample the records. Sampling permits the retention of the characteristics of the whole, both physically and intellectually, in a small portion of the whole. See the special section on sampling which follows for more information.
d. Take an example of the records. This involves taking a very small specimen (a file or box per year perhaps) solely to show the forms and processes used. As noted, there are better ways to document the evidential value of a programme, and this method for voluminous records containing personal information should be used sparingly.
e. Destroy all the records. This will of course be the decision taken for most series of personal case files created by modern governments.
29. At any stage in this process, archivists can consider converting the records or key information in them to electronic, micrographic, or optical disk formats as an alternative to collecting extensive series of bulky paper records, or consider alienating the records to another repository, but as noted these are preservation options, not appraisal ones. The personal information records in such instances have already been appraised as having permanent value before the practical and preservation concerns of actual transfer and acquisition are considered.
30. It is not the purpose of this study to investigate the various sampling methodologies in detail nor to review actual relevant sampling cases from archival practice. A RAMP study has been published on sampling and readers should consult it for more details and particular examples.16 The aim here is merely to give a brief summary of sampling as an appraisal option for acquiring records containing personal information. Some will argue that only random or statistical sampling is true sampling and that the other means cited below are better termed selection. However, unlike the example, they all attempt to represent some or all of the characteristics of the whole (or of some feature of the whole) in the part chosen, and for this reason are here termed "sampling." While there are many sampling methods, most usually fit into one (or a combination) of the following four categories.17
31. Statistical Sampling. Selection based on mathematical techniques that determine the proper number of cases (i.e., size of the sample) and the actual means of selecting specific cases necessary to preserve a "representative" (statistically valid) sample of the entire series. This is sometimes called probability sampling.
Example: Selection based on random number tables, or an automated random number generator, and then pulling the required files matching the randomly identified numbers. There are three types of statistical or random sampling: simple random (where the random numbers are applied blindly to the entire population, which sometimes means small pockets of files of a particular type may be missed entirely); systematic random (where the first number is chosen randomly, and then every nth number thereafter is chosen, which is particularly helpful for chronologically organized series and for avoiding the "missing pockets" syndrome); and stratified random (where the whole is broken down into logical parts - like the categories in the United States Justice litigation case files cited above - and then each part or office is randomly sampled, thus ensuring that no part is overlooked).18
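The three variants can be sketched in a few lines of Python. This is an illustration only, not a procedure drawn from the sources cited: the function names, the fixed seed, and the rule of taking at least one file from every stratum are assumptions made for the sketch, and the files are assumed to be identified by consecutive numbers.

```python
import random


def simple_random(file_numbers, k, seed=0):
    """Draw k file numbers blindly from the entire population."""
    rng = random.Random(seed)
    return sorted(rng.sample(file_numbers, k))


def systematic_random(file_numbers, step, seed=0):
    """Choose the first file at random, then take every step-th file after it."""
    rng = random.Random(seed)
    start = rng.randrange(step)
    return file_numbers[start::step]


def stratified_random(strata, fraction, seed=0):
    """Sample each stratum separately so that no part of the series is missed."""
    rng = random.Random(seed)
    sample = {}
    for name, files in strata.items():
        # Assumption: take at least one file even from a very small stratum.
        k = max(1, round(len(files) * fraction))
        sample[name] = sorted(rng.sample(files, k))
    return sample


series = list(range(1, 1001))
blind = simple_random(series, 50)
every_twentieth = systematic_random(series, 20)
by_category = stratified_random(
    {"civil": list(range(900)), "appeals": list(range(900, 910))}, 0.05
)
```

In this sketch the ten "appeals" files form exactly the kind of small pocket that a simple random draw of the same size could easily miss, whereas the stratified draw always includes at least one of them.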
Advantages: The sample can be used to reconstruct the whole and the results should be statistically valid. It is theoretically unbiased and thus easily explainable to researchers. For a numerical arrangement of files, it may be a relatively easy sample to pull by clerical staff. Finally, archivists can control the size of the sample, and normally it will be quite manageable, since even for large series, the proper statistical weight can be assigned, even when a relatively small sample is chosen (about a maximum of 1,500 total cases is NARA's experience out of any size series, whether from ten thousand or ten million cases).
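The observation that the required sample barely grows with the size of the series can be illustrated with the standard sample-size formula for estimating a proportion, including the finite population correction. The parameters below are assumptions chosen for illustration: a 95 per cent confidence level (z = 1.96), a margin of error of plus or minus 2.5 percentage points, and the worst-case proportion p = 0.5. The text does not say which parameters underlie NARA's figure.

```python
import math


def sample_size(population, margin=0.025, z=1.96, p=0.5):
    """Sample size for estimating a proportion, with finite population correction."""
    # Size needed for an effectively infinite population.
    n0 = (z ** 2) * p * (1 - p) / margin ** 2
    # The correction shrinks the requirement for smaller populations.
    return math.ceil(n0 / (1 + (n0 - 1) / population))


for population in (10_000, 1_000_000, 10_000_000):
    print(population, sample_size(population))
```

Under these assumed parameters the result stays between roughly 1,300 and 1,540 files whether the series holds ten thousand or ten million cases, which is of the same order as the figure quoted above.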
Disadvantages: There is obviously little chance that the few exceptional or outstanding cases in the series will be included in the random or statistical sample, although this can be compensated for by using a second method (see below) to complement the random sample. As well, researchers cannot do longitudinal work; one cannot trace a county or individual over time, as the county or person in every likelihood will not be selected for every annual or decennial random sample from the series. For files arranged alphabetically or in some other non-numerical scheme, the statistical sample is very difficult to pull, as it will require the counting and often may require the costly numbering of all the files before pulling. And for complex file series, there may be the need for a stratified (i.e., multiple) sample to ensure that various types of actions are sampled; this is very expensive and requires great statistical expertise. A high level of analysis is also required to determine the homogeneity of the series and the nature of the features or characteristics within the files which must each be given statistical weight. Archivists naturally should not be afraid of complex analysis nor of acquiring new expertise, but only cautious that the time thus spent to determine these factors does not pass the point of diminishing returns. As well, in that the total information universe is rarely known to the archivist for large series of continuing files perhaps scattered in hundreds of field offices, it is somewhat difficult and always expensive to apply statistical sampling techniques: one can, of course, sample each office separately, or add up the total number of files in each office in order to determine the whole before beginning the random sample. More difficult to determine (and defend) is when to sample: every year (and on what date in it?), every tenth year, every twenty-fifth, etc.
Finally, and most problematic, for continuing series organized without logical cutoff points, which is the case for the majority of operational programmes, the open-ended nature of the records system means that the information universe is unknowable. The first files in the series will be ready for destruction or preservation long before the last file is even begun. An unknowable information universe renders statistically valid sampling impossible; only for closed or contained series is it relevant. Of course, the archivist may impose cut-off dates (in order to "close" an "open" series), or do statistical samples at certain time intervals on all closed volumes of files accumulated to that point. But with such tactics, the size of the total sample remains uncontrollable, and for the whole series (when it eventually stops) there is no assurance that the sum of the twenty or fifty samples taken on parts of the series over the years is equivalent (and statistically valid) to the one hypothetical (but impossible to perform) sample of the entire series.
32. Systematic Sampling. Selection based on a physical characteristic of the records or filing scheme without regard to the substantive information in the selected files.
Examples: All files from years ending in "2" or for surnames beginning with "F"; every twentieth or nth file; every social insurance or identity number ending in "5"; all files measuring more than one inch thick or more than x volumes or sections in format (the so-called "fat file" method), which of course will vary from series to series as to what is "fat."
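Because each of these rules looks only at a physical or filing characteristic, each can be written as a simple test applied file by file. The following Python sketch is illustrative only: the `CaseFile` fields, the rule names, and the default values (every twentieth file, a one-inch threshold) are assumptions made for the example, not a description of any actual records system.

```python
from dataclasses import dataclass


@dataclass
class CaseFile:
    number: int          # sequential file number (hypothetical field)
    surname: str
    year: int            # year the file was opened
    id_number: str       # social insurance or identity number
    thickness_in: float  # physical thickness in inches


def systematic_rules(nth=20, fat_threshold_in=1.0):
    """Return one predicate per rule; none of them reads the file's content."""
    return {
        "year ends in 2": lambda f: f.year % 10 == 2,
        "surname begins with F": lambda f: f.surname.upper().startswith("F"),
        "every nth file": lambda f: f.number % nth == 0,
        "identity number ends in 5": lambda f: f.id_number.endswith("5"),
        "fat file": lambda f: f.thickness_in > fat_threshold_in,
    }


def pull(files, rule):
    """Select the files that satisfy a single rule."""
    return [f for f in files if rule(f)]


files = [
    CaseFile(20, "Fraser", 1972, "123455", 0.4),
    CaseFile(21, "Singh", 1975, "987654", 2.5),
]
fat_files = pull(files, systematic_rules()["fat file"])
```

The threshold for a "fat" file is left as a parameter because, as noted, what counts as "fat" varies from series to series.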
Advantages: The sample is relatively easy to pull, and does not require great expertise in the substantive content covered by the file. It can thus be pulled relatively inexpensively by clerks in records offices or records centres, rather than needing the direct (and costly) intervention of archivists or senior programme officials in departments with their knowledge of the substance of the files. As noted earlier (see 4.20 above), if using the "fat file" method, the chances are good that the archivist may get most of the real problem cases.
Disadvantages: This method is not statistically valid; it cannot be used to reconstruct the whole. It is difficult to explain to researchers (i.e., to justify saving years ending in "2" rather than "7," and so on). It is impossible to control the size of the sample (especially with the "fat file" method) and thus space planning is very difficult. And, quite evidently, this method (with the partial exception of the "fat file" approach) does not guarantee that the outstanding or controversial cases have been preserved. Conversely, the fat file approach will likely result in preserving the cases which were the exception, not the rule.
33. Exemplary Sampling. Selection made on a qualitative basis to document some "typical" characteristic, activity, or time period.
Examples: All files from a particular region to show how a "typical" field office operated; or all files from the years immediately before and after an agency reorganization or significant legislative change to show their impact on actual operations; or all files for particular types of court proceedings (e.g., felony convictions); or all files for public servants reaching the rank of director or above. As another example, the archivist could also keep all series for many agencies for a very intensive geographical area (a small region or city) which is typical of the whole nation in order to take a snapshot of the societal image. The "fat file" may also be an exemplary sample, even though its physical characteristic places it first under the systematic sampling category (see 4.32 above). If the file is "fat" because some consistent characteristic or feature of the programme renders it so (rather than just its physical size), then if that feature or characteristic is the one the archivist deems worthy of preservation for qualitative rather than purely quantitative reasons, the fat file method is also an exemplary sample.
Advantages: The method can be justified to researchers, although with some difficulty, and it can be used to trace a programme over time.
Disadvantages: The method is not statistically valid and cannot be used to reconstruct the whole. It does not save the exceptional cases and again there is no control over the size of the sample. It does require substantive expertise to make the right choices, as "typicality" of the isolated feature or characteristic or the time period will always be open to dispute, and therefore will require the archivist's careful analysis and explanation.
34. Exceptional Sampling. Selection of files on significant individuals, precedent-setting programmes, and landmark cases.
Examples: Pulling of exceptional individual cases from a series follows different criteria in each case. However, there are several types of individuals to watch for and these were stated generally in section 4.19 above, and a particular example was given for personnel files in section 2.27 of how to isolate the famous, controversial, and "firsts" from the ordinary and routine cases. Once again, depending on the reasons for unusually large files, the "fat file" method for some series may also indicate exceptional, precedent-setting cases.
Advantages: This selection can be justified, although with some difficulty, as it usually saves the controversial files that are often demanded by researchers.
Disadvantages: The method is obviously not statistically valid, and may give a false impression of what the original whole series was like (i.e., distort the view of a "typical" case). It requires great substantive expertise as well as relatively good prior identification and arrangement of files so that the exceptional cases can be located and pulled. It is very closely linked to current research trends, and therefore highly susceptible to bias. Again, the size of the sample cannot be controlled.
35. Stratified sampling uses the same sampling method to acquire two or more samples from the same series in order to protect different characteristics of the whole: lower courts and appeal courts; field and regional offices; different income levels; or whatever other strata into which it seems useful to divide the files.
36. The archivist can also combine two or more of the above methods, where appropriate. If random or statistical sampling is one of the methods, it must of course be applied first so that the statistical validity of the whole is not impaired. It may be desirable to use statistical or systematic sampling first, and then search for an exemplary or exceptional sample second.
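The ordering rule can be illustrated with a minimal Python sketch. It assumes that files are identified by hashable numbers and that some test for "exceptional" files exists. Both the function name and the decision to keep the two groups in separate sets are assumptions made for the example.

```python
import random


def combined_sample(files, k, is_exceptional, seed=0):
    """Draw the statistical sample first, then add exceptional files."""
    rng = random.Random(seed)
    # Step 1: the random draw is made from the whole, untouched series.
    statistical = set(rng.sample(files, k))
    # Step 2: exceptional files are added afterwards and kept in a separate
    # group, so the random draw is not altered by the later selection.
    exceptional = {f for f in files if is_exceptional(f)} - statistical
    return statistical, exceptional


series = list(range(1000))
statistical, exceptional = combined_sample(series, 100, lambda f: f % 250 == 0)
```

Recording which group each retained file belongs to would let a researcher base statistical inferences on the first group only, while still having access to the exceptional cases in the second.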
37. Most appraisal decisions for series of records containing personal information will be complicated and stratified depending on the agency, the programme, its citizens, and the nature of the records. Here a concrete example might be helpful for readers. To return to the Canadian immigration agency cited earlier in this report, the following appraisal decisions were made in late 1987 on a records schedule for the paper case files located in headquarters, regional offices, and field contact centres, as well as those in many appeal boards and tribunals.19 Three other types of records were not covered by the schedule submitted by the agency, but these records were also considered in the appraisal decision in order to make a more comprehensive and accurate evaluation: the policy and subject files, the microfilmed landing record form for each immigrant, and the electronic demographic data on immigrants. Although there were subtle distinctions between the various levels of the hierarchy, in general the following records were identified for archival retention:
-- all case files the surnames of whose subjects begin with the letter "F" (just under a 4 per cent sample of the series); as well as
-- all "unusual, controversial, historic, or precedent-setting" cases as defined by the agency;
-- all case files still surviving from before a significant cut-off date in the development of the automated systems and surviving electronic records in the agency (1969);
-- all "fat files" (the dimensions of which were specified);
-- all case files bearing a special "SF" prefix indicating a classification to the level of secret or marked "secret" (thus indicating special sensitivity in the eyes of the agency);
-- all case files whose subjects launched appeals to very high levels or tribunals in the system;
-- all case files for one ethnic group over an eight-decade period (which had survived intact in the agency);
-- all case files bearing special prefixes ("H" for Hindu, etc.) that designated a particular ethnic group, or person from it, segregated from the main case file series; and
-- all related indexes, registers, file classification manuals, and other similar finding aids.
The "F" letter was chosen because it was shown, after extensive study and computer analysis, to be one of the few letters inclusive of almost all ethnic groups' linguistic patterns for surnames. Because of the way the records are organized, numbered, and scattered, and because the series is open-ended and not closed, statistically valid random sampling was not considered possible. Recommended for destruction in this appraisal decision were over 95 per cent of the total files (i.e., those which were not in the "F" sample or the other much more limited categories), as well as virtually all files, which were very voluminous, relating to assisted passage and transportation loans or warrants (files prefixed "AP," "TL," or "W"). As well, it was recognized in preparing this appraisal that, as the National Archives of Canada gains better control over the electronic data bases of the immigration agency, several categories of records now scheduled for permanent retention in paper format may no longer be needed: the "F" sample and the single ethnic group, for example. As well, when overseas processes are better integrated electronically, the archival retention of related microfilmed records of overseas applications and cases from the records of the separate External Affairs agency of the Canadian government may also be discontinued. Appraisal must remain a dynamic process, ever changing as the circumstances of records creation and media transformation change.20
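A layered decision of this kind amounts to a list of independent tests, any one of which is enough to retain a file. The Python sketch below is a hypothetical illustration and not the agency's or the National Archives of Canada's actual procedure. The field names, the one-inch "fat file" threshold (the real dimensions are not given here), and the reading of the 1969 cut-off as "opened before 1969" are all assumptions. Only the "H" prefix is listed because the others are not named in the text. The related finding aids are not modelled, nor is the destruction of "AP," "TL," and "W" files.

```python
from dataclasses import dataclass

CUTOFF_YEAR = 1969            # cut-off date cited in the text
FAT_FILE_THRESHOLD_IN = 1.0   # hypothetical; actual dimensions not given
ETHNIC_PREFIXES = {"H"}       # only "H" is named in the text


@dataclass
class ImmigrationFile:
    surname: str
    prefix: str = ""
    year_opened: int = 1980
    thickness_in: float = 0.5
    flagged_by_agency: bool = False      # unusual, controversial, historic, precedent-setting
    marked_secret: bool = False
    high_level_appeal: bool = False
    in_preserved_ethnic_series: bool = False


def retention_reasons(f):
    """Return every criterion the file meets; an empty list means destroy."""
    checks = [
        ("F surname sample", f.surname.upper().startswith("F")),
        ("agency-designated case", f.flagged_by_agency),
        ("pre-1969 file", f.year_opened < CUTOFF_YEAR),
        ("fat file", f.thickness_in > FAT_FILE_THRESHOLD_IN),
        ("secret", f.prefix == "SF" or f.marked_secret),
        ("high-level appeal", f.high_level_appeal),
        ("preserved ethnic group series", f.in_preserved_ethnic_series),
        ("special ethnic prefix", f.prefix in ETHNIC_PREFIXES),
    ]
    return [reason for reason, met in checks if met]


reasons = retention_reasons(ImmigrationFile(surname="Smith", prefix="SF", year_opened=1950))
```

Returning the reasons rather than a bare yes or no keeps a record of why each file was retained, which would make it easier to revisit individual categories later if, as anticipated above, some of them are no longer needed.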
38. In conclusion, when appraising records containing personal information as defined in this study, the archivist must consider four factors in this order: researching and analyzing the "macro-appraisal" model of the societal "image"; utilizing the comprehensive approach to records assessment and scheduling; applying the general working rules and specific appraisal criteria; and tempering the decision, if at this point it is still positive, with whatever relevant practical, preservation, and political considerations may exist. In most cases, the latter will lead to some form of sampling as the best option.
1. An exception, as noted in Chapter 2, would be some of the essential categories of personal information records which archivists collectively should keep in every nation. As noted there, this is less an appraisal issue than one of preservation: which format and medium to keep, separating essential from supporting documents, etc.
2. Of course, there may be valuable data in case files where there is no significant variation or distortion of the programme and agency intentions, but that should be collected only if it meets the criteria of informational value. Such data do not contribute to the evidential or collectivist values of the image, for such conformity in the image is (and should be) assumed, unless documented otherwise through the model in Chapter 3, and it is also well described in many other sources (subject files, publications, procedure manuals, etc.) that collecting voluminous case files simply to prove such conformity is pointless.
3. The concept and term are Trudy Peterson's, in her letter to me of 19 March 1990.
4. As noted earlier, a discussion of records scheduling and records disposition is beyond the scope of this study. An extensive report analyzing these issues and proposing an active, planned approach to scheduling is available for those wishing to pursue the matter. See National Archives of Canada, Scheduling Task Group, "Strategic Planning Framework Study for the Disposition of Government Records," 30 January 1989 (final report of the Group, written by Cynthia J. Durance). An earlier but still useful study of the issue is National Archives and Records Service (now Administration), Task Force on Appraisal and Disposition of Federal Records, "Appraisal and Disposition Policies in NARS: A Report and Recommendations to the Archivist of the United States on Performance of the Appraisal and Disposition Functions in the National Archives and Records Service," 1983.
5. This is perhaps most starkly phrased in the policy statement of the New York State Archives: "If there is any archival 'principle' that delineates our appraisal activity, it is that any records are to be rejected unless there are definite and compelling justifications for their preservation." Cited in F. Gerald Ham, "Archival Choices: Managing the Historical Record in an Age of Abundance," in Nancy E. Peace, ed., Archival Choices (Lexington, 1984), p. 136. Ham appropriately entitles this section of his article "disciplined appraisal."
6. Hans Booms, "Society and the Formation of a Documentary Heritage: Issues in the Appraisal of Archival Sources," Archivaria 24 (Summer 1987), p. 95. Booms' comment is in the context of defending an active approach by archivists to choosing the best records, rather than passively reacting to records schedules submitted by departments with the main intention of storing in records centres or destroying the least valuable, most voluminous records. On this, see also Chapter 3.7.
7. There are, for example, 80,000 cubic feet of nineteenth-century military pension and related records alone in the custody of the National Archives and Records Administration (Letter, Trudy Peterson to author, 19 March 1990). Similarly, the NARA Department of Justice appraisal project selected for permanent preservation, after very careful analysis, 39,000 of the 135,000 feet of litigation case files dating from the early 1940's, and will preserve hereafter 2,400 feet of the 5,800 feet which accumulate annually. (See United States, National Archives and Records Administration, Office of Records Administration, Appraisal of Department of Justice Case Files: Final Report (Washington, 1989), pp. 1-3.) In the celebrated and highly intensive FBI appraisal analysis, over 5 million case files, 500,000 feet, and 100 million index cards were involved; of these, almost 1.2 million case files totalling 50,000 feet will be retained by NARA. (See James Gregory Bradsher, "The FBI Records Appraisal," The Midwestern Archivist 13 (1988), pp. 53, 61.) These are enormous volumes of records for archives to retain.
8. For the special characteristics of electronic records and their significance for appraisal, see Harold Naugler, The Archival Appraisal of Machine-Readable Records: A RAMP Study With Guidelines (Paris, 1984), pp. 37-41.
9. See NARA, Appraisal of Department of Justice Case Files, passim. This report of under fifty pages is an excellent, concise example of appraising records containing personal information and its methodologies will interest readers of this present study. The Justice model followed that of the FBI case which was in turn patterned after the Massachusetts court records project led by Michael Hindus. (See Bradsher, "FBI Records Appraisal," Midwestern Archivist, pp. 55-56.)
10. As noted in Chapter One, that is outside the scope of this study. In addition to other RAMP studies cited in the bibliography, the reader wishing a general overview of appraisal is referred to Maynard J. Brichford, Archives & Manuscripts: Appraisal and Accessioning (Chicago, 1977). Julia Marks Young's "Annotated Bibliography on Appraisal," American Archivist 48 (Spring 1985), pp. 190-216, lists 128 articles, books, and studies dealing with appraisal and is an excellent introduction to the field in general. A fine summary, which transcends its particular focus on university records and is complete with helpful charts and lists of appraisal factors, is Frank Boles and Julia Marks Young, "Exploring the Black Box: The Appraisal of University Administrative Records," in Ibid., pp. 121-40.
11. This point is made strongly in Larry Hackman and Joan Warnow-Blewett, "The Documentation Strategy Process: A Model and a Case Study," American Archivist 50 (Winter 1987), pp. 15-17.
12. National Archives and Records Service, Disposition of Federal Records (Washington, 1981), table 4, cited in Leonard Rapport, "In the Valley of Decision: What To Do about the Multitude of Files of Quasi Cases," American Archivist 48 (Spring 1985), p. 178, footnote 10. Rapport raises doubts in his piece about whether such criteria do not still bring too many useless records into archives.
13. The concept has been used in the FBI appraisal case, as well as in other investigations of sampling. For a precise analysis of the value of the "fat file" syndrome, see NARA, Appraisal of Department of Justice Litigation Case Files, pp. 47-49. Across all 194 categories of records appraised, and following a careful analysis of the files' contents and value, the following ranges of archival values were found:
[Table not recoverable in this copy. Surviving headings: Archival Value Rating (%); Regular Sample Files.]
14. The case is described in more detail in Michael Cook, Archives Administration (London, 1977), pp. 73-74, and in Michael Roper's letter to me, 16 March 1990.
15. The impact of freedom of information and privacy legislation on archival work is analyzed with much sensitivity in Robert J. Hayward, "Federal Access and Privacy Legislation and the Public Archives of Canada," Archivaria 18 (Summer 1984), pp. 47-57. For one aspect directly affecting the quality of the image available in records containing personal information, see James Gregory Bradsher, "Privacy Act Expungements: A Reconsideration," Provenance 6 (Spring 1988), pp. 1-25. For an example of the increasing concern over the release of especially sensitive personal information to third parties, see Privacy Commissioner of Canada, AIDS and the Privacy Act (Ottawa, 1989), pp. 42-44. See also International Council on Archives, Access to Archives and Privacy: Proceedings of the Twenty-Third International Archival Round Table Conference Austin 1985 (Paris, 1987); and once again Michel Duchein, Obstacles to the Access, Use and Transfer of Information From Archives: A RAMP Study (Paris, 1983).
16. See Felix Hull, The Use of Sampling Techniques in the Retention of Records: A RAMP Study With Guidelines (Paris, 1981). The two best-known case studies are National Archives and Records Service, Appraisal of the Records of the Federal Bureau of Investigation: A Report to Hon. Harold H. Greene, U.S. District Court for the District of Columbia (Washington, 1981); and Michael Stephen Hindus, Theodore M. Hammett, and Barbara M. Hobson, The Files of the Massachusetts Superior Court 1859-1959: An Analysis and a Plan for Action (Boston, 1979). See also the Department of Justice appraisal case described in footnote 7 above.
17. The next paragraphs follow closely Trudy Huskamp Peterson, "Summary of Sampling Techniques," in her Basic Archival Workshop Exercises (Chicago, 1982), pp. 12-13, although some points have been added and details and examples expanded.
18. I am indebted to my colleague, Tom Nesmith, for this tripartite breakdown, which is from his draft (March 1990) policy paper: "Sampling Textual Archival Records in the Government Archives Division, National Archives of Canada."
19. National Archives of Canada, Records Retention and Disposal Authority 88/012.
20. For an example, see Rapport, "In the Valley of Decision," American Archivist, pp. 173-89; and his earlier classic, "No Grandfather Clause: Reappraising Accessioned Records," in Maygene F. Daniels and Timothy Walch, eds., A Modern Archives Reader (Washington, 1984), pp. 80-90, which article was first published in 1981.