became electronic mail in support of collaboration. This trend continues into the present incarnation of the Internet, but with increasingly diverse support for collaborative data sharing activities. Electronic mail has been supplemented by a variety of wide-area filing, information retrieval, publishing and library access systems. At present, the Internet provides access to hundreds of gigabytes each of software, documents, sounds, images, and other file system data; library catalog and user directory data; weather, geography, telemetry, and other physical science data; and many other types of information.

To make effective use of this wealth of information, users need ways to locate information of interest. In the past few years, a number of such resource discovery tools have been created and have gained wide popular acceptance in the Internet [4, 17, 18, 27, 31, 35, 44].¹ Our goal in this paper is to examine the impact of scale on resource discovery tools, and to place the resulting problems into a coherent framework. We focus on three scalability dimensions: the burgeoning diversity of information systems, the growing user base, and the increasing volume of data available to users.

Table 1 summarizes these dimensions, suggests a set of corresponding conceptual layers, and indicates problems being explored by the authors, who comprise the Internet Research Task Force (IRTF) Research Group on Resource Discovery and Directory Service. Users perceive the available information at the information interface layer. This layer must support scalable means of organizing, browsing, and searching. The information dispersion layer is responsible for replicating, distributing, and caching information. This layer must support access to information by a large, widely distributed user populace. The information gathering layer is responsible for collecting and correlating the information from many incomplete, inconsistent, and heterogeneous repositories.

The remainder of this paper covers these layers from the bottom up. Section 2 discusses problems of information system diversity. Section 3 discusses the problems brought about by growth in the user base. Section 4 discusses problems caused by increasing information volume. Finally, in Section 5 we offer a summary.

2 Information System Diversity

An important goal for resource discovery systems is providing a consistent, organized view of information. Since information about a resource exists in many repositories, within the object

¹ The reader interested in an overview of resource discovery systems and their approaches is referred to [43].

Scalability Dimension  Conceptual Layer        Problems                      Research Focus
---------------------  ----------------------  ----------------------------  -------------------------
Data Volume            Information Interface   Information Overload          Topic Specialization;
                                                                             Scalable Content Indexing
User Base              Information Dispersion  Insufficient Replication;     Massive Replication;
                                               Manual Distribution Topology  Access Measurements;
                                                                             Object Caching
Data Diversity         Information Gathering   Data Extraction;              Operation Mapping;
                                               Low Data Quality              Data Mapping

Table 1: Dimensions of Scalability and Associated Research Problems