Fast Inverted Indexes with On-Line Update
Charles L. A. Clarke? Gordon V. Cormack Forbes J. Burkowski
Dept. of Computer Science
University of Waterloo, Waterloo, Canada, N2L 3G1
Technical Report CS-94-40
November 23, 1994
We describe data structures and an update strategy for the practical implementation of inverted indexes. The context of our discussion is the construction of a dedicated index engine for a distributed full-text information retrieval system, but the results have wider application. Retrieval operations require a single disk access per query term. The on-line update strategy guarantees the consistency of on-disk data structures. Index compression integrates smoothly.
Our general concern is the construction of a distributed full-text information retrieval system. The basic architecture consists of a group of LAN- connected processors, each managing its own separate disk and memory. Individual processors act as either text servers, storing documents and servicing requests for portions of these documents, or as index engines, identifying the portions of documents that match client-generated search criteria. To external clients, the group of machines appears to be a single large information retrieval system. A front-end processor, the Marshaller/Dispatcher, coordinates the activities of the group of processors, interacting with client applications, dispatching queries to the index engines and text servers, marshalling query results and returning the results to clients. Figure 1 provides a schematic overview of the architecture.
Figure 1: Architecture of the retrieval system.