page 1  (11 pages)
2to next section

Fast Inverted Indexes with On-Line Update

Charles L. A. Clarke? Gordon V. Cormack Forbes J. Burkowski

Dept. of Computer Science

University of Waterloo, Waterloo, Canada, N2L 3G1

Technical Report CS-94-40
November 23, 1994

Abstract

We describe data structures and an update strategy for the practical implementation of inverted indexes. The context of our discussion is the construction of a dedicated index engine for a distributed full-text information retrieval system, but the results have wider application. Retrieval operations require a single disk access per query term. The on-line update strategy guarantees the consistency of on-disk data structures. Index compression integrates smoothly.

1 Introduction

1.1 Environment

Our general concern is the construction of a distributed full-text information retrieval system. The basic architecture consists of a group of LAN- connected processors, each managing its own separate disk and memory. Individual processors act as either text servers, storing documents and servicing requests for portions of these documents, or as index engines, identifying the portions of documents that match client-generated search criteria. To external clients, the group of machines appears to be a single large information retrieval system. A front-end processor, the Marshaller/Dispatcher, coordinates the activities of the group of processors, interacting with client applications, dispatching queries to the index engines and text servers, marshalling query results and returning the results to clients. Figure 1 provides a schematic overview of the architecture.

?Email: claclark@plg.uwaterloo.ca

Marshaller/
Dispatcher TextServer

TextServer

Client Applications

Engine

Engine

EngineIndex Index

Index

Figure 1: Architecture of the retrieval system.