Integrating Information Sources Using Context Logic

Adam Farquhar

Angela Dappert

Richard Fikes

Wanda Pratt

Knowledge Systems Laboratory
Stanford University
{adam_farquhar,dappert,fikes,pratt}@ksl.stanford.edu

Abstract
It is essential to reduce the cost of integrating information sources and to provide a path that allows for incremental integration that can be responsive to users' demands. This paper presents an approach to integrating disparate heterogeneous information sources that uses context logic. Our use of context logic reduces the up-front cost of integration, provides an incremental integration path, and allows semantic conflicts within a single information source or between information sources to be expressed and resolved.

Introduction

The number of online network-accessible information sources grows daily. The information promises to provide tremendous value for individuals and corporations. The promise will remain unfulfilled, however, until it is possible to integrate and assimilate information from multiple heterogeneous sources. Because it is impossible to predict the users and patterns of usage in our changing information environment, information providers are not willing to pay a high up-front cost to support integration. Thus, it is essential to reduce the cost of integrating information sources and to provide a path that allows for incremental integration that can be responsive to users' demands. This paper presents an approach to integrating disparate heterogeneous information sources. Our approach uses context logic to ease integration and provide for incremental integration.

Existing approaches fail to provide adequate integration (loosely coupled databases [12]), adequate flexibility (federated databases [16]), or acceptable costs (global schemas [8,9,17]). Recent work has begun to draw on the insights of the artificial intelligence and knowledge representation communities. The Carnot project [3,7] has taken a global schema approach, but with the hope that the extensive CYC knowledge base [10] will provide a comprehensive common representation. The SIMS project [1] has focused more on the issues of query optimization and planning than on the resolution of semantic conflicts. We share a number of goals with the metadata approach [15], but our technical approach differs substantially. McCarthy and Buvac [14] discuss the application of context logic to integrating databases.

Although the techniques we describe are applicable to a wide variety of information sources, this paper focuses on structured information sources such as relational databases. The difficulty with integrating relational databases is that their views of the world differ. Their ontologies vary, as does the intended meaning of their data (even when it appears quite similar). Their schemas may differ in naming conventions, in structure (an attribute in one system may be represented as a value in another), and, most importantly, in their semantics.
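The structural mismatch mentioned above, where an attribute in one system appears as a value in another, can be illustrated with a minimal sketch. The rows, column names, and the helper `wide_to_long` are hypothetical and chosen only for illustration. The sketch covers only the structural difference, not the semantic conflicts that the context logic approach is meant to resolve.

```python
# Hypothetical data: one source uses company names as attribute names.
wide_rows = [
    {"date": "1995-01-03", "ibm": 73.5, "hp": 99.0},
    {"date": "1995-01-04", "ibm": 74.0, "hp": 98.5},
]


def wide_to_long(rows, key="date"):
    """Turn each non-key attribute name into a value of a 'ticker' column."""
    long_rows = []
    for row in rows:
        for name, value in row.items():
            if name != key:
                long_rows.append({key: row[key], "ticker": name, "price": value})
    return long_rows


# A second source might store the same facts in this layout,
# with the company recorded as a value rather than an attribute.
long_rows = wide_to_long(wide_rows)
for r in long_rows:
    print(r)
```

Both layouts carry the same facts, yet a query written against one schema cannot be run unchanged against the other.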

Integrating heterogeneous systems requires making implicit assumptions explicit enough to avoid semantic errors. Consider the problems that might arise when trying to determine the number of items in inventory by querying several databases. One database may store the number of individual items, a second may store the number of pallets, and a third may record the difference between outstanding orders and items on hand. The result of summing these numbers will be completely meaningless. It might even result in a negative number.
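The inventory example can be made concrete with a small sketch. All figures below are hypothetical, including the pallet size and the number of outstanding orders. The third source is assumed to store outstanding orders minus items on hand; the opposite sign convention would need a different conversion.

```python
# All figures are hypothetical and chosen only for illustration.
ITEMS_PER_PALLET = 40      # assumed pallet size
OUTSTANDING_ORDERS = 500   # assumed to be known for the third source

# Raw numbers as each database reports them.
raw = {
    "db1": 1200,    # individual items on hand
    "db2": 30,      # pallets on hand
    "db3": -1500,   # outstanding orders minus items on hand (assumed convention)
}

# One conversion per source makes its implicit assumption explicit.
to_items = {
    "db1": lambda n: n,
    "db2": lambda n: n * ITEMS_PER_PALLET,
    "db3": lambda gap: OUTSTANDING_ORDERS - gap,
}

naive = sum(raw.values())
meaningful = sum(to_items[name](value) for name, value in raw.items())

print(naive)       # -270: a meaningless, negative "inventory"
print(meaningful)  # 4400 items once every value is expressed in items on hand
```

The conversions here are ordinary functions; the point is only that each source's unit and meaning must be stated somewhere before the values can be combined.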

Our goal is to enable meaningful integration of the information across multiple information sources by resolving semantic inconsistencies in an unobtrusive and cost-effective fashion. We want to provide users with access to the complete power of the individual information sources, rather than access to a least-common-denominator global schema. We also want users to be able to take advantage of their familiarity with an individual information source by allowing them to pose queries in that source's vocabulary that nonetheless collect data from other sources.

This paper focuses on the semantics of the data represented in the information sources. It is worth noting, however, that our framework also makes it possible to represent information about the