Expanding Access to Science and Technology (UNU, 1994, 462 pages)
Session 3: New technologies and media for information retrieval and transfer
Information retrieval: Theory, experiment, and operational systems
Stephen E. Robertson
The paper examines the process of scientific communication resulting from users' expressed needs for information and in particular the formal mechanisms for the storage and retrieval of information in response to queries or requests. Formal indexing or coding schemes, Boolean systems, facet analysis, and associative methods, as well as probabilistic models, are reviewed and information-seeking behaviour is discussed.
It is a commonplace that science depends on communication. Science is a social activity; scientists' ideas, models, and results have to be scrutinized by their peers, analysed and tested with the possibility of validation or refutation as well as the construction of further science.
In order to investigate the role of information retrieval and the effect of developing technologies on scientific communication, we may start from a simplified view of the process of scientific communication (see chart p. 145).
Such a diagram is deceptive both in its simplicity and in its circularity. As a publishing scientist, I am clearly not communicating with myself! My potential audience will be not only other scientists (who may indeed feed new publications into the process), but also other users of scientific information, e.g. those who apply the knowledge. The diagram also suggests a system with just one communication channel, and furthermore seems to imply a system that always works! Neither is in general the case.
We in the information world tend to work with systems (fragments or subsystems of this larger process) that are supposed to contribute to the whole by providing certain linking mechanisms. By and large we work with relatively formal mechanisms (publication, libraries, databases, etc.); we like to think that they are vital to the whole. However, we also know that scientists rely extensively on less formal mechanisms (personal contacts, meetings, etc.). Furthermore, from the scientist's-eye view, there are many sources/channels of information that may be selected or rejected at different times, for all sorts of reasons. One of our concerns in developing a science of information must be the scientist's perception of the information environment, and the selection and use made of sources, channels, and modes of both obtaining information and communicating ideas.
The concern in the present paper is mainly with formal mechanisms for the storage and retrieval of information, in response to queries or requests. But the wider aspects of the communication process will be kept in view. I start with a perspective on the situation that gives rise to the request; at the end, I will return to some features of human information-seeking behaviour.
We must first of all ask the question, Why does a user (scientist) approach an information retrieval system? The simple answer must be because of a need or wish or imagined need for information. However, the user's perception of this information need deserves some exploration.
In Taylor's classic paper, "Question-Negotiation and Information Seeking in Libraries", four stages are identified:
(1) the visceral need (i.e. the user's gut feeling of a need for information);
(2) the verbalized need (the user's first attempt to put the information requirement into words);
(3) the formalized need (the user's expression of the requirement in terms acceptable to the system);
(4) the compromised need (the user's revised expression of the requirement after negotiation with the system).
The last two stages relate to the user's interaction with the system, which is discussed later.
Belkin has further analysed the origins of the visceral need. A user has a state of knowledge of the world, an internal knowledge structure of great complexity. The perception of an information need arises from a perceived problem with some part of this knowledge structure (which may not be a simple gap but some internal inconsistency, conflict with evidence, or whatever). Belkin has called this the "anomalous state of knowledge," or ASK.
The ASK hypothesis potentially has strong consequences for the design of information retrieval systems. Most systems in effect demand that the users specify the piece of information that they require, and aim to provide the items that fit the specification. The ASK hypothesis suggests instead a problem-solving approach, where the system cooperates with the user in an attempt to solve the perceived problem (or resolve the anomaly).
Although some of the ideas discussed below predate the ASK hypothesis and involve a rather more traditional approach to IR system design, the ASK idea will inform my discussion throughout. Something like the problem-solving approach will recur in later sections.
One of the most important concepts for an understanding of our present ideas of information retrieval is that of relevance. Relevance is an extremely difficult idea to pin down and has been the subject of much work over a large number of years. It originates from well before the ASK hypothesis, and one way to think of it might be in terms of correctness. An item might be regarded as relevant to a request if it is in some sense a "correct" response to that request. There are indeed IR theorists who take a modern version of that view: an item is relevant to a request if the request can be inferred from the item, rather in the fashion of theorem proving, but with an appropriate logic.
However, probably the dominant view of relevance (and one that is rather more in sympathy with the ASK hypothesis) is a much more subjective one. An extreme version of the subjective approach would be to say simply that an item is relevant to a user's information need or ASK if the user says it is, or in other words if the user would like the system to retrieve this item. More commonly, in our experiments we rely where we can on end-users making relevance judgements in relation to their perceived needs, according to some descriptive scale that we devise; we also do some laboratory experiments with expert judges judging relevance to stated requests.
Even without agreement as to what exactly relevance is, the idea of relevance is of central importance to theory and experiment in IR, and it is becoming important to IR practice as well. From the experimental point of view, where the concept originated, we need it in order to evaluate different approaches and methods in IR. From there it has fed into theory; many recent theories in IR depend upon it. Finally, some methods of IR ask the user to make relevance judgements on-line and use that information internally as one kind of clue to help formulate a new search.
The major assumption is that users can make relevance judgements or recognize relevant items, even if not necessarily with absolute confidence. It is hard to imagine an information-seeking activity where the user was in principle unable to assess whether an item is appropriate or not, though of course users may suspend judgement or change their minds in particular cases, perhaps until or after they have more information about the item or have read some other item.
The idea of the experimental evaluation of IR systems is central to both theory and operational system development. Perhaps surprisingly, this idea is only about 35 years old. (Admittedly, 35 years is a long time in the history of computer-based IR systems; but some kinds of IR system, such as library classification schemes, predate computers by at least two-and-a-half millennia!)
We are not so much concerned here with whether the system works in a technical or physical sense as with something that might now be described as its cognitive functioning. In other words, the question as to whether the system will succeed in locating items with specific characteristics (words, codes) is not generally at issue. The question is, Do those items with the specific characteristics actually serve the information need or resolve the ASK? This will depend, in general, on the ways the system offers of specifying characteristics. In the earliest experiments, the question took the form: Does the system retrieve the "correct" answer in response to a query? Setting aside the problem of ASKs and relevance, the implied model of the IR system was what might be described as an input-output model - feed in the query, get out the answer. It was a model that fit well with the early computer-based systems. (Actually, they would very likely be human-assisted; the searcher would send the query off to a library or information centre, where an expert would formulate it in system terms, run the search, and return the results.) In retrospect, however, it seems like a temporary aberration. Both older systems (card catalogues, printed indexes, etc.) and newer ones (highly interactive on-line systems) exhibit characteristics that do not fit too well with the input-output model, particularly if the searcher is the end-user, that is the person needing the information.
The early experiments told us a little about the design of IR systems, but they also focused attention in certain areas, and it may be argued that their lasting influence lies in this focusing process. One area is the one already mentioned, that of relevance: the necessity to devise an operational definition of a "correct" answer was a major stimulus to the reconsideration of the notion of relevance. A second kind of focus was on the particular aspects of IR system design that seemed important. A number of such aspects that had been endlessly debated in the 1940s and 1950s now seemed to be of relatively minor importance; by contrast, some aspects that had received little consideration now became central. One of the later outcomes of this process, as we shall see, has been the concern with highly interactive systems.
Received wisdom in the 1950s was that IR required some kind of formal indexing or coding scheme. (Exactly what kind of scheme was one of the topics of endless debate.) Thus, items required indexing/coding in terms of the scheme, which probably in turn required a human indexer (though a machine might be taught to do it). A similar process was required at the search stage, in respect of the query or need, though an end-user might possibly learn enough about the formal scheme to conduct a satisfactory search.
The results of the early experiments, together with the developing technology and changing perceptions of how it might be used, caused a backlash against this received wisdom. It became feasible to throw into the computer larger and larger quantities of text, and retrieve on the basis of words in text rather than assigned keywords or codes. At first sight, the necessity for any kind of indexing scheme, at either end of the process, seems to disappear: the user can use "natural" language to search a "natural" language database, without any interference from librarians.
We have since come to a much more balanced view of language, though the debate continues to generate new questions as we develop our highly interactive systems. It is clear that natural language searching is a powerful device that can often produce good results economically. However, it places a large burden on the searcher, and besides, certain kinds of queries are not well served. In recognition of these points, many modern databases include both formal indexing and searchable natural language text.
Formal artificial languages (in which category I include library classification schemes) represent particular views of the structure and organization of knowledge. One idea that emerged from the analysis of such languages, and that is central to modern indexing languages as well as to the practice of searching, is that of the facet. Once it is recognized that topics and problem areas are potentially highly complex, it becomes essential to approach the problem of describing them via different aspects, or facets, and combining the resulting descriptions in a building-block fashion. (The idea of a faceted classification scheme, while originally due to Ranganathan in the 1930s, was put in its most concise form by Vickery; B.C. Vickery, Faceted Classification, London: Aslib, 1960.)
Many modern indexing languages, while not necessarily following the rules of faceted classification, reflect an essentially facet-based approach to the organization of knowledge. But the approach also has value at the searching stage, whether or not the database being searched is indexed by such a language. This theme is taken up again below.
Long before either natural language searching or faceted classification, and certainly long before the modern computer, it became apparent that certain kinds of information retrieval would benefit from the ability to search on combinations of characteristics, in a way that we would now normally represent by the Boolean operator AND. Indeed, much effort went into devising mechanisms to allow such searching. Prime examples were Hollerith's mechanically sorted punched cards of the 1890s and a scheme of optical coincidence cards first invented in 1915.
Most currently available computer-based systems allow searching using Boolean logical search statements, together with a few extensions to Boolean logic appropriate to searching text; for example, an operator to indicate that two words should not only occur in the same record, but also that they should be next to each other. In this respect, they appear to differ little from the punched-card systems of the 1930s. However, we may point to two major differences. Firstly, as discussed above, we have the possibility of searching natural language text. Secondly, the systems are designed specifically to allow and encourage certain kinds of feedback during the search. In other words, it is not expected that a searcher will be able to specify, precisely and a priori, the characteristics of the desired item(s). Rather, the search is expected to proceed in an iterative fashion, with the results of one (partial) search statement serving to inform the next search.
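The adjacency extension mentioned above can be illustrated with a minimal sketch. The records, the whitespace tokenization, and the function names `adjacent` and `search_adjacent` are all hypothetical, chosen only for illustration; no particular operational system is being described.

```python
from typing import Dict, List

# Hypothetical toy records: record id -> text.
RECORDS: Dict[int, str] = {
    1: "information retrieval systems",
    2: "retrieval of scientific information",
    3: "information about retrieval",
}


def adjacent(text: str, first: str, second: str) -> bool:
    """True if `first` is immediately followed by `second` in the text."""
    words = text.lower().split()
    # Compare each word with its successor.
    return any(a == first and b == second for a, b in zip(words, words[1:]))


def search_adjacent(records: Dict[int, str], first: str, second: str) -> List[int]:
    """Ids of records in which the two words occur next to each other."""
    return sorted(rid for rid, text in records.items() if adjacent(text, first, second))


print(search_adjacent(RECORDS, "information", "retrieval"))
```

A plain AND on the two words would match all three toy records; the adjacency condition keeps only the first.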
Feedback in Boolean systems is of a very limited kind (this point will be taken up again below). Furthermore, the use of Boolean logic seems to suggest an analogy with traditional database management systems, where feedback is not normally an issue. However, the use of even a crude form of feedback is a recognition of the cognitive problems discussed earlier and a departure from the simple input-output model of information retrieval.
The problem of formulating a search and developing a search strategy for a Boolean system has received a great deal of attention. This work has been informed by theoretical developments (the ASK hypothesis and facet analysis) and by the results of experiments. But one would not describe such work as theoretical or experimental so much as a codification of good practice. Searching such systems is best described as a skilled task. For this reason, it has not been the norm for scientists themselves to conduct their own searches. This does not mean, of course, that it does not happen. The companies offering search services on large text databases have been predicting for many years the dominance of end-user searching and the demise of the search intermediary. However, the intermediary has signally failed to disappear! (An intermediary is normally a specialist in the art of searching large text databases, perhaps but not necessarily with some subject knowledge.)
The impact of experimental work on the study of search strategy has not been very great. However, there was one experimental result from the 1960s that encapsulated, in a surprising way, the problems subsequently addressed by the ASK hypothesis and that was seminal in our understanding of the search strategy problem. It is therefore worth describing this result.
The experiment involved the Medlars Demand Search Service and the National Library of Medicine (NLM) in the United States. At that time, in order to conduct a search on a medical topic, the user would have to communicate with the NLM, either directly or via a local expert. Local experts existed at various places around the world to help users make the best use of the service.
The requests that were collected for the experiment could be divided into those where the user talked face-to-face with an expert and those where the user wrote a letter requesting a search. The expectation was that the face-to-face communication would be beneficial to the search by enabling the development of a better search formulation. However, in the event, the reverse was the case: on average, the letter-based requests performed slightly better than the face-to-face ones.
The experimenters' explanation of this result, after studying the data, was to suggest that users often came to face-to-face meetings without clearly verbalized requests, and that intermediaries tended to suggest formulations that were easy in system terms. The letter writers, on the other hand, were forced to articulate their needs more systematically before encountering the constraints of the system.
This result made a clear link between the Taylor/Belkin view of information retrieval and the practical problems of searching and had a major impact on the training of search intermediaries.
The relation between Boolean searching and facet analysis is a simple one: an analysis of the search topic from a facet viewpoint fits naturally with a canonical form of search statement, as follows:
(A1 or A2 or . . . ) and (B1 or B2 or . . . ) and . . .
Here the separate facets (or component concepts of the topic) are A, B, . . . , and each search term Ai is one of the ways of representing concept A. A number of "intelligent" front-end systems for helping users to construct searches assume this canonical form. It is, however, an extremely limited form if interpreted literally: for example, it fails to allow that one of the Ais might itself be a phrase or a combination of concepts.
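The canonical form can be made concrete with a small sketch. It assumes a toy inverted index held in memory as sets of record identifiers; the index contents and the function name `facet_search` are hypothetical and stand in for whatever a real system would provide.

```python
from typing import Dict, List, Optional, Set

# Hypothetical toy inverted index: term -> ids of records containing it.
INDEX: Dict[str, Set[int]] = {
    "retrieval": {1, 2, 4},
    "searching": {3, 4},
    "evaluation": {2, 3},
    "testing": {4, 5},
}


def facet_search(index: Dict[str, Set[int]], facets: List[List[str]]) -> Set[int]:
    """Evaluate (A1 or A2 or ...) and (B1 or B2 or ...) and ... over the index."""
    result: Optional[Set[int]] = None
    for terms in facets:
        # OR within a facet: union of the postings of its alternative terms.
        matched: Set[int] = set()
        for term in terms:
            matched |= index.get(term, set())
        # AND across facets: keep only records allowed by every facet so far.
        result = matched if result is None else result & matched
    return result if result is not None else set()


hits = facet_search(INDEX, [["retrieval", "searching"], ["evaluation", "testing"]])
print(sorted(hits))
```

Each inner list is one facet and each string is one way of expressing it. The limitation noted above shows up directly: a term here can only be a single index entry, not a phrase or a nested combination.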
Experimenters and theorists in IR have been working for many years with alternatives to Boolean search statements. In particular, they have been using what might be described as "associative" methods, where the retrieved documents may not match exactly the search statement but may be allowed to match approximately. For example, the search statement may consist of a list of desirable characteristics, but the system may present as possibly useful items that lack some of the characteristics. Then the output of the system may be a ranked list, where the items at the top of the list are those that match best in some sense, but the list would include items that match less well.
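One very simple way of realizing such approximate matching is to count how many of the desired characteristics each item has and to rank on that count. The sketch below assumes this counting rule purely for illustration; the toy documents and the name `rank_by_overlap` are hypothetical, and the text does not commit to any particular matching function.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical toy collection: document id -> set of index terms.
DOCS: Dict[str, Set[str]] = {
    "d1": {"probabilistic", "retrieval", "model"},
    "d2": {"boolean", "retrieval"},
    "d3": {"library", "classification"},
    "d4": {"probabilistic", "relevance", "feedback", "retrieval"},
}


def rank_by_overlap(docs: Dict[str, Set[str]], query: Set[str]) -> List[Tuple[str, int]]:
    """Rank documents by the number of query terms they contain."""
    scored = [(doc_id, len(terms & query)) for doc_id, terms in docs.items()]
    # Items sharing no characteristic with the query are left out.
    matched = [(doc_id, score) for doc_id, score in scored if score > 0]
    # Best match first; ties broken by document id for a stable order.
    return sorted(matched, key=lambda pair: (-pair[1], pair[0]))


ranking = rank_by_overlap(DOCS, {"probabilistic", "relevance", "retrieval"})
print(ranking)
```

Unlike a Boolean AND of the three terms, which would return only d4, the ranked list also offers d1 and d2, which lack some of the requested characteristics.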
Until relatively recently, this work had had little impact on operational systems. Most such systems continued to use Boolean (or extended Boolean) search logic. However, some recent systems have adopted some associative retrieval ideas. For reasons that will become apparent, I believe that associative retrieval offers far better possibilities for systems that genuinely help end-users to resolve information problems or ASKs. Therefore I welcome this development and indeed see it as long overdue.
There is a wide range of possible approaches to the problem of providing associative retrieval. This section will give a very brief overview of some of the approaches, and the following section will look at one particular model in somewhat more detail.
The associative approach with the most substantial history of development is the vector space model of Salton and others. In this model, the documents and queries are seen as points in a vector space: retrieval involves finding the nearest document points to the query point. This model leads naturally to various associative-retrieval ideas, such as ranking, document clustering, relevance feedback, etc. It has been the basis of a number of experimental systems from the early 1960s on, and many different ideas have been incorporated at different times and subjected to experimental test. Mostly these tests have been along the lines described in section 4 above, with the system treated in input-output fashion.
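A minimal sketch of the "nearest point" idea follows. It assumes raw term counts as vector components and the cosine of the angle between vectors as the measure of nearness; both are common choices but are assumptions here, as are the toy documents and the function names.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple


def to_vector(text: str) -> Counter:
    """Represent a text as a vector of raw term counts (an assumed weighting)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank(docs: Dict[str, str], query: str) -> List[Tuple[str, float]]:
    """Order documents from nearest to farthest from the query point."""
    q = to_vector(query)
    scored = [(doc_id, cosine(to_vector(text), q)) for doc_id, text in docs.items()]
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))


# Hypothetical toy collection.
DOCS = {
    "d1": "fuzzy sets in retrieval",
    "d2": "vector space retrieval model",
    "d3": "library classification schemes",
}
ranking = rank(DOCS, "vector space model")
print(ranking[0][0])
```

Because queries and documents live in the same space, the same similarity function could in principle also compare documents with one another, which is what makes clustering a natural extension.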
A second approach is suggested by Zadeh's fuzzy set theory. There have been a few attempts to apply fuzzy set theory to information retrieval, though it has not received nearly as much attention as the vector-space model. (This reference describes the original fuzzy set theory. Much related work has occurred since in, for example, fuzzy logic or fuzzy decision-making. An example of an application in IR is: W.M. Sachs, "An Approach to Associative Retrieval through the Theory of Fuzzy Sets," Journal of the American Society for Information Science, 27: 85-87.) The main attraction for the IR application is that it seems to present the possibility of combining associative ideas with Boolean logic, although there are actually some serious theoretical problems in that combination. There is a conspicuous lack of any attempt to evaluate fuzzy set theory-based systems.
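The attraction can be seen in a small sketch. It uses the standard operations of fuzzy set theory, minimum for intersection (AND) and maximum for union (OR), applied to graded memberships of documents in the set "about term t". The grades, the example query, and the function names are hypothetical, and the sketch ignores the theoretical problems just mentioned.

```python
from typing import Dict, List, Tuple

# Hypothetical membership grades in [0, 1]: document -> term -> grade.
GRADES: Dict[str, Dict[str, float]] = {
    "d1": {"fuzzy": 0.9, "retrieval": 0.4},
    "d2": {"fuzzy": 0.2, "logic": 0.8, "retrieval": 0.7},
    "d3": {"logic": 0.5},
}


def grade(doc: str, term: str) -> float:
    """Degree to which the document belongs to the set for this term."""
    return GRADES[doc].get(term, 0.0)


def fuzzy_and(*values: float) -> float:
    return min(values)  # fuzzy intersection


def fuzzy_or(*values: float) -> float:
    return max(values)  # fuzzy union


def score(doc: str) -> float:
    """Evaluate the example query (fuzzy OR logic) AND retrieval."""
    return fuzzy_and(fuzzy_or(grade(doc, "fuzzy"), grade(doc, "logic")),
                     grade(doc, "retrieval"))


ranking: List[Tuple[str, float]] = sorted(
    ((doc, score(doc)) for doc in GRADES), key=lambda pair: (-pair[1], pair[0])
)
print(ranking)
```

If every grade is restricted to 0 or 1, the same functions reproduce an ordinary Boolean search, which is why the approach looks like a bridge between Boolean and associative retrieval.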
A third approach is that based on statistical (probabilistic) models. Although statistical ideas have been around in IR for a very long time, most such work nowadays is based on a specific probabilistic approach, which attempts to assess the probability that a given item will be found relevant by the user. In this sense it belongs firmly with the evaluation tradition discussed in section 4 and with the ideas of relevance that emerged from that tradition, although it turns out to fit very naturally with more recent ideas of highly interactive systems. The probabilistic approach is discussed in more detail in the next section.
It is not strictly necessary to regard these three approaches as incompatible. It is possible to devise methods that make use of ideas based on more than one approach. However, they do suggest very different conceptions of the notion of degree of match between documents and queries.
Once it is assumed that the function of an IR system is to retrieve items that the user would judge relevant to his/her information need or ASK, then it becomes apparent that this is essentially a prediction process. These judgements of relevance have not yet happened. Or rather, if any items have been seen and judged for relevance, then those items are no longer of interest from the retrieval point of view because the user already knows about them. The system must in some fashion predict the likely outcome of the process in respect of any particular item should it present that item to the user. On the assumption that relevance is a binary property (the user would like to be informed of the existence of this item, or not), the prediction becomes a process of estimating the probability of relevance of each item and of ranking the items in order of this probability.
Translating this idea into a practical system depends on making assumptions about the kinds of information that the system may have on which to estimate the probability and how this information is structured. A very simple search-term weighting scheme, collection frequency weighting, seems to derive its power from being an approximation to a probabilistic function. But more complex techniques may depend on the system learning from known judgements of relevance, either by the current user in respect of the current query or by other users in the past. The latter possibility has not yet, to my knowledge, been put into effect in any operational context, but the former is the basis for more than one operational system.
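Collection frequency weighting can be sketched as follows. The sketch assumes one common form of the weight, log(N/n), where N is the number of documents in the collection and n the number containing the term; the exact formula is not given in the text, so this form, the toy collection, and the function names are assumptions for illustration only.

```python
import math
from typing import Dict, Iterable, Set

# Hypothetical toy collection: document id -> set of index terms.
DOCS: Dict[str, Set[str]] = {
    "d1": {"information", "retrieval", "probabilistic"},
    "d2": {"information", "retrieval"},
    "d3": {"information", "science"},
    "d4": {"information", "library"},
}


def collection_frequency_weights(docs: Dict[str, Set[str]],
                                 terms: Iterable[str]) -> Dict[str, float]:
    """Weight each query term by log(N / n): rarer terms count for more."""
    n_docs = len(docs)
    weights: Dict[str, float] = {}
    for term in terms:
        n = sum(1 for doc_terms in docs.values() if term in doc_terms)
        if n > 0:  # terms absent from the collection cannot match anything
            weights[term] = math.log(n_docs / n)
    return weights


def score(doc_terms: Set[str], weights: Dict[str, float]) -> float:
    """Sum the weights of the query terms present in the document."""
    return sum(w for term, w in weights.items() if term in doc_terms)


weights = collection_frequency_weights(DOCS, ["information", "retrieval", "probabilistic"])
scores = {doc_id: score(terms, weights) for doc_id, terms in DOCS.items()}
print(sorted(scores, key=lambda d: (-scores[d], d)))
```

A term that occurs in every document, like "information" in this toy collection, receives weight zero and so contributes nothing to the ranking, while the rarest term dominates it. No relevance judgements are needed.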
This is the idea of relevance feedback: after an initial search, the user is asked to provide relevance judgements on some or all of the items retrieved, and the system uses this information for a subsequent iteration of the search. Once again, the idea of relevance feedback is not exclusive to the probabilistic framework but fits very naturally within it. Indeed, the idea was first demonstrated in the context of the vector-space model.
Relevance feedback information can be used by the system partly to re-estimate the weights of the search terms originally used, but mainly to suggest to the system new terms that might usefully form part of the query. These new terms can again be weighted automatically, and might then be used automatically or presented to the user for evaluation. Thus, on iteration the search statement may not only be imprecise, it may also be actually invisible to the user. The system can locate items that the user might want to see on the basis of criteria of which the user is not aware.
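The term-suggestion step can be sketched as follows. The weighting formula used here is the relevance weight of Robertson and Sparck Jones with the usual 0.5 correction; it is brought in as an assumption, since the text does not specify a formula, and the sketch makes no claim about how any operational system implements feedback. The toy collection and function names are hypothetical.

```python
import math
from typing import Dict, List, Set, Tuple


def relevance_weight(N: int, n: int, R: int, r: int) -> float:
    """Assumed term weight from relevance data.

    N: documents in the collection; n: documents containing the term;
    R: documents judged relevant; r: relevant documents containing the term.
    """
    return math.log(((r + 0.5) * (N - n - R + r + 0.5)) /
                    ((n - r + 0.5) * (R - r + 0.5)))


def expansion_terms(docs: Dict[str, Set[str]], relevant: Set[str],
                    query: Set[str], k: int) -> List[Tuple[str, float]]:
    """Suggest up to k new terms drawn from the documents judged relevant."""
    N, R = len(docs), len(relevant)
    # Candidates: terms in the relevant documents that are not already in the query.
    candidates = set().union(*(docs[d] for d in relevant)) - query
    weighted = []
    for term in candidates:
        n = sum(1 for terms in docs.values() if term in terms)
        r = sum(1 for d in relevant if term in docs[d])
        weighted.append((term, relevance_weight(N, n, R, r)))
    # Highest weight first; ties broken alphabetically for a stable order.
    return sorted(weighted, key=lambda pair: (-pair[1], pair[0]))[:k]


# Hypothetical toy collection; the user has judged d1 and d2 relevant.
DOCS: Dict[str, Set[str]] = {
    "d1": {"retrieval", "feedback", "probabilistic"},
    "d2": {"retrieval", "feedback", "weighting"},
    "d3": {"retrieval", "boolean"},
    "d4": {"retrieval", "library"},
    "d5": {"boolean", "library"},
    "d6": {"catalogue", "library"},
}
suggested = expansion_terms(DOCS, relevant={"d1", "d2"}, query={"retrieval"}, k=2)
print([term for term, _ in suggested])
```

The user typed only "retrieval", yet the next iteration could search on "feedback" as well. If the new terms are applied automatically, the effective search statement is one the user never saw, which is the sense in which it becomes invisible.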
Although relevance feedback seems at first glance to be not too far removed from the input-output model (being an explicit form of feedback within the same framework), and also seems to embody a relatively mechanical notion of relevance, its implications are actually revolutionary. We begin to perceive the user not as feeding in a question and getting out an answer, but as exploring a country that is only partially known and where any clue as to location in relation to where the user wants to be should be seized upon. This concept of retrieval is explored further in the next section.
An example of a system that incorporates relevance feedback is OKAPI. Although an experimental system, it functions in an operational environment, with a real database of realistic size and real users, in order to allow a variety of evaluation methods to be applied. Some results of a recent experiment using OKAPI will help to inform the next section.
A recent experiment investigated various aspects of searching or information-seeking behaviour, including the behaviour of repeated users of the system. The system was accessible over the network in an academic environment and was available to many users through terminals on their desks or very close by. There was no direct cost to using the system, and since it is very easy to use (being designed so that someone walking in off the street could be expected to be able to use it), there was no barrier of any kind to its repeated and frequent use. Individual users were logged.
What was found was that a number of users made repeated use of the system, quite often (surprisingly) starting with a query that was very similar to or even absolutely identical to their previous query. It is clear that they were not simply asking the same question again, but rather using the entry point that they already knew about as a way into this somewhat unfamiliar country, a familiar starting point for a new exploration. Relevance feedback (which is just one of the mechanisms of which they might make use) is not a matter of saying "this is correct," but rather of saying, "supposing we try this direction, where will it take us?"
Thus it seems that starting with a theoretical approach based on a traditional, input-output model of IR has led us to methods and techniques that fit very well with the ASK hypothesis and a problem-solving or exploratory view of IR. We have arrived at the right answer, but for the wrong reasons!
There are, of course, researchers in the field who are entitled to say "I told you so!" Examples include Oddy's THOMAS system and Swanson's view of retrieval as a trial-and-error process. However, we do now have evidence that we are capable of providing information retrieval systems that can have a genuine impact on information-seeking behaviour in a broad sense. One task that faces us is to develop our methods and ideas of evaluation to take into account this broader view. We, researchers in information retrieval, need to know much more about how users (including scientists and technologists) approach their information-seeking or problem-solving tasks, preferably over a period of time rather than simply in response to a suddenly perceived information need.
Indeed, I have found it instructive now to revisit some work that was (when it was undertaken) right outside the field of information retrieval: T.J. Allen's work on communication in science and engineering. What is critical here is the user's perception of his or her information environment and the sources and channels of communication that are open. One of Allen's conclusions concerned the relative importance of informal as against formal channels. The more we can design systems that appear to the user to be less formal, perhaps the better we shall be able to serve him or her. An information retrieval system should be as accessible and as easy to communicate with as a colleague in the next office; only then will the real breakthrough occur.
It may be noted that I have not yet mentioned any of the work in the artificial intelligence (AI), expert system or knowledge-based system (KBS) areas. There have indeed been many attempts to apply such ideas to information retrieval, though there is in my view less evidence for their effect or effectiveness in the context of operational systems.
The possible role(s) for knowledge bases in IR is the subject of much debate. One approach is to treat the expert intermediary as the source of knowledge, in other words to try to encapsulate the intermediary's skill in a system. However, a major component of the intermediary's expertise, at least as represented in such systems, seems to be the manipulation of Boolean search statements. If we can get by without such statements, then much of the point of these systems seems to be lost.
The other kind of knowledge that, in principle, should be of use would be that embodied in a thesaurus, classification scheme, or other formalized indexing language. But such knowledge does not seem to fit very easily with established KBS ideas.
My own opinion, for what it's worth, is that the way forward may be to incorporate selective and small-scale "intelligent" (or moderately clever) methods into the associative retrieval framework, without attempting to go all the way to an intelligent system. Cleverness need not take the form expected in the current KBS tradition: a relevance feedback system based on the probabilistic model already seems quite clever to the user. Perhaps the central point is that we are attempting to provide tools to help the user solve his or her own problems; we are not attempting to solve their problems for them. Relatively simple tools may be best suited to that purpose.
1. Allen, T.J. (1968). "Organizational Aspects of Information Flow in Technology." Aslib Proceedings 20: 433-454.
2. Bates, M. (1987). "How to Use Information Search Tactics Online." Online 11: 47-54.
3. Belkin, N.J. (1980). "Anomalous States of Knowledge as the Basis for Information Retrieval." Canadian Journal of Information Science 5: 133-143.
4. Buell, D.A. (1985). "A Problem in Information Retrieval with Fuzzy Sets." Journal of the American Society for Information Science 36: 398-401.
5. Croft, W.B., and D.J. Harper (1979). "Using Probabilistic Models of Document Retrieval without Relevance Information." Journal of Documentation 35: 285-295.
6. Lancaster, F.W. (1968). "Evaluation of the Medlars Demand Search Service." Bethesda, Md.: National Library of Medicine.
7. Oddy, R.N. (1977). "Information Retrieval through Man-Machine Dialogue." Journal of Documentation 33: 1-14; D. Swanson, "Information Retrieval as a Trial-and-Error Process." Library Quarterly 47: 128-148.
8. Ranganathan, S.R. (1937). Prolegomena to Library Classification. Madras: Library Association. 2nd ed. London: Library Association, 1975.
9. Robertson, S.E. (1977). "The Probability Ranking Principle in IR." Journal of Documentation 33: 294-304.
10. Robertson, S.E., and M.M. Hancock-Beaulieu (1992). "On the Evaluation of IR Systems." Information Processing and Management. Forthcoming.
11. Salton, G. (1971). The SMART Retrieval System: Experiments in Automatic Document Processing. Englewood Cliffs, N.J.: Prentice Hall.
12. Schamber, L., M.B. Eisenberg, and M.S. Nilan (1990). "A Re-examination of Relevance: Toward a Dynamic, Situational Relevance." Information Processing and Management 26: 755-776.
13. Taylor, H. (1915). "Selective Devices." U.S. Patent no. 1165465.
14. Taylor, R.S. (1968). "Question-Negotiation and Information-seeking in Libraries." College and Research Libraries 29: 178-194.
15. Van Rijsbergen, C.J. (1989). "Towards an Information Logic." Proceedings of the Twelfth ACM SIGIR Conference on Research and Development in Information Retrieval, 77-86.
16. Vickery, A., H.M. Brooks, B. Robinson, and B.C. Vickery (1987). "A Reference and Referral System Using Expert System Techniques." Journal of Documentation 43: 1-23.
17. Walker, S., and R. DeVere (1990). Improving Subject Retrieval in Online Catalogues, 2: Relevance Feedback and Query Expansion. London: British Library.
18. Walker, S., and M. Hancock-Beaulieu (1991). OKAPI at City: An Evaluation Facility for Interactive IR. BL Report no. 6056. London: British Library.
19. Zadeh, L.A. (1965). "Fuzzy Sets." Information and Control 8: 338-353.