Expanding Access to Science and Technology (UNU, 1994, 462 pages)
Session 3: New technologies and media for information retrieval and transfer
Information retrieval: Theory, experiment, and operational systems

8. Probabilistic models

Once it is assumed that the function of an IR system is to retrieve items that the user would judge relevant to his/her information need or ASK (anomalous state of knowledge), it becomes apparent that retrieval is essentially a process of prediction. These judgements of relevance have not yet been made. Or rather, if any items have already been seen and judged for relevance, those items are no longer of interest from the retrieval point of view, because the user already knows about them. The system must therefore predict, in some fashion, the likely outcome of the user's judgement of any particular item, should the system present that item. On the assumption that relevance is a binary property (the user either would or would not like to be informed of the existence of this item), prediction becomes a matter of estimating the probability of relevance of each item and ranking the items in order of this probability [9].
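Once the probabilities have been estimated, the ranking step itself is simple. The sketch below assumes a hypothetical mapping from item identifiers to already-estimated probabilities of relevance (the values are invented for illustration); how those estimates are obtained is the hard part that the rest of this section addresses. The helper names `rank_by_probability` and `expected_relevant_in_top` are hypothetical.

```python
from typing import Dict, List, Tuple


def rank_by_probability(estimates: Dict[str, float]) -> List[Tuple[str, float]]:
    """Order items by descending estimated probability of relevance.

    Ties are broken by item identifier so that the ordering is deterministic.
    """
    return sorted(estimates.items(), key=lambda kv: (-kv[1], kv[0]))


def expected_relevant_in_top(ranking: List[Tuple[str, float]], k: int) -> float:
    """Expected number of relevant items among the first k presented.

    With binary relevance, each item contributes its probability of
    relevance to this expectation.
    """
    return sum(p for _, p in ranking[:k])


if __name__ == "__main__":
    # Hypothetical probability estimates for four unseen documents.
    estimates = {"d1": 0.20, "d2": 0.75, "d3": 0.40, "d4": 0.05}
    ranking = rank_by_probability(estimates)
    print([doc for doc, _ in ranking])                      # ['d2', 'd3', 'd1', 'd4']
    print(round(expected_relevant_in_top(ranking, 2), 2))   # 1.15
```

Because the expected number of relevant items in the top k is just the sum of the k probabilities presented, sorting in descending order maximises that expectation at every cutoff, which is one way of seeing why ranking by probability is the natural target.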

Translating this idea into a practical system depends on assumptions about what information the system has available for estimating the probability, and how that information is structured. A very simple search-term weighting scheme, collection frequency weighting (which gives higher weights to terms that occur in fewer documents), seems to derive its power from being an approximation to a probabilistic function [5]. More complex techniques, however, may depend on the system learning from known judgements of relevance, made either by the current user in respect of the current query or by other users in the past. The latter possibility has not yet, to my knowledge, been put into effect in any operational context, but the former is the basis for more than one operational system.
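Collection frequency weighting can be sketched as follows. The text does not give a formula; this sketch assumes the common form log N − log n, where N is the number of documents and n the number containing the term, with each document scored by summing the weights of the query terms it contains. The toy collection and the names `collection_frequency_weights` and `score` are hypothetical. A usual way of seeing the probabilistic connection is that, with no relevance information at all, the probabilistic term weight reduces to something close to log((N − n)/n), which is near log N − log n when n is small relative to N.

```python
import math
from typing import Dict, Iterable, List, Set, Tuple


def collection_frequency_weights(docs: Dict[str, Set[str]]) -> Dict[str, float]:
    """Weight each term by log N - log n.

    N is the number of documents; n is the number of documents containing
    the term. A term occurring in every document gets weight 0.
    """
    N = len(docs)
    df: Dict[str, int] = {}
    for terms in docs.values():
        for t in terms:
            df[t] = df.get(t, 0) + 1
    return {t: math.log(N) - math.log(n) for t, n in df.items()}


def score(query: Iterable[str], doc_terms: Set[str], weights: Dict[str, float]) -> float:
    """Sum the weights of the distinct query terms present in the document."""
    return sum(weights.get(t, 0.0) for t in set(query) if t in doc_terms)


def rank(query: Iterable[str], docs: Dict[str, Set[str]]) -> List[Tuple[str, float]]:
    """Score every document and sort by descending score (ties by identifier)."""
    q = list(query)
    weights = collection_frequency_weights(docs)
    scored = [(d, score(q, terms, weights)) for d, terms in docs.items()]
    return sorted(scored, key=lambda dv: (-dv[1], dv[0]))


if __name__ == "__main__":
    # Hypothetical toy collection: each document is a set of index terms.
    docs = {
        "d1": {"retrieval", "relevance", "feedback"},
        "d2": {"retrieval", "boolean", "logic"},
        "d3": {"retrieval", "probabilistic", "relevance"},
        "d4": {"library", "catalogue"},
    }
    for doc, s in rank(["retrieval", "feedback"], docs):
        print(doc, round(s, 3))
```

Here "feedback" (one document) outweighs "retrieval" (three of four documents), so d1 is ranked first even though three documents match "retrieval".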

This is the idea of relevance feedback: after an initial search, the user is asked to provide relevance judgements on some or all of the items retrieved, and the system uses this information for a subsequent iteration of the search. Once again, the idea of relevance feedback is not exclusive to the probabilistic framework but fits very naturally within it. Indeed, the idea was first demonstrated in the context of the vector-space model.
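The text does not spell out the vector-space version; the best-known formulation is Rocchio's query modification, named here by us rather than by the author. A minimal sketch, assuming sparse term vectors stored as dictionaries: the revised query moves towards the centroid of the items judged relevant and away from the centroid of those judged non-relevant. The coefficients α = 1, β = 0.75, γ = 0.15 are conventional illustrative defaults, not values taken from this paper, and the function names are hypothetical.

```python
from typing import Dict, List

Vector = Dict[str, float]


def centroid(vectors: List[Vector]) -> Vector:
    """Component-wise mean of sparse term vectors (empty if there are none)."""
    if not vectors:
        return {}
    total: Vector = {}
    for v in vectors:
        for t, w in v.items():
            total[t] = total.get(t, 0.0) + w
    return {t: w / len(vectors) for t, w in total.items()}


def rocchio(query: Vector, relevant: List[Vector], nonrelevant: List[Vector],
            alpha: float = 1.0, beta: float = 0.75, gamma: float = 0.15) -> Vector:
    """Revise the query from one round of relevance judgements.

    q' = alpha * q + beta * centroid(relevant) - gamma * centroid(nonrelevant).
    Terms whose revised weight is not positive are dropped.
    """
    rel_c, non_c = centroid(relevant), centroid(nonrelevant)
    new_q: Vector = {}
    for t in set(query) | set(rel_c) | set(non_c):
        w = alpha * query.get(t, 0.0) + beta * rel_c.get(t, 0.0) - gamma * non_c.get(t, 0.0)
        if w > 0:
            new_q[t] = w
    return new_q


if __name__ == "__main__":
    # Hypothetical first-round judgements on three retrieved items.
    q = {"retrieval": 1.0}
    rel = [{"retrieval": 1.0, "feedback": 1.0}, {"feedback": 1.0, "relevance": 1.0}]
    nonrel = [{"retrieval": 1.0, "boolean": 1.0}]
    print(sorted(rocchio(q, rel, nonrel).items()))
```

The revised query picks up "feedback" and "relevance", which the user never typed, and the next iteration of the search is run with it.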

The system can use relevance feedback information partly to re-estimate the weights of the search terms originally used, but mainly to identify new terms that might usefully be added to the query. These new terms can again be weighted automatically, and might then be used automatically or presented to the user for evaluation. Thus, after iteration, the search statement may be not only imprecise but actually invisible to the user: the system can locate items that the user might want to see on the basis of criteria of which the user is not aware.
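Both uses can be sketched with the Robertson/Sparck Jones relevance weight, one standard probabilistic weight of this kind; the text does not give the formula, so this is an assumption on our part. In its usual form with 0.5 corrections, w = log[(r + 0.5)(N − n − R + r + 0.5) / ((n − r + 0.5)(R − r + 0.5))], where N is the collection size, n the number of documents containing the term, R the number judged relevant, and r the number of judged-relevant documents containing the term. For proposing new terms, the sketch ranks candidates from the relevant documents by r × w, a selection value of the kind used in later Okapi work; treat that choice, the toy data, and the function names as hypothetical.

```python
import math
from typing import Dict, List, Set, Tuple


def rsj_weight(N: int, n: int, R: int, r: int) -> float:
    """Robertson/Sparck Jones relevance weight with 0.5 corrections.

    N: documents in the collection   n: documents containing the term
    R: documents judged relevant     r: relevant documents containing the term
    """
    return math.log(((r + 0.5) * (N - n - R + r + 0.5)) /
                    ((n - r + 0.5) * (R - r + 0.5)))


def suggest_terms(docs: Dict[str, Set[str]], relevant_ids: Set[str],
                  query: Set[str], top: int = 3) -> List[Tuple[str, float]]:
    """Rank terms from judged-relevant documents by r * w.

    Terms already in the query are excluded, since the aim is to propose
    new ones; whether these are added automatically or shown to the user
    is a separate decision.
    """
    N, R = len(docs), len(relevant_ids)
    df: Dict[str, int] = {}
    for terms in docs.values():
        for t in terms:
            df[t] = df.get(t, 0) + 1
    rel_df: Dict[str, int] = {}
    for d in relevant_ids:
        for t in docs[d]:
            rel_df[t] = rel_df.get(t, 0) + 1
    scored = [(t, r * rsj_weight(N, df[t], R, r))
              for t, r in rel_df.items() if t not in query]
    scored.sort(key=lambda tv: (-tv[1], tv[0]))
    return scored[:top]


if __name__ == "__main__":
    # Hypothetical collection; d1 and d2 were judged relevant after the first search.
    docs = {
        "d1": {"retrieval", "feedback", "probabilistic"},
        "d2": {"retrieval", "feedback", "relevance"},
        "d3": {"retrieval", "boolean"},
        "d4": {"boolean", "logic"},
        "d5": {"library", "catalogue"},
        "d6": {"feedback", "control"},
    }
    print("re-estimated weight of 'retrieval':", round(rsj_weight(6, 3, 2, 2), 3))
    for term, value in suggest_terms(docs, {"d1", "d2"}, {"retrieval"}):
        print(term, round(value, 3))
```

With no relevance information (R = r = 0) the weight becomes log((N − n + 0.5)/(n + 0.5)), a collection-frequency-like weight, so the same function covers both the initial search and later iterations.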

Although relevance feedback seems at first glance not far removed from the input-output model (being an explicit form of feedback within the same framework), and also seems to embody a relatively mechanical notion of relevance, its implications are actually revolutionary. We begin to perceive the user not as feeding in a question and getting out an answer, but as exploring a country that is only partially known, where any clue to one's position relative to the intended destination should be seized upon. This concept of retrieval is explored further in the next section.

An example of a system that incorporates relevance feedback is OKAPI [17]. Although an experimental system, it functions in an operational environment, with a real database of realistic size and real users, in order to allow a variety of evaluation methods to be applied. Some results of a recent experiment using OKAPI will help to inform the next section.