Expanding Access to Science and Technology (UNU, 1994, 462 pages)
Session 3: New technologies and media for information retrieval and transfer
Information retrieval: Theory, experiment, and operational systems

6. Boolean logic, search strategy, and intermediaries

Long before either natural language searching or faceted classification, and certainly long before the modern computer, it became apparent that certain kinds of information retrieval would benefit from the ability to search on combinations of characteristics, in a way that we would now normally represent by the Boolean operator AND. Indeed, much effort went into devising mechanisms to allow such searching. Prime examples were Hollerith's mechanically sorted punched cards of the 1890s and a scheme of optical coincidence cards first invented in 1915 [13].
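Optical coincidence cards implement the Boolean AND physically. There is one card per term, and a hole is punched at position n if document n carries that term. When the cards for several terms are stacked against a light, light passes only through positions punched in every card. The following is a minimal sketch that models each card as a Python integer bitmap. The terms and document numbers are hypothetical.

```python
# Sketch: optical coincidence cards modelled as integer bitmaps.
# One card per term; bit n is set ("punched") if document n carries the term.
# Stacking cards lets light through only where every card is punched,
# which is a bitwise AND across all the cards.

from functools import reduce


def punch_card(doc_numbers: list[int]) -> int:
    """Return a bitmap with a 'hole' at each listed document number."""
    card = 0
    for n in doc_numbers:
        card |= 1 << n
    return card


def superimpose(cards: list[int]) -> list[int]:
    """Return the document numbers where light passes through every card."""
    if not cards:
        return []
    combined = reduce(lambda a, b: a & b, cards)
    return [n for n in range(combined.bit_length()) if (combined >> n) & 1]


# Hypothetical cards: which documents are indexed under each term.
cards = {
    "anaemia": punch_card([2, 5, 7, 11]),
    "children": punch_card([1, 5, 9, 11]),
    "iron": punch_card([5, 6, 11, 12]),
}

print(superimpose([cards["anaemia"], cards["children"], cards["iron"]]))  # → [5, 11]
```

Adding a card to the stack can only block light, never admit it. Each extra term therefore narrows the result, just as each additional AND does in a computer-based search.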

Most currently available computer-based systems allow searching using Boolean logical search statements, together with a few extensions to Boolean logic appropriate to searching text; for example, an operator to indicate that two words should not only occur in the same record, but also that they should be next to each other. In this respect, they appear to differ little from the punched-card systems of the 1930s. However, we may point to two major differences. Firstly, as discussed above, we have the possibility of searching natural language text. Secondly, the systems are designed specifically to allow and encourage certain kinds of feedback during the search. In other words, it is not expected that a searcher will be able to specify, precisely and a priori, the characteristics of the desired item(s). Rather, the search is expected to proceed in an iterative fashion, with the results of one (partial) search statement serving to inform the next search.
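The adjacency extension can be illustrated with a positional inverted index, which records where each word occurs in each record. This is a minimal sketch under that assumption. The function names and sample records are hypothetical and do not describe the syntax of any particular search service. A plain AND needs only the record identifiers, but adjacency also needs word positions.

```python
# Sketch: Boolean AND plus an adjacency operator over a positional index.
# Records and function names are hypothetical illustrations.

from collections import defaultdict


def build_index(records: dict[int, str]) -> dict[str, dict[int, list[int]]]:
    """Map each word to {record id: [positions of the word in that record]}."""
    index = defaultdict(lambda: defaultdict(list))
    for rec_id, text in records.items():
        for pos, word in enumerate(text.lower().split()):
            index[word][rec_id].append(pos)
    return index


def search_and(index, word_a: str, word_b: str) -> set[int]:
    """Records containing both words anywhere (Boolean AND)."""
    return set(index.get(word_a, {})) & set(index.get(word_b, {}))


def search_adj(index, word_a: str, word_b: str) -> set[int]:
    """Records in which word_b immediately follows word_a (adjacency)."""
    hits = set()
    for rec_id in search_and(index, word_a, word_b):
        positions_b = set(index[word_b][rec_id])
        if any(p + 1 in positions_b for p in index[word_a][rec_id]):
            hits.add(rec_id)
    return hits


records = {
    1: "information retrieval systems",
    2: "retrieval of information from text",
    3: "systems for information storage",
}
index = build_index(records)

print(sorted(search_and(index, "information", "retrieval")))  # → [1, 2]
print(sorted(search_adj(index, "information", "retrieval")))  # → [1]
```

Record 2 satisfies the AND but not the adjacency condition, because the two words occur in it in the wrong order and are not next to each other.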

Feedback in Boolean systems is of a very limited kind (this point will be taken up again below). Furthermore, the use of Boolean logic seems to suggest an analogy with traditional database management systems, where feedback is not normally an issue. However, the use of even a crude form of feedback is a recognition of the cognitive problems discussed earlier and a departure from the simple input-output model of information retrieval.

The problem of formulating a search and developing a search strategy for a Boolean system has received a great deal of attention [2]. This work has been informed by theoretical developments (the ASK hypothesis and facet analysis) and by the results of experiments. But one would describe such work less as theoretical or experimental than as a codification of good practice. Searching such systems is best described as a skilled task. For this reason, it has not been the norm for scientists themselves to conduct their own searches. This does not mean, of course, that it never happens. The companies offering search services on large text databases have been predicting for many years the dominance of end-user searching and the demise of the search intermediary. However, the intermediary has signally failed to disappear! (An intermediary is normally a specialist in the art of searching large text databases, possibly, though not necessarily, with some subject knowledge.)

The impact of experimental work on the study of search strategy has not been very great. However, there was one experimental result from the 1960s that encapsulated, in a surprising way, the problems subsequently addressed by the ASK hypothesis and that was seminal in our understanding of the search strategy problem. It is therefore worth describing this result.

The experiment involved the Medlars Demand Search Service and the National Library of Medicine (NLM) in the United States [6]. At that time, in order to conduct a search on a medical topic, the user would have to communicate with the NLM, either directly or via a local expert. Local experts existed at various places around the world to help users make the best use of the service.

The requests that were collected for the experiment could be divided into those where the user talked face-to-face with an expert and those where the user wrote a letter requesting a search. The expectation was that the face-to-face communication would be beneficial to the search by enabling the development of a better search formulation. However, in the event, the reverse was the case: on average, the letter-based requests performed slightly better than the face-to-face ones.

After studying the data, the experimenters suggested that users often came to face-to-face meetings without clearly verbalized requests, and that intermediaries tended to suggest formulations that were easy to express in the system's terms. The letter writers, on the other hand, were forced to articulate their needs more systematically before encountering the constraints of the system.

This result made a clear link between the Taylor/Belkin view of information retrieval and the practical problems of searching and had a major impact on the training of search intermediaries.

The relation between Boolean searching and facet analysis is a simple one: an analysis of the search topic from a facet viewpoint fits naturally with a canonical form of search statement, as follows:

$$(A_1 \text{ or } A_2 \text{ or } \ldots) \text{ and } (B_1 \text{ or } B_2 \text{ or } \ldots) \text{ and } \ldots$$

Here the separate facets (or component concepts of the topic) are $A, B, \ldots$, and each search term $A_i$ is one of the ways of representing concept $A$. A number of "intelligent" front-end systems for helping users to construct searches assume this canonical form. It is, however, an extremely limited form if interpreted literally: for example, it fails to allow that one of the $A_i$ might itself be a phrase or a combination of concepts.
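The canonical form can be generated mechanically once a topic has been broken into facets. This is a minimal sketch that assumes each facet is a list of alternative terms and each record is a set of index terms. The facets, terms and records are hypothetical, and the code does not describe any particular front-end system. It renders the search statement and checks which records satisfy it.

```python
# Sketch: the canonical facet form (A1 or A2 ...) and (B1 or B2 ...) and ...
# Facets, terms and records are hypothetical illustrations.


def to_search_statement(facets: list[list[str]]) -> str:
    """Render facets as a Boolean statement in the canonical form."""
    groups = ["(" + " OR ".join(terms) + ")" for terms in facets]
    return " AND ".join(groups)


def matches(record_terms: set[str], facets: list[list[str]]) -> bool:
    """True if the record contains at least one term from every facet."""
    return all(any(t in record_terms for t in terms) for terms in facets)


facets = [
    ["aspirin", "acetylsalicylic acid"],      # facet A: the drug
    ["stroke", "cerebrovascular accident"],   # facet B: the condition
]

records = {
    1: {"aspirin", "stroke", "elderly"},
    2: {"acetylsalicylic acid", "headache"},
    3: {"cerebrovascular accident", "acetylsalicylic acid"},
}

print(to_search_statement(facets))
# → (aspirin OR acetylsalicylic acid) AND (stroke OR cerebrovascular accident)
print([rid for rid, terms in records.items() if matches(terms, facets)])  # → [1, 3]
```

The sketch treats every term as an atomic index entry. A multi-word term matches only if the records are assumed to be indexed with that phrase as a single unit. The flat list-of-lists structure also cannot express an $A_i$ that is itself a combination of concepts, which is the literal-reading limitation noted above.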