Expanding Access to Science and Technology (UNU, 1994, 462 pages)
Session 3: New technologies and media for information retrieval and transfer
Information retrieval: Theory, experiment, and operational systems

5. Language

Received wisdom in the 1950s was that IR required some kind of formal indexing or coding scheme. (Exactly what kind of scheme was one of the topics of endless debate.) Thus, items required indexing/coding in terms of the scheme, which probably in turn required a human indexer (though a machine might be taught to do it). A similar process was required at the search stage, in respect of the query or need, though an end-user might possibly learn enough about the formal scheme to conduct a satisfactory search.

The results of the early experiments, together with the developing technology and changing perceptions of how it might be used, caused a backlash against this received wisdom. It became feasible to throw into the computer larger and larger quantities of text, and retrieve on the basis of words in text rather than assigned keywords or codes. At first sight, the necessity for any kind of indexing scheme, at either end of the process, seems to disappear: the user can use "natural" language to search a "natural" language database, without any interference from librarians.

We have since come to a much more balanced view of language, though the debate continues to generate new questions as we develop our highly interactive systems. It is clear that natural language searching is a powerful device that can often produce good results economically. However, it places a large burden on the searcher, and besides, certain kinds of queries are not well served. In recognition of these points, many modern databases include both formal indexing and searchable natural language text.
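The contrast between searching the words of the text and searching assigned index terms can be made concrete with a small sketch. The records, field names (`text`, `keywords`), and helper functions below are all hypothetical and invented for illustration; they are not drawn from any system discussed here, and the matching rule (every query word must occur) is only one simple assumption about how a search might be run.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Lower-case the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(fields):
    """Map each word to the set of record ids whose field contains it."""
    index = defaultdict(set)
    for doc_id, text in fields.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index

def search(index, query):
    """Return the ids of records that contain every word of the query."""
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result

# Hypothetical records: free text plus index terms assigned by an indexer.
DOCS = {
    1: {"text": "Corrosion of steel pipes in sea water",
        "keywords": "CORROSION; STEEL"},
    2: {"text": "Rusting of iron girders exposed to salt spray",
        "keywords": "CORROSION; IRON"},
    3: {"text": "Welding techniques for steel girders",
        "keywords": "WELDING; STEEL"},
}

text_index = build_index({i: d["text"] for i, d in DOCS.items()})
keyword_index = build_index({i: d["keywords"] for i, d in DOCS.items()})

print(search(text_index, "corrosion"))     # words in the text only
print(search(keyword_index, "corrosion"))  # assigned index terms only
```

In this toy collection the text search for "corrosion" finds only record 1, because record 2 speaks of "rusting"; the searcher would have to think of that synonym unaided. The search on assigned terms finds both records, because the (hypothetical) indexer normalized the vocabulary. Holding both fields in one database, as the records above do, lets a searcher use either route.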

Formal artificial languages (in which category I include library classification schemes) represent particular views of the structure and organization of knowledge. One idea that emerged from the analysis of such languages, and that is central to modern indexing languages as well as to the practice of searching, is that of the facet. Once it is recognized that topics and problem areas are potentially highly complex, it becomes essential to approach the problem of describing them via different aspects, or facets, and combining the resulting descriptions in a building-block fashion [8]. (The idea of a faceted classification scheme, while originally due to Ranganathan in the 1930s, was put in its most concise form by Vickery; B.C. Vickery, Faceted Classification, London: Aslib, 1960.)
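The building-block idea can be sketched in a few lines of code. The sketch assumes one common reading of the approach: the terms within a facet are treated as alternatives, and a record must satisfy every facet. The facet names, terms, and records are hypothetical, and this is an illustration rather than the procedure of any particular scheme.

```python
def matches(doc_terms, facets):
    """True if the record has at least one term from every facet."""
    return all(doc_terms & alternatives for alternatives in facets.values())

def facet_search(docs, facets):
    """Return the sorted ids of records that satisfy every facet."""
    return sorted(doc_id for doc_id, terms in docs.items()
                  if matches(terms, facets))

# Hypothetical records, each described by a set of terms.
DOCS = {
    "d1": {"corrosion", "steel", "seawater"},
    "d2": {"rusting", "iron", "salt", "spray"},
    "d3": {"welding", "steel"},
    "d4": {"corrosion", "aluminium", "air"},
}

# One building block per facet of the topic; names are illustrative.
QUERY = {
    "process": {"corrosion", "rusting"},
    "material": {"steel", "iron"},
    "environment": {"seawater", "salt"},
}

print(facet_search(DOCS, QUERY))

# Broadening the search by keeping only the "process" block.
broader = {"process": QUERY["process"]}
print(facet_search(DOCS, broader))
```

Each facet is described separately and the descriptions are then combined, so a search can be broadened by removing a block or by adding alternatives to one, and narrowed by adding a block, without rethinking the whole description.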

Many modern indexing languages, while not necessarily following the rules of faceted classification, reflect an essentially facet-based approach to the organization of knowledge. But the approach also has value at the searching stage, whether or not the database being searched is indexed by such a language. This theme is taken up again below.