As the amount of on-line text has increased, so has the size of the individual documents in on-line collections. Information retrieval methods that could easily be applied to the full text of abstracts or short documents are sometimes less effective or prohibitively expensive for large documents. This problem has led to a resurgence of interest in techniques for handling large texts, including passage retrieval, theme identification, and document summarization.
Most work in this area has been done in the "ad-hoc" setting, where retrieval is performed in the absence of known relevant documents. Surprisingly, little work has been done toward applying the same techniques to the information filtering or routing environment, where a collection of documents has been judged for relevance.
This study examines the value of using passages of long documents for feedback, even in the absence of information about which passage of a relevant document contains the relevant information. We also explore the intriguing idea of totally ignoring large documents, possibly saving a great deal of computational expense compared to passage handling.
All work in this study was performed using Inquery, a probabilistic information retrieval system based on an inference net model [Tur90].
2 Experimental method
In relevance feedback, a query is combined with a set of documents whose relevance to the query is known, and a new (presumably more useful) query is created. If the new query is applied to the same collection of documents from which the training data was drawn, evaluation becomes