Relevance Feedback With Too Much Data
Department of Computer Science
University of Massachusetts
Amherst, MA 01003-4610
Modern text collections often contain large documents that span several subject areas. Such documents are problematic for relevance feedback because inappropriate terms can easily be chosen. This study explores feeding back passages of large documents, an approach that proves highly effective. A less expensive method that discards long documents is also reviewed and found to be effective if there are enough relevant documents. A hybrid approach that feeds back short documents and passages of long documents may be the best compromise.
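As a rough illustration of the hybrid idea, the following minimal sketch feeds back short relevant documents whole and only the best-matching passage of each long one. It is not the procedure used in the study: the function names, the length threshold `max_len`, the fixed non-overlapping `window`, the query-term-count passage score, and the document-frequency term ranking are all assumptions chosen for brevity.

```python
from collections import Counter


def best_passage(tokens, query_terms, window):
    """Return the fixed-size window containing the most query-term occurrences.

    Hypothetical scoring: a plain count of query terms in the window.
    Ties go to the earliest window.
    """
    best, best_score = tokens[:window], -1
    for start in range(0, len(tokens), window):
        passage = tokens[start:start + window]
        score = sum(1 for t in passage if t in query_terms)
        if score > best_score:
            best, best_score = passage, score
    return best


def feedback_units(relevant_docs, query_terms, max_len=300, window=100):
    """Hybrid selection: whole short documents, one passage per long document.

    max_len and window are illustrative defaults, not values from the study.
    """
    units = []
    for tokens in relevant_docs:
        if len(tokens) <= max_len:
            units.append(tokens)
        else:
            units.append(best_passage(tokens, query_terms, window))
    return units


def expansion_terms(units, query_terms, k=5):
    """Rank candidate terms by how many feedback units contain them.

    Assumed ranking: unit frequency, with alphabetical order breaking ties.
    """
    df = Counter()
    for unit in units:
        df.update(set(unit) - set(query_terms))
    ranked = sorted(df.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]
```

With this selection, terms from the off-topic portions of a long document never enter the candidate pool, which is the effect the passage-based approach is after. Dropping the `else` branch gives the cheaper variant that simply discards long documents.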