We have a newspaper archive of PDF files in Greenstone. To make
searching the extracted text easier, we have partitioned the search
indexes by year, from 1985 to 2010 (26 partitions). However, so far only
the PDF files for 1985-1990, 2009 and 2010 have been added. The
problem is that searching does not work for the 2010 files, although all
the others give results. We use Lucene as the indexer.
Could there be a limit on the number of partitions that is causing this
problem? When the text of the 2010 files is extracted with pdftohtml, there
is no error message saying that a file could not be processed.
Any idea what the problem could be?
Regards and thanks
P O Box 30664, Windhoek, Namibia