|Date||Tue, 21 Sep 2004 11:09:44 -0200|
|Subject||Re: [greenstone-users] Some PDF files are imported but not shown in the listings|
> The most common reason for a file to be imported but not included in the
> built results is if the imported file contains badly encoded characters.
> For PDF files, this often happens if the 3rd party pdftohtml program we
> use can't successfully extract the text, and instead extracts binary
Yes, that seems to have been the problem. It was not that the characters were badly encoded, but that the PDF plugin tried to read the text as UTF-8, which meant that the characters were no longer valid. I changed the input encoding of the PDF plugin from auto to iso_8859_1, and the files now show up in a new collection.
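To illustrate the kind of mismatch I mean, here is a minimal Python sketch. It is not Greenstone code, and the sample word is just an assumption standing in for text extracted from one of my PDFs; it only shows that bytes written as ISO-8859-1 are not valid when read as UTF-8.

```python
# Hypothetical sample text; "\u00f3" is the letter o with an acute accent.
sample = "informaci\u00f3n"

# The bytes as an ISO-8859-1 (Latin-1) source would store them.
raw = sample.encode("iso-8859-1")

# Reading those bytes as UTF-8 fails: the accented letter is the single
# byte 0xF3, which UTF-8 treats as the start of a multi-byte sequence
# that is never completed.
try:
    raw.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

# Reading them with the correct encoding recovers the original text.
latin1_text = raw.decode("iso-8859-1")

# A lenient UTF-8 read does not fail, but it substitutes the
# replacement character U+FFFD for the byte it cannot interpret.
lenient = raw.decode("utf-8", errors="replace")

print(utf8_ok, latin1_text == sample, "\ufffd" in lenient)
```

Whether the plugin fails outright or silently substitutes characters in this situation I have not checked; the sketch only shows why the same bytes are fine under one encoding and invalid under the other.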
But the old collection, even after rebuilding, still does not show them. Maybe once the metadata is wrong you have to get rid of it (or edit it) before the collection can be safely rebuilt ...
PS: By the way, the documentation is great! Yesterday I was reading the developer's manual, and that's how I found the problem: by following the manual's steps to import and build a collection. Great work.