From: George Buchanan
Date: Thu, 16 Oct 2003 11:55:55 +1300
Subject: Re: [greenstone-users] GAPlug problem
In-Reply-To: (3F8DB69F-E8EFED8C-cs-waikato-ac-nz)

> It turns out that badly formatted titles are being extracted by
> PDFPlug/pdftohtml. This causes GAPlug to fail when parsing the generated
> doc.xml files.
>
> Unfortunately I don't have time to investigate this further now (perhaps this
> is fixed in the latest version of pdftohtml?), but there is an easy way to
> get around the problem. Adding the "-no_metadata" option to PDFPlug will mean
> it doesn't try to extract any metadata, and the doc.xml files will be valid.
> You can add Title metadata yourself using metadata.xml files if you want
> (extracted metadata is often poor quality anyway).
>

...it's a problem with XPDF, on which the pdftohtml plugin is based. I'm working on a fix/workaround, but it will take me a day or two...
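
In the meantime, one way to see which documents are affected is to check each generated doc.xml for well-formedness. What follows is only a rough sketch, not part of Greenstone: the function name `find_bad_doc_xml` and the idea of passing in the collection's archives directory are assumptions made for illustration. It uses Python's standard XML parser, so it only checks well-formedness and may also flag entities that are declared in an external DTD.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def find_bad_doc_xml(archives_dir):
    """Return (path, error message) pairs for doc.xml files that fail to parse.

    archives_dir is assumed to be the directory that holds the generated
    doc.xml files; adjust it to wherever your collection keeps them.
    """
    bad = []
    # Walk the directory tree and try to parse every doc.xml found.
    for path in sorted(Path(archives_dir).rglob("doc.xml")):
        try:
            ET.parse(path)
        except ET.ParseError as err:
            # Record the file and the parser's complaint (line and column included).
            bad.append((path, str(err)))
    return bad
```

Calling `find_bad_doc_xml` on the archives directory and printing the result should point to the documents whose extracted titles are causing trouble; those are the ones worth giving a Title by hand in metadata.xml.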