|Date||Thu, 16 Oct 2003 11:55:55 +1300|
|Subject||Re: [greenstone-users] GAPlug problem|
> It turns out that badly formatted titles are being extracted by
> PDFPlug/pdftohtml. This causes GAPlug to fail when parsing the generated
> doc.xml files.
> Unfortunately I don't have time to investigate this further now (perhaps it
> is fixed in the latest version of pdftohtml?), but there is an easy way to
> get around the problem. Adding the "-no_metadata" option to PDFPlug will mean
> it doesn't try to extract any metadata, and the doc.xml files will be valid.
> You can add Title metadata yourself using metadata.xml files if you want
> (extracted metadata is often poor quality anyway).
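
For anyone following the workaround quoted above, here is a rough sketch of what the two pieces might look like. The option name is taken from the quoted message; the file name, the title text, and the DTD line are placeholders and assumptions, so check them against your own Greenstone installation before relying on them. In the collection's collect.cfg, the plugin line would be along these lines:

```
plugin PDFPlug -no_metadata
```

A metadata.xml file placed in the same directory as the PDF could then assign the Title by hand:

```xml
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!-- DTD reference assumed; copy the one from an existing metadata.xml in your install -->
<!DOCTYPE DirectoryMetadata SYSTEM "http://greenstone.org/dtd/DirectoryMetadata/1.0/DirectoryMetadata.dtd">
<DirectoryMetadata>
  <FileSet>
    <!-- FileName is treated as a regular expression; "report.pdf" is a hypothetical file -->
    <FileName>report\.pdf</FileName>
    <Description>
      <!-- Hypothetical title, replacing the one PDFPlug would have extracted -->
      <Metadata name="Title">Annual Report 2003</Metadata>
    </Description>
  </FileSet>
</DirectoryMetadata>
```

The collection needs to be rebuilt after either change for it to take effect.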
...it's a problem with XPDF, on which the pdftohtml plugin is based. I'm
working on a fix/workaround, but it will take me a day or two...