Re: [greenstone-users] GAPlug problem

From: George Buchanan
Date: Thu, 16 Oct 2003 11:55:55 +1300
Subject: Re: [greenstone-users] GAPlug problem
In-Reply-To: (3F8DB69F-E8EFED8C-cs-waikato-ac-nz)

> It turns out that badly formatted titles are being extracted by
> PDFPlug/pdftohtml. This causes GAPlug to fail when parsing the generated
> doc.xml files.
>
> Unfortunately I don't have time to investigate this further now (perhaps
> this is fixed in the latest version of pdftohtml?), but there is an easy
> way to get around the problem. Adding the "-no_metadata" option to PDFPlug
> will mean it doesn't try to extract any metadata, and the doc.xml files
> will be valid. You can add Title metadata yourself using metadata.xml
> files if you want (extracted metadata is often poor quality anyway).
>
...it's a problem with XPDF, on which the pdftohtml plugin is based. I'm
working on a fix/workaround, but it will take me a day or two...
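
In the meantime, the workaround quoted above might look roughly like this.
This is only a sketch: it assumes the usual collect.cfg plugin syntax, and
the file name and title are made-up placeholders, not taken from any real
collection. In the collection's etc/collect.cfg, the option goes on the
PDFPlug line:

```
# etc/collect.cfg (other plugin lines left as they are)
plugin PDFPlug -no_metadata
```

A metadata.xml file placed in the same import directory as the PDF can then
supply the Title by hand. The FileName element is treated as a regular
expression, hence the escaped dot; "report.pdf" and the title text are
hypothetical:

```
<?xml version="1.0" ?>
<DirectoryMetadata>
  <FileSet>
    <!-- hypothetical file name; matched as a regular expression -->
    <FileName>report\.pdf</FileName>
    <Description>
      <!-- hypothetical title, entered manually -->
      <Metadata name="Title">Annual Report 2003</Metadata>
    </Description>
  </FileSet>
</DirectoryMetadata>
```

The collection needs to be re-imported and rebuilt after either change for
the new doc.xml files to be generated.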