Re: pdf to html errors

From John R. McPherson
Date Wed, 30 Oct 2002 09:51:31 +1300
Subject Re: pdf to html errors
In-Reply-To (20021029130701-31097-qmail-webmail29-rediffmail-com)
Bhakti P Beke wrote:
> Hi,
> I have also come across a similar problem. I am working with GSDL
> version 2.38. Although I submitted a query through the Greenstone
> Support website, I am still waiting for the reply.
> Bhakti

> On Tue, 29 Oct 2002 Marie-Jose Quintard wrote :
> >Good afternoon,
> >
> >I have created a new collection and have imported a few books which
> >are available in pdf format (scanner used = HP9100C digital sender).
> >When I add these books to the new collection, a lot of errors are
> >generated when pdftohtml is executed, and most of these books can't
> >be converted to HTML.
> >
> >Any solutions or suggestions to solve this problem would be
> >welcome.
> >Thanks,

It would help if you told us things such as which operating system you
are using, and what kind of error messages you are getting. Otherwise
it is too difficult for us to know what is going wrong.

Having said that, we are aware of some problems when converting pdf
files on some versions of Microsoft Windows, although I'm not sure
we have come up with a 100% fool-proof work-around yet. Windows NT,
2000 and XP seem not to see these problems as often as Windows 95 and 98.

Also remember that some pdf files don't actually contain text, but contain
images of text, and these pdf files cannot have the text easily extracted
from them.
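
One rough way to check whether a given pdf falls into this category is
to run a text extractor over it and see whether anything comes out. The
sketch below is only an illustration and is not part of Greenstone: it
assumes the separate pdftotext command-line tool (from xpdf or poppler)
is installed and on the PATH, and the function names and the 20-character
threshold are arbitrary choices made for this example.

```python
import subprocess


def has_extractable_text(text, min_chars=20):
    """Return True if text holds at least min_chars non-whitespace characters.

    min_chars is an arbitrary threshold chosen for this sketch.
    """
    # str.split() with no arguments also drops the form feeds that
    # pdftotext emits between pages.
    visible = "".join(text.split())
    return len(visible) >= min_chars


def extract_text(pdf_path):
    """Run pdftotext on pdf_path and return whatever text it extracts.

    Assumes the pdftotext tool is installed; "-" as the output file
    makes it write the extracted text to standard output.
    """
    result = subprocess.run(
        ["pdftotext", pdf_path, "-"],
        capture_output=True,
        encoding="utf-8",
        errors="replace",
        check=True,
    )
    return result.stdout


def describe(pdf_path):
    """Return a one-line guess about whether pdf_path contains real text."""
    if has_extractable_text(extract_text(pdf_path)):
        return pdf_path + ": contains extractable text"
    return pdf_path + ": probably scanned images of text"
```

Calling describe("book.pdf") on one of the scanned books (the file name
here is just a placeholder) should give a hint as to whether the
conversion has any text to work with at all. A file that fails this
check would need OCR before its text could be indexed.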

John McPherson