You could try OpenOffice.
It does pretty well at a variety of conversions in my experience.
On Mon, 21 Mar 2005 21:54:38 -0800 (PST), Leho@nq <firstname.lastname@example.org> wrote:
> Dear John R. McPherson,
> I agree with you that this isn't a problem with Greenstone. It's the tools,
> like pdftohtml or wvWare, that have some limits. But I tested it, and GS
> worked well with web archives! Even in the gsdl-workshop-materials workshop
> (the 1-day and 3-day versions, which I got from greenstone.org) they gave us
> some web archives (html_large and html_small) to test, and they worked fine!
> I have uploaded my test collection, with test.doc, test.pdf and test web
> archives, to www.yousendit.com so that others can experiment with it!
> If anyone has compiled wvWare on Windows with support for extracting
> embedded images such as Drawing Objects, please share it with me and
> everyone! Thank you so much!
> "John R. McPherson" <email@example.com> wrote:
> We use the pdftohtml program for converting pdf files, and we use the
> wvWare program for converting MS Word documents. If they can't extract
> images from the input document, then greenstone can't process them.
> wvWare can extract embedded images, but by default it cannot extract
> "Drawing objects". It is possible to compile a version of wvWare with
> support for these, but it is difficult to do this on windows because
> it requires linking in a lot of extra code libraries that were
> designed to work on unix (eg libwmf, and the truetype font library).
> I don't think there is a greenstone plugin that can handle those
> "web archive" files.
> Can you put one of the pdf files on a publicly accessible web server
> and maybe someone will be able to take a look at it.
> John McPherson
> greenstone-devel mailing list
Stephen De Gabrielle