[greenstone-devel] MS Office Drawing Object, again

Date: Mon, 21 Mar 2005 21:54:38 -0800 (PST)
Subject: [greenstone-devel] MS Office Drawing Object, again
Dear John R. McPherson,
I agree with you that this isn't a Greenstone problem. It's the tools like pdftohtml and wvWare
that have some limits. But I tested, and GS worked well with web archives! Even the
gsdl-workshop-materials (1-day and 3-day workshops, which I got from greenstone.org) include some web archives (html_large and html_small) to test, and they work fine!
I have uploaded my test collection, with test.doc, test.pdf and the test web archives, to
www.yousendit.com so that someone can experiment with it!
If anyone has compiled a version of wvWare on Windows that supports extracting embedded images such as
Drawing Objects, please share it with me and everyone! Thank you so much!

"John R. McPherson" <jrm21@cs.waikato.ac.nz> wrote:

We use the pdftohtml program for converting pdf files, and we use the
wvWare program for converting MS Word documents. If they can't extract
images from the input document, then greenstone can't process them.

wvWare can extract embedded images, but by default it cannot extract
"Drawing objects". It is possible to compile a version of wvWare with
support for these, but it is difficult to do this on Windows because
it requires linking in a lot of extra code libraries that were
designed to work on Unix (e.g. libwmf and the TrueType font library).

I don't think there is a greenstone plugin that can handle
those "web archive" files.

Can you put one of the pdf files on a publicly accessible web server?
Maybe someone will be able to take a look at it.

John McPherson

