Re: [greenstone-devel] Images not imported in collection constructed with GLI

From Michael Dewsnip
Date Mon, 22 Sep 2003 12:40:02 +1200
Subject Re: [greenstone-devel] Images not imported in collection constructed with GLI
In-Reply-To (3F6A3A54-8CD1FBA7-ns-uca-edu-ni)
Hello Mauricio,

Greenstone relies on the external program "wv" (http://wvware.sourceforge.net) to convert Word documents into HTML for indexing. It usually does a pretty good job, but occasionally has problems.

The following causes are unlikely, but you should check that:

- You have "plugin WordPlug" in your collect.cfg file (I'm pretty sure you do).
- The Word documents are not Word XP documents (the wv converter currently doesn't support these).
- The documents are real Microsoft Word documents, not RTFs or some other format.
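The first and third checks above can be sketched in a few lines of Python. This is only a rough illustration, not part of Greenstone: the function names are made up for this example, and it assumes that "#" starts a comment in collect.cfg. It relies on binary Word files being stored in an OLE2 container with a fixed 8-byte signature, whereas RTF files start with "{\rtf" even when they are named *.doc. It cannot tell which Word version wrote a file, so it does not help with the Word XP check, and other OLE2 files (Excel, for example) would also be reported as "ole2".

```python
from pathlib import Path

# Signature of the OLE2 compound file container used by binary .doc files.
OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")
# RTF files begin with this ASCII signature, even when saved as *.doc.
RTF_MAGIC = b"{\\rtf"


def has_wordplug(collect_cfg_text: str) -> bool:
    """Return True if an uncommented 'plugin WordPlug' line is present.

    Assumption: '#' starts a comment in collect.cfg.
    """
    for line in collect_cfg_text.splitlines():
        tokens = line.split("#", 1)[0].split()
        if len(tokens) >= 2 and tokens[0] == "plugin" and tokens[1] == "WordPlug":
            return True
    return False


def sniff_format(header: bytes) -> str:
    """Classify a file by its first bytes: 'ole2', 'rtf' or 'unknown'."""
    if header.startswith(OLE2_MAGIC):
        return "ole2"
    if header.startswith(RTF_MAGIC):
        return "rtf"
    return "unknown"


def sniff_file(path: Path) -> str:
    """Read the first 8 bytes of a file and classify them."""
    with path.open("rb") as fh:
        return sniff_format(fh.read(8))
```

Running sniff_file over each source document should flag any file that reports "rtf" or "unknown" as worth a closer look before blaming the converter.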

Near the start of the year, Stefan Boddie had this to say regarding problems importing Word documents:

You might notice you get some broken images appearing in the html output 
Greenstone produces after converting and importing your MS-Word file. This 
is most likely because the image appearing within the Word file itself is a 
WMF image. Greenstone doesn't include support for extracting these images so 
they end up broken. Since a popular way to use Greenstone is to use the 
extracted text for indexing but retain the source Word document for display 
I haven't considered this a huge problem. The wvWare converter itself is 
however quite capable of extracting these images. To do so requires libwmf 
and various other components, the inclusion of which would make Greenstone 
even bigger and slower to download than it is already. Those requiring this 
feature can download the required components themselves however. The latest 
versions of everything required can be found at http://www.wvware.com. You 
simply need to install the new binary files into your gsdl\bin\windows (or 
gsdl/bin/linux or whatever) directory, replacing the wvWare binary that's 
already there if required. For Windows users there are pre-compiled binaries 
of wvWare.exe, libwmf, and everything else you need at 
http://sourceforge.net/projects/gnuwin32.
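One way to see whether images are still broken after a conversion is to scan the HTML output for image references that point at files which do not exist. The Python below is a minimal sketch of that idea, not something Greenstone provides: the class and function names are hypothetical, and it assumes you have the converted HTML and its images together in one directory, with image paths relative to that directory.

```python
from html.parser import HTMLParser
from pathlib import Path
from urllib.parse import unquote, urlparse


class ImgCollector(HTMLParser):
    """Collect the src attribute of every <img> tag that is fed in."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


def missing_images(html_text, base_dir):
    """Return the local image references that do not resolve to a file."""
    parser = ImgCollector()
    parser.feed(html_text)
    missing = []
    for src in parser.sources:
        parsed = urlparse(src)
        if parsed.scheme or parsed.netloc:
            continue  # remote image, cannot be checked on disk
        if not (Path(base_dir) / unquote(parsed.path)).is_file():
            missing.append(src)
    return missing
```

An empty list from missing_images only means that the referenced files exist; it does not prove that a browser can display them.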


If this fails, you can always use the HTML version for indexing, but display the real Word document to your users (as Stefan mentions). This is easy to do, and is done by the Word and PDF demo collection (http://www.nzdl.org/cgi-bin/library?a=p&p=about&c=wrdpdf-e), for example.

Hope this helps,

Michael
 
 

Mauricio Garcia wrote:

Hi: We are constructing a collection using Word documents as source
documents. These documents contain images, but the images are not
imported into the collection.

Which plug-ins and what configuration shall I use?

thanks

Mauricio Garcia
UCA - Nicaragua
