Re: [greenstone-users] RE:Greenstone 2.5: Images are not seen after importing from MS Word files

From John R. McPherson
Date Tue, 06 Jul 2004 11:40:22 +1200
Subject Re: [greenstone-users] RE:Greenstone 2.5: Images are not seen after importing from MS Word files
In-Reply-To (BAY15-F37EQJi9AJpn800026682-hotmail-com)
Raitis Brodezhonok wrote:
> Hello Greenstone users and designers!
> This might be the useful information for those who make collection from
> MS Word doc. files with images inside and maybe not only in this case.
> I tried Greenstone 2.51 under OS MS windows installed in directory
> C:\Program Files\gsdl.
> When collection was made I could not see images there.
> 1) if in source doc. there was hyperlink to an image file, then link
> was not accessible;

Hi, it depends on the image format and how it is stored. We use a 3rd
party program called "wvWare" to convert MS Word files into .html. It
can extract some images but not others.

> 2) if in source doc. there was inserted picture from a file, then during
> importing in Log file was observed double paths like C:\Program
> Files\gsdl\collect\docfiles\tmp\C:\Program
> Files\gsdl\collect\docfiles\tmp\D10.jpg and as result images were not
> accessible from collection.
> When I installed Greenstone into directory like C:\gsdl , then the
> second problem mentioned above
> disappeared (the 1st one still stays).
> It looks the problem is about Paths with space inside...(C:\Program
> Files) !!

Thanks for reporting this. Looking at the code, it appears to check
that links consist only of numbers, letters, "."s and "/".
Could you look in your collection's etc/fail.log? I think it prints
out "HTMLPlug: ERROR - badly formatted tag ignored" for any link that
doesn't match that rule.
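
To illustrate why a path containing a space would be rejected, here is a
rough sketch of that kind of check. It is written in Python purely for
illustration; the regular expression and the function name
link_looks_valid are assumptions of mine, not the actual Greenstone code,
which may differ in detail.

```python
import re

# Hypothetical approximation of the rule described above: a link may
# contain only letters, digits, "." and "/". Not the real Greenstone check.
SIMPLE_LINK = re.compile(r"[A-Za-z0-9./]+")

def link_looks_valid(link: str) -> bool:
    """Return True if the whole link uses only the allowed characters."""
    return SIMPLE_LINK.fullmatch(link) is not None

print(link_looks_valid("images/D10.jpg"))              # no space or colon
print(link_looks_valid("Program Files/gsdl/D10.jpg"))  # contains a space
```

Under this assumed rule the first link passes and the second is rejected
because of the space.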

According to the standards, spaces aren't allowed in URIs/URLs (they
should be escaped as %20), so we need to track down whether Greenstone or
wvWare is at fault here, but either way this will definitely be fixed for
the next release of Greenstone.
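
As a small illustration of the %20 escaping, here is a sketch using
Python's standard urllib.parse.quote. The helper name escape_link and the
sample path are made up for the example, and this is not how Greenstone
or wvWare handle links internally.

```python
from urllib.parse import quote

def escape_link(path: str) -> str:
    """Percent-encode characters that are not allowed in a URL path."""
    # quote() keeps "/" unescaped by default and turns each space into %20.
    return quote(path)

print(escape_link("Program Files/gsdl/D10.jpg"))
# prints: Program%20Files/gsdl/D10.jpg
```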

John McPherson

