Re: [greenstone-devel] bug in import of ascended filenames?

From John R. McPherson
Date Fri, 27 Jun 2003 18:21:17 +1200
Subject Re: [greenstone-devel] bug in import of ascended filenames?
In-Reply-To (3EFBD9B9-9040300-cs-waikato-ac-nz)
John R. McPherson wrote:
> roman chyla wrote:
>> Hi,
>> yesterday I made some interesting test. I saved 4 word files; the
>> first four had ascended characters in filename ; the second group had
>> only ascii7 chars in filename. The content of both groups were in
>> non-ascii encoding.
>> all 8 files were imported
>> the first 4 files (with ascended chars) were rejected during building

> Hi,
> I don't think we've ever checked the filenames, so the code is assuming
> that it gets UTF-8.

> Thanks for pointing this out. Sorry I can't suggest a fix or a
> work-around...

Ok, can you please try the following?

In perllib/plugins/, at around line 496 there is a line like:

$doc_obj->add_utf8_metadata($doc_obj->get_top_section(), "Source", &ghtml::dmsafe($filemeta));

Change this to say

$doc_obj->add_metadata($doc_obj->get_top_section(), "Source", &ghtml::dmsafe($filemeta));

(i.e., without the utf8 in the function name). This will assume that the
filename is in extended ASCII and will then convert it to UTF-8. Even if
your filename isn't extended ASCII, it will be converted to UTF-8 and at
least it will import, although the filename might be displayed incorrectly.
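
To illustrate the kind of conversion involved, here is a minimal
stand-alone sketch. It uses Perl's core Encode module rather than
Greenstone's own conversion code, it assumes "extended ascii" means
ISO-8859-1, and the filename is a made-up example:

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical filename as raw bytes; the e-acute is stored as the
# single ISO-8859-1 byte 0xE9. (Assumption: not taken from Greenstone.)
my $latin1_name = "caf\xE9.doc";

# Interpret the bytes as ISO-8859-1, then re-encode them as UTF-8.
my $utf8_name = encode("UTF-8", decode("ISO-8859-1", $latin1_name));

# Print the resulting bytes in hex so the change is visible.
print join(" ", map { sprintf("%02X", ord($_)) } split(//, $utf8_name)), "\n";
# prints: 63 61 66 C3 A9 2E 64 6F 63
```

The single byte E9 becomes the two-byte UTF-8 sequence C3 A9. If the
original bytes were in some other 8-bit encoding, each high byte would
still map to some character, so the result would be valid UTF-8 but
could show the wrong characters.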

Similarly in, at around line 222, change
$doc_obj->add_utf8_metadata($cursection, "URL", $web_url);
to
$doc_obj->add_metadata($cursection, "URL", $web_url);

Any plugin that is based on HTMLPlug or BasPlug should then get it right,
unless it uses its own read() function for reading in the file.

John McPherson