| Hi
There is a bug in WordPlug: if input_encoding is set to anything other
than "auto", the input_encoding option is not passed on to HTMLPlug.
Please find the following code (line 155) in perllib/plugins/WordPlug.pm:
# wvWare will always produce html files encoded as utf-8
if ($self->{'input_encoding'} eq "auto") {
    $self->{'input_encoding'} = "utf8";
    $self->{'extract_language'} = 1;
    push(@$html_options,"-input_encoding", "utf8");
    push(@$html_options,"-extract_language");
}
Move the line
push(@$html_options,"-input_encoding", "utf8");
to outside the if statement, i.e.
# wvWare will always produce html files encoded as utf-8
if ($self->{'input_encoding'} eq "auto") {
    $self->{'input_encoding'} = "utf8";
    $self->{'extract_language'} = 1;
    push(@$html_options,"-extract_language");
}
push(@$html_options,"-input_encoding", "utf8");
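If it helps to see the effect of the change in isolation, here is a minimal
standalone sketch. It is not the actual plugin code: build_html_options is a
made-up helper name, $self is just a plain hash standing in for the plugin
object, and "unicode" is only an example of a non-"auto" setting.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-in for the option-building step in WordPlug.pm.
# $self is a plain hash ref here, not a real plugin object.
sub build_html_options {
    my ($self) = @_;
    my $html_options = [];

    # wvWare will always produce html files encoded as utf-8
    if ($self->{'input_encoding'} eq "auto") {
        $self->{'input_encoding'} = "utf8";
        $self->{'extract_language'} = 1;
        push(@$html_options, "-extract_language");
    }
    # Pushed unconditionally, whatever input_encoding was set to
    push(@$html_options, "-input_encoding", "utf8");

    return $html_options;
}

# Compare the "auto" case with an explicitly set encoding
for my $encoding ("auto", "unicode") {
    my $options = build_html_options({ 'input_encoding' => $encoding });
    print "$encoding: @$options\n";
}
```

In both cases the option list ends with -input_encoding utf8, which is the
behaviour the change above is meant to give.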
Hopefully this change will fix the problem. You'll need to reimport and
rebuild the collection afterwards. If it doesn't work, please let me know.
Regards,
Katherine
Cao Minh Kiem wrote:
> Dear Michael, Katherine and Greenstone Developer Team,
>
> First of all, I wish you a Merry Christmas, a Happy New Year and more success for
> the GSDL project.
>
> Meanwhile, I would like to report a problem with the WordPlugin in GSDL Version
> 2.62.
>
> I have just installed GSDL software version 2.62 and found that WordPlugin
> does not seem to work properly. It does not correctly convert the characters
> in an MS Word .doc file into HTML. In my case, the Word DOC file is in UNICODE.
> The WordPlugin of version 2.60 works fine.
> If the DOC file is saved in HTML format, it is OK (because it is processed
> by the HTML Plugin).
> Could you give me some tips and advice to solve the problem?
>
> I am sending you a Word file and its HTML file (created by Word) for testing.
> Thank you for your wonderful work.
> Best regards
> Cao Minh Kiem
> Deputy-Director
> National Center for S&T Information
> 24 Ly Thuong Kiet, Hanoi. VIETNAM
> Tel: (84-4)-9349491. Fax (84-4)-9349127
> Email: kiemcm@vista.gov.vn
>
>
>
> |