Re: [greenstone-devel] PDFPlug vs pdftohtml encoding scheme

From John R. McPherson
Date Thu, 20 Nov 2003 21:31:40 +1300
Subject Re: [greenstone-devel] PDFPlug vs pdftohtml encoding scheme
In-Reply-To (000701c3af38$5965a990$50c8a8c0-Odin)
On Thu, Nov 20, 2003 at 06:31:44PM +1100, Franck Magron wrote:

> But in fact the -enc option of pdftohtml.pl is not used :
> my $return_value=system("$ppthtml_binary \"$input_ppt\" >
> \"$output_html\"");

I think you've accidentally looked at the wrong file... that line
is in the ppttohtml.pl script, which converts PowerPoint (ppt) files.

The pdftohtml.pl script has the following:
$cmd .= " -noframes -p -enc UTF-8 \"$input_filename\" \"$output_filestem.html\"";

which should do the right thing. I'm not sure why it gives the warning
message though... perhaps you could email me (personally, not the list)
a copy of a PDF that exhibits this behaviour? pdftohtml.pl definitely
works on non-ASCII characters for the documents that we've tested with.
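
For illustration only, here is a minimal sketch of passing the same
options to pdftohtml as a list rather than as one quoted string, so
that filenames containing spaces need no escaped quotes. The helper
name build_pdftohtml_args, the example filenames and the ".html"
output suffix are assumptions made for the sketch; this is not the
code in pdftohtml.pl.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of pdftohtml.pl): return the argument
# list for pdftohtml with an explicit output encoding.
sub build_pdftohtml_args {
    my ($input_filename, $output_filestem, $encoding) = @_;

    # Default to UTF-8, matching the -enc value quoted above.
    $encoding = 'UTF-8' unless defined $encoding;

    return (
        '-noframes',            # single output page, no frameset
        '-p',                   # rewrite .pdf links to .html
        '-enc', $encoding,      # output text encoding
        $input_filename,
        "$output_filestem.html",
    );
}

# Example filenames are made up; note the embedded spaces.
my @args = build_pdftohtml_args('my report.pdf', 'out/my report');
print join(' ', map { "[$_]" } @args), "\n";

# To actually run the converter (assumes a pdftohtml binary on the
# PATH), the list form of system() bypasses the shell entirely:
# system('pdftohtml', @args) == 0 or warn "pdftohtml failed: $?\n";
```

Printing the list in brackets makes it easy to confirm that -enc and
its value really are being passed before looking elsewhere for the
cause of the warning.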

John McPherson