Re: converting from .doc to HTML

From: Jayalakshmi-DS
Date: Mon, 1 Apr 2002 17:22:59 +0530 (IST)
Subject: Re: converting from .doc to HTML
In-Reply-To: (20020401230748-B30060-goblin-cs-waikato-ac-nz)
So the problem is with the Word version being used. A new release of
wvWare, 0.7.1, is available which can also handle Word 2000. What
changes would need to be made in gsConvert.pl to use it? Is this
possible?
-Jayalakshmi


On Mon, 1 Apr 2002, John R. McPherson wrote:

> On Mon, Apr 01, 2002 at 12:45:45PM +0530, Jayalakshmi-DS wrote:
> > Hi,
> > We have a problem converting .doc files to HTML. The WordPlug doesn't
> > seem to convert the docs even though the import process completes fine.
> > When we click on the icon in the A-Z list, what we see is gibberish,
> > not English.
> > We have tried using the _input-encoding option with ascii and utf
> > formats. It just doesn't seem to extract the text properly. The .doc files
> > are saved in Word Document format with language encoding Western European
> > (Windows).
> > What could be the problem? Thanks in advance.
> > _Jayalakshmi
>
> Hi,
> If wvWare (the third-party program we are using) can't successfully
> convert an MS Word file into HTML, then we fall back to running the Unix
> "strings" command over the file, which extracts all printable
> characters. That is why you see the "gibberish".
>
> Perhaps your files were created using Word 2000 or Word XP? The
> converter can only handle MS Word versions 2 up to 97.
> The people who write wvWare do a good job, but unfortunately it
> is a moving target with proprietary formats like MS Word.
>
> Another possibility is that the files have a .DOC extension but
> aren't proper Word files. For example, you can give a plain text
> file or Rich Text Format (RTF) file a .DOC extension and MS Word
> will still open it. I think our import process will detect this
> and work if the file is in fact RTF, but I don't know how many
> different formats this works for.
>
> John McPherson.
>
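
To illustrate the "strings" fallback John describes, here is a rough Python
sketch of what such an extraction does: it keeps only runs of printable
ASCII bytes of some minimum length and discards everything else. This is not
the code Greenstone runs; the function name extract_printable_runs and the
default minimum length of 4 are assumptions made for the example.

```python
import re


def extract_printable_runs(data: bytes, min_len: int = 4) -> list[str]:
    """Return runs of printable ASCII (plus tab) at least min_len bytes long.

    Hypothetical helper that roughly mimics the Unix "strings" command.
    """
    # Space (0x20) through tilde (0x7e) are the printable ASCII characters.
    pattern = re.compile(rb"[\x20-\x7e\t]{%d,}" % min_len)
    return [match.group().decode("ascii") for match in pattern.finditer(data)]


if __name__ == "__main__":
    sample = b"\x00\x01Hello world\x00ab\x00\xffTest\x02"
    for run in extract_printable_runs(sample):
        print(run)
```

Run on a binary Word file, an extraction like this returns text fragments
mixed with whatever other bytes happen to fall in the printable range, which
matches the "gibberish" seen in the collection.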

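One way to check John's second possibility (a file with a .DOC extension
that is not really a Word file) is to look at the first few bytes of the
file. The sketch below is only an illustration and is not how the Greenstone
import process does its detection; the function name sniff_doc_type and its
return labels are made up for the example. It relies on two well-known
signatures: binary Word 97-era documents are stored in an OLE2 compound file,
which starts with the bytes D0 CF 11 E0 A1 B1 1A E1, and RTF files start
with "{\rtf".

```python
# Signature of an OLE2 compound file, the container used by Word 97-era .doc.
OLE2_MAGIC = bytes.fromhex("d0cf11e0a1b11ae1")
# RTF documents begin with the literal characters {\rtf
RTF_MAGIC = b"{\\rtf"


def sniff_doc_type(header: bytes) -> str:
    """Classify a file by its leading bytes (hypothetical helper)."""
    if header.startswith(OLE2_MAGIC):
        return "ole2"
    if header.startswith(RTF_MAGIC):
        return "rtf"
    return "unknown"


def sniff_file(path: str) -> str:
    """Read the first 8 bytes of the file at path and classify them."""
    with open(path, "rb") as handle:
        return sniff_doc_type(handle.read(8))
```

Note that this check cannot tell Word 97 from Word 2000, since both use the
same container; it only separates real binary Word files from RTF or plain
text files that carry a .DOC extension.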
------------------------------------------------------------------------
D.S.Jayalakshmi
National Centre for Software Technology
68, Electronics City Phone: 80 - 852 3300
Bangalore - 561 229. Extn:2102
------------------------------------------------------------------------