Re: adding .mht files to collections

From John R. McPherson
Date Thu, 11 Apr 2002 22:56:36 +1200
Subject Re: adding .mht files to collections
In-Reply-To (Pine-LNX-4-33-0204111343480-28248-100000-trinetra-ncb-ernet-in)
On Thu, Apr 11, 2002 at 01:54:24PM +0530, Jayalakshmi-DS wrote:
> Hi All,
> We have gsdl on Linux. We have a system wherein one can upload files
> saved on his/her machine to our Linux server and then a script running
> as a cron job imports the documents and builds the collection. This is
> working fine except for html documents. Users will have saved web pages in
> .mht (web archive single file) format. To accommodate these I have
> included .mht in process_exp for HTMLPlug. So .mht files are added to
> the collection. But the links in the doc are broken and images in the
> document are displayed as text. How can I overcome these problems and
> successfully add .mht files to the collection?

HTMLPlug only handles "plain" HTML files. The web archive files
aren't plain HTML: they also contain encoded binary data (such as
images) and other material such as server and date information.
What program is creating them? Is it Internet Explorer, using
"Save As..." and then "Complete Web Page" or something similar?
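
One possible workaround is to unpack each .mht file into separate
HTML and image files before the import step. The following is only a
minimal sketch in Python, assuming the files are MIME multipart
archives (MHTML) in which each part carries a Content-Location
header. The function name unpack_mht and the partN fallback names are
hypothetical, and nothing here is part of gsdl or HTMLPlug.

```python
import email
import posixpath
from email import policy
from pathlib import Path
from urllib.parse import urlsplit


def unpack_mht(raw: bytes, out_dir: Path) -> list[str]:
    """Write every leaf MIME part of a web archive into out_dir.

    Returns the file names written, in the order the parts appear.
    """
    msg = email.message_from_bytes(raw, policy=policy.default)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for index, part in enumerate(msg.walk()):
        if part.is_multipart():
            continue  # container parts carry no payload of their own
        # Assumption: Content-Location holds the original URL of the part.
        location = str(part.get("Content-Location", ""))
        name = posixpath.basename(urlsplit(location).path)
        if name in ("", ".", ".."):
            name = f"part{index}"  # hypothetical fallback name
        is_html = part.get_content_type() == "text/html"
        if is_html and not name.lower().endswith((".html", ".htm")):
            name += ".html"
        # decode=True undoes base64 or quoted-printable transfer encoding.
        payload = part.get_payload(decode=True)
        (out_dir / name).write_bytes(payload)
        written.append(name)
    return written
```

It could be called from the cron script as, for example,
unpack_mht(Path("page.mht").read_bytes(), Path("import/page")). The
sketch does not handle two parts that share a base name, and the
links inside the unpacked HTML will probably still point at the
original URLs, so they would need rewriting to the local file names
before the images display correctly.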

John McPherson