Re: Formatting Issues of Documents

From John R. McPherson
Date Fri, 21 Mar 2003 13:34:20 +1200
Subject Re: Formatting Issues of Documents
forwarded to the list.

someone wrote:

> I had some ideas for the hyperlinks problem;
>
> * You could try using the MS-word 'Save as HTML' feature on the
> documents. (Maybe even run a HTML cleaning utility like the one in
> dreamweaver or HTML tidy before the import). As there is no batch
> processing facility in ms word this could get very tedious. (or convert
> them all to PDF; the ones I looked at seemed to work well, but I don't
> know if this could be automated)
> * The other option would be to write a script [or use the dreamweaver
> search/replace facility - it works on multiple files] to fix it up
> after greenstone does the conversion in the import phase. This is
> assuming you get the actual url next to the offending 'HYPERLINK' text.
> (you could just strip the missing ones)
>
> Other options I can think of include;
>
> * 'save as' an older version of word or RTF and see if that fixes it.
> (tedious)
> * not converting the documents, just using greenstone to index them. use
> [srclink] in your format statements - If your users all have ms word it
> might be ok.
>
> What do you think?

I think these are excellent ideas - I've forwarded them to the
list!
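
The post-import fix-up script could be sketched roughly as below. This is
only a minimal sketch in Python, and it rests on an assumption: that the
residue in the converted HTML looks like HYPERLINK "http://..." with the URL
in double quotes. Check a few converted files first, because the real output
may differ. The names fix_hyperlinks and fix_tree are made up for
illustration and are not part of Greenstone.

```python
import re
from pathlib import Path

# ASSUMPTION: residue looks like  HYPERLINK "http://example.org/"  in the HTML.
_WITH_URL = re.compile(r'HYPERLINK\s+"([^"]+)"')
# Any remaining HYPERLINK marker has no URL next to it, so it is stripped.
_BARE = re.compile(r'HYPERLINK\s*')


def fix_hyperlinks(html):
    """Turn HYPERLINK "url" residue into anchors; strip markers with no URL."""
    html = _WITH_URL.sub(
        lambda m: '<a href="{0}">{0}</a>'.format(m.group(1)), html)
    return _BARE.sub("", html)


def fix_tree(root):
    """Apply fix_hyperlinks to every .html file under root (hypothetical layout).

    ASSUMPTION: files are UTF-8; adjust the encoding for your collection.
    Returns the list of files that were actually changed.
    """
    changed = []
    for path in sorted(Path(root).rglob("*.html")):
        text = path.read_text(encoding="utf-8")
        fixed = fix_hyperlinks(text)
        if fixed != text:
            path.write_text(fixed, encoding="utf-8")
            changed.append(path)
    return changed
```

Run it on a copy of the converted files first, since fix_tree rewrites files
in place.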

> I sometimes feel reluctant to post to the list, as I am not an expert, but I
> feel it is appropriate to contribute in some way, given that the Greenstone
> team seems to cop the task of answering a lot of questions.

The Greenstone list is a *users'* mailing list. I think I speak for the whole
development team when I say we'd love to see more interaction between
users (and developers), especially if problems and solutions are shared
on the list. Even (maybe especially) if you don't think you're an expert,
by replying to other questions you show that other users have encountered
similar problems, and that you have (or haven't) been able to overcome
them.

Speaking of HTML Tidy (http://tidy.sourceforge.net), this is an
excellent utility for cleaning up and creating valid HTML / XHTML files,
and for removing the junk created by Microsoft Office's HTML converters.

I think it would be a very valuable tool for people who are creating
custom plugins, or even for integration into Greenstone's import process
(hint hint to the developers out there).
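
In the meantime, running Tidy over a directory of 'Save as HTML' files before
import could look something like the sketch below. It is not an existing
Greenstone feature, and it assumes that a tidy executable is on your PATH and
that the files sit under a directory of your choosing. The function names are
made up for illustration. The Tidy options used are -q (quiet), -m (modify
the file in place), -clean, and --word-2000 yes (strip Word 2000 markup).

```python
import subprocess
from pathlib import Path


def build_tidy_command(path, tidy="tidy"):
    """Build the Tidy command line for one file (edits the file in place)."""
    return [tidy, "-q", "-m", "-clean", "--word-2000", "yes", str(path)]


def tidy_tree(root, tidy="tidy"):
    """Run Tidy on every .htm/.html file under root.

    Tidy exits with 0 on success, 1 if there were warnings and 2 if there
    were errors, so only files with exit status 2 or more are reported.
    """
    failed = []
    for path in sorted(Path(root).rglob("*.htm*")):
        result = subprocess.run(build_tidy_command(path, tidy))
        if result.returncode >= 2:
            failed.append(path)
    return failed
```

As with any in-place tool, try it on a copy of the documents first; tidy_tree
returns the files that Tidy could not repair, so they can be checked by hand.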

John McPherson