Re: [greenstone-users] Document images and sections

From Katherine Don
Date Mon, 08 May 2006 09:26:50 +1200
Subject Re: [greenstone-users] Document images and sections
In-Reply-To (445B23C9-30102-caboose-org-uk)
Hi Kevin

Kevin O'Rourke wrote:
> Hi,
> our library staff here have started trying out Greenstone and looking at
> the example collections. I've given them the various Greenstone manuals
> and they're experimenting with adding documents and configuring collections.
> They've immediately started finding things in samples and Greenstone
> systems on the internet that they want to do but don't know how.
> 1. The cover images that are in the 'demo' collection. I see that these
> are enabled by "format DocumentImages true" but I can't find out where
> the images come from. Does it just look for a file with a particular
> name or is there some way of specifying the filename to use?
By default, a jpg image with the same name as the file will be
associated as a cover image. For example, for a file called
"Software Engineering.doc", the cover image needs to be named
"Software Engineering.jpg".
The cover_image option that Stephen mentioned is no longer valid.
Greenstone will try to associate covers by default. To turn this off, use
the -no_cover_image plugin option.
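
As a quick illustration of the naming rule above, here is a minimal Python
sketch that works out the cover image name expected for a given document.
It is not part of Greenstone; the function name expected_cover_image is
made up for this example, and it only assumes the "same name, .jpg
extension" convention described above.

```python
from pathlib import Path


def expected_cover_image(document_path: str) -> Path:
    """Return the cover image path expected for a document.

    Assumes the convention described above: same file name as the
    document, with the extension replaced by .jpg.
    """
    return Path(document_path).with_suffix(".jpg")


if __name__ == "__main__":
    # Prints: Software Engineering.jpg
    print(expected_cover_image("Software Engineering.doc"))
```

Something like this could be run over an import folder to list documents
that have no matching cover image yet.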

> 2. The hierarchical list of document sections, again as found in the
> 'demo' collection. Does this only work for HTML documents?
> I see from the documentation that it requires adding HTML comments with
> the section titles, why can't this information be extracted
> automatically from the HTML headings? There's also a vague mention of
> the same formatting working for Word files but it didn't work for me,
> can't the Word plugin extract this automatically?
The old way of doing sections is to mark up the document using HTML
comments, as described in the "tagging document files" section of the
Developer's Guide. This works for HTML, Word and PDF, I think. You need to
set the -description_tags option for the plugin.

More recently, other methods have been added, depending on the document type.

For HTML documents, you can use the -sectionalise_using_h_tags option,
which will split the document into sections based on H1, H2, H3 tags.
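
To give a feel for what splitting on heading tags means, here is a rough
Python sketch that starts a new section at every H1, H2 or H3. This is
only an illustration of the idea, not how the Greenstone plugin is
implemented; the names HeadingSectioner and split_by_headings are
invented for this example, and text before the first heading is simply
dropped.

```python
from html.parser import HTMLParser

HEADING_TAGS = ("h1", "h2", "h3")


class HeadingSectioner(HTMLParser):
    """Start a new section at each H1-H3 and collect the text that follows."""

    def __init__(self):
        super().__init__()
        self.sections = []        # one dict per section: level, title, text
        self._in_heading = False  # True while inside an <h1>-<h3> element

    def handle_starttag(self, tag, attrs):
        if tag in HEADING_TAGS:
            self.sections.append({"level": int(tag[1]), "title": "", "text": ""})
            self._in_heading = True

    def handle_endtag(self, tag):
        if tag in HEADING_TAGS:
            self._in_heading = False

    def handle_data(self, data):
        if not self.sections:
            return  # text before the first heading is ignored in this sketch
        key = "title" if self._in_heading else "text"
        self.sections[-1][key] += data


def split_by_headings(html: str):
    """Return a list of (level, title, text) tuples, one per heading."""
    parser = HeadingSectioner()
    parser.feed(html)
    parser.close()
    return [(s["level"], s["title"].strip(), s["text"].strip())
            for s in parser.sections]


if __name__ == "__main__":
    sample = "<h1>Intro</h1><p>Hello</p><h2>Details</h2><p>More</p>"
    # Prints: [(1, 'Intro', 'Hello'), (2, 'Details', 'More')]
    print(split_by_headings(sample))
```

The heading level (1, 2 or 3) is kept so that the sections could be
nested into a hierarchy afterwards.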

For PDF documents, you can use the -use_sections option to generate one
section per page.

For Word documents, if you are working on Windows, you can try the
-windows_scripting option. This uses Word to convert to HTML, rather
than a third party program. If the Word document has been styled with
Heading 1, Heading 2, Heading 3 styles, these will be used to split the
document into sections.
If you have used different styles for headings, then you can use the
level1_header, level2_header, level3_header options, which specify
which styles should be treated as section breaks.

There are several tutorials covering these topics, but they are not
online yet. If you want, I can send you a copy of them.
I'll try to get them online over the next few weeks.

> Kevin