Re: [greenstone-users] Folder of html documents - specifiying index (start) file

From Katherine Don
Date Tue, 05 Oct 2004 13:21:09 +1300
Subject Re: [greenstone-users] Folder of html documents - specifiying index (start) file
In-Reply-To (4161B102-4070606-unesco-org-uy)
Hi Eduardo

Greenstone is not very good at handling documents that are made up of
more than one file. If you really want all the html files joined
together into one document, then you would probably need to write a new
plugin, or modify HTMLPlug.
This plugin would only process the index files (you can specify which
ones these are using the process_exp option), but would need to go
through all the links and add the other pages into the main document.
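
To illustrate the "go through all the links" step, here is a minimal sketch
of the traversal only. It is not a Greenstone plugin (a real one would be
written in Perl against the plugin API); the names LinkCollector and
linked_pages are made up for this example, and it assumes the pages of a
book all live in or below the folder that holds the index file.

```python
from html.parser import HTMLParser
from pathlib import Path
from urllib.parse import unquote, urlparse


class LinkCollector(HTMLParser):
    """Collects the href values of <a> tags in one HTML page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def linked_pages(index_file):
    """Return the index file plus every local HTML page reachable from it,
    in the order the links are first encountered (breadth-first)."""
    index_file = Path(index_file).resolve()
    root = index_file.parent
    ordered, queue = [index_file], [index_file]
    while queue:
        page = queue.pop(0)
        parser = LinkCollector()
        parser.feed(page.read_text(encoding="utf-8", errors="replace"))
        for href in parser.links:
            parsed = urlparse(href)
            path = unquote(parsed.path)
            if not path or parsed.scheme or parsed.netloc:
                continue  # skip in-page anchors and external links
            candidate = (page.parent / path).resolve()
            # Only follow HTML files that exist inside the book folder.
            if (candidate.suffix.lower() in {".html", ".htm"}
                    and candidate.is_file()
                    and root in candidate.parents
                    and candidate not in ordered):
                ordered.append(candidate)
                queue.append(candidate)
    return ordered
```

The list that comes back is the order in which a plugin could append the
other pages to the main document; how the text is actually merged is left
open here.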

When you build the collection at present, do all the links work OK? E.g.,
if the user is looking at the index file, can they click the links and
get to all the other pages?

If so, here's my suggestion for a simpler solution:

Keep each file as a separate document. When you do a search (full text)
you will get individual pages in the results. I think that is ok,
because if you always return the index file, then the user sees a
document that may not even have the search terms in it.

Using metadata, you can make metadata searching and browse lists only
show the index files. Assign the metadata to the index file rather than
the folder. This way only the index will get that metadata. Then if you
create a classifier using that metadata, only the index files will
appear. And if you build an index on that metadata, only index files
will show up in the search results because only they have values for
that metadata.
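
As a rough sketch of assigning the metadata to the index file rather than
the folder, the script below picks the index file in a book folder and
builds a metadata.xml whose FileSet matches only that file. The candidate
names come from Eduardo's examples; the function names and the Author value
are made up, and the file layout follows the usual DirectoryMetadata
structure, so check the output against a metadata.xml your own Greenstone
version accepts before relying on it.

```python
import re
from pathlib import Path
from xml.sax.saxutils import escape, quoteattr

# Assumed candidate names, taken from the examples in the question.
INDEX_NAMES = ("index.html", "indice.html", "start.html")


def find_index_file(book_dir):
    """Return the first candidate index file present in book_dir, or None."""
    for name in INDEX_NAMES:
        if (Path(book_dir) / name).is_file():
            return name
    return None


def metadata_xml(index_name, metadata):
    """Build metadata.xml text that attaches metadata to one file only."""
    # FileName is treated as a regular expression, so escape the file name.
    # Assumption: Python's escaping is acceptable for simple file names.
    pattern = re.escape(index_name)
    lines = [
        '<?xml version="1.0" encoding="UTF-8"?>',
        "<DirectoryMetadata>",
        "  <FileSet>",
        f"    <FileName>{escape(pattern)}</FileName>",
        "    <Description>",
    ]
    for name, value in metadata.items():
        lines.append(
            f"      <Metadata name={quoteattr(name)}>{escape(value)}</Metadata>"
        )
    lines += ["    </Description>", "  </FileSet>", "</DirectoryMetadata>"]
    return "\n".join(lines) + "\n"


def write_book_metadata(book_dir, metadata):
    """Write metadata.xml into book_dir; return the index file name used."""
    index_name = find_index_file(book_dir)
    if index_name is None:
        raise FileNotFoundError(f"no index file found in {book_dir}")
    target = Path(book_dir) / "metadata.xml"
    target.write_text(metadata_xml(index_name, metadata), encoding="utf-8")
    return index_name
```

For example, write_book_metadata("import/book1", {"Author": "Some Author"})
would give Author to the index page of that (hypothetical) folder only, so
the other pages stay out of a classifier or index built on Author.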

I hope this is useful.
Katherine Don

Eduardo Trápani wrote:
> Hi,
> Is there a way to tell Greenstone which html file is the index file (of
> a group of related html files in a folder)? Sometimes it is index.html,
> but it might also be indice.html, start.html ...
> I hope to solve the other problem (telling Greenstone that that group of
> files are indeed a whole document with single metadata) and I wonder if I
> can also say which one is the index file, so that only that one shows up
> and navigation goes on normally.
> Eduardo.
> I have several books, each of them consists of many html pages. There is a folder
> for each book, but if I add the metadata to the folder, say Author, and
> then search for the author, I get a line for each html page!
> I want to get a single line with all the metadata from the folder, not the individual
> I know they inherit the metadata from the folder, but I just don't know how to solve
> that problem ...
> Any ideas?
> Eduardo.