Re: [greenstone-users] Folder of html documents - specifying index (start) file

From Eduardo Trápani
Date Tue, 05 Oct 2004 12:20:44 -0200
Subject Re: [greenstone-users] Folder of html documents - specifying index (start) file
In-Reply-To (4161E8F5-1010004-cs-waikato-ac-nz)

Hi Katherin,

> Greenstone is not very good at handling documents that are made up of
> more than one file. If you really want all the html files joined
> together into one document, then you would probably need to write a new
> plugin, or modify HTMLPlug.
> This plugin would only process the index files (you can specify which
> ones these are using the process_exp option), but would need to go
> through all the links and add the other pages into the main document.

That's a good idea. It will take some time, but it looks interesting.
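
For anyone following the thread, here is a rough sketch of the link-following step described above. It is written in Python only to illustrate the logic, and it is not Greenstone plugin code (a real plugin would be written in Perl against the Greenstone plugin API). The names LinkCollector, local_links and merge_from_index are hypothetical, and the pages are held in an in-memory dictionary as a stand-in for the folder on disk. It assumes all pages sit in one folder, so relative paths with subdirectories are not resolved.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class LinkCollector(HTMLParser):
    """Collect href values of <a> tags in document order."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def local_links(html):
    """Return links to local HTML files, with any #fragment removed."""
    collector = LinkCollector()
    collector.feed(html)
    result = []
    for href in collector.links:
        parts = urlparse(href)
        if parts.scheme or parts.netloc:
            continue  # skip external links such as http://...
        if parts.path.lower().endswith((".html", ".htm")):
            result.append(parts.path)
    return result


def merge_from_index(index_name, pages):
    """Walk links breadth-first from the index page and join the pages.

    `pages` maps a file name to its HTML text (hypothetical stand-in
    for reading the files from the folder).
    Returns the visiting order and the concatenated text.
    """
    order, seen, queue = [], {index_name}, [index_name]
    while queue:
        name = queue.pop(0)
        order.append(name)
        for link in local_links(pages[name]):
            if link in pages and link not in seen:
                seen.add(link)
                queue.append(link)
    return order, "\n".join(pages[name] for name in order)
```

Pages that are never linked from the index are left out, and each page is added only once even if several pages link to it.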

> Keep each file as a separate document. When you do a search (full text)
> you will get individual pages in the results. I think that is ok,
> because if you always return the index file, then the user sees a
> document that may not even have the search terms in it.

Yes, you're right.

> Using metadata, you can make metadata searching and browse lists only
> show the index files. Assign the metadata to the index file rather than
> the folder. This way only the index will get that metadata. Then if you
> create a classifier using that metadata, only the index files will
> appear. And if you build an index on that metadata, only index files
> will show up in the search results because only they have values for
> that metadata.
>
> I hope this is useful.

You bet! Thanks a lot. That's exactly what I've done, and it works. It was a great idea to assign the metadata just to the index file. You find the document with the right search and start browsing it right away.
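
In case it helps someone else, here is a small sketch of how the per-file metadata could be generated. It assumes the metadata goes into a Greenstone metadata.xml file in the same folder, with a FileSet whose FileName is a regular expression matching only the index file. The element names follow the metadata.xml layout as I understand it, so please check them against a file produced by your own installation. The function name index_metadata_xml and the sample values are hypothetical.

```python
import re
import xml.etree.ElementTree as ET


def index_metadata_xml(index_file, metadata):
    """Build metadata.xml content that targets only the index file.

    `index_file` is the file name (for example "index.html") and
    `metadata` maps metadata names to values. The element layout is an
    assumption based on Greenstone's metadata.xml format.
    """
    root = ET.Element("DirectoryMetadata")
    fileset = ET.SubElement(root, "FileSet")
    # FileName is treated as a regular expression, so escape the dot
    # to keep the pattern from matching other file names.
    ET.SubElement(fileset, "FileName").text = re.escape(index_file)
    description = ET.SubElement(fileset, "Description")
    for name, value in metadata.items():
        item = ET.SubElement(description, "Metadata", name=name)
        item.text = value
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(index_metadata_xml("index.html", {"Title": "User Manual"}))
```

Because the other pages in the folder get no value for that metadata, a classifier or index built on it would list only the index file.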

Thanks again, Eduardo.