Re: Indexes

From Stefan Boddie
Date Wed, 26 Feb 2003 13:33:26 +1300
Subject Re: Indexes
In-Reply-To (20030219011502-70142-qmail-web40412-mail-yahoo-com)
>
> I am a new user and whilst pawing my way through
> creating a collection of html documents pointing to
> web based software tools, I was wondering:
>
> Is it possible to create indexes other than the
> standard ones of section, document and paragraph? By
> using metadata for example? And if so, how do you tag
> the html documents?
>

Section, document, and paragraph refer to the "granularity" of a search
carried out on the index, not the content of the index itself. That is, you
can have a section level index of title metadata, a document level full text
index, a paragraph level full text index, a document level index of subject
metadata, etc.
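
In collect.cfg terms, each entry on the "indexes" line pairs a granularity
with the content to be indexed. The fragment below is only a sketch of the
four indexes just listed. It assumes your documents carry metadata fields
named Title and Subject, so substitute whatever field names your collection
actually uses.

```
# level:content pairs; Title and Subject are assumed metadata field names
indexes       section:Title document:text paragraph:text document:Subject
defaultindex  document:text
```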

As an example of the effect the different granularities have, say you had a
collection with a full text document level index and a full text paragraph
level index. If you did a boolean search for "foo & bar" on the document
level index the search engine would retrieve all documents that contained
both "foo" and "bar", regardless of where those words occurred within the
document. If you did the same search on the paragraph level index you'd
retrieve only those documents that contained both the words within the same
paragraph.
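
To make the difference concrete, here is a small Python sketch of the idea.
It is not how Greenstone's search engine is implemented; the function names
(build_index, search_and) and the two sample documents are made up purely
for illustration. Each document is a list of paragraphs, and the only thing
that changes between the two indexes is the unit that a word is recorded
against.

```python
from collections import defaultdict


def build_index(docs, level):
    """Map each word to the set of units (documents or paragraphs) containing it."""
    index = defaultdict(set)
    for doc_id, paragraphs in docs.items():
        for para_no, text in enumerate(paragraphs):
            # The granularity decides what counts as one searchable unit.
            unit = doc_id if level == "document" else (doc_id, para_no)
            for word in text.lower().split():
                index[word].add(unit)
    return index


def search_and(index, level, *terms):
    """Return ids of documents having at least one unit that contains every term."""
    units = set.intersection(*(index.get(t.lower(), set()) for t in terms))
    if level == "document":
        return sorted(units)
    # Paragraph-level hits are reported by the document they belong to.
    return sorted({doc_id for doc_id, _ in units})


# Hypothetical sample data: document "a" has the words in different paragraphs,
# document "b" has them in the same paragraph.
docs = {
    "a": ["foo appears here", "bar appears in another paragraph"],
    "b": ["foo and bar share this paragraph"],
}

doc_hits = search_and(build_index(docs, "document"), "document", "foo", "bar")
para_hits = search_and(build_index(docs, "paragraph"), "paragraph", "foo", "bar")

print(doc_hits)   # ['a', 'b']
print(para_hits)  # ['b']
```

Both documents match at document level, but only "b" matches at paragraph
level, because "a" has the two words in separate paragraphs.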

At present the only supported granularities are document, section, and
paragraph. Note that paragraph is a special case and is only useful when
creating a full text index (because Greenstone provides no way to assign
metadata at paragraph level). Paragraph indexes are occasionally
useful but generally you should stick with section level indexes.

Since Greenstone represents documents hierarchically, you can create
any kind of section layout you like. For example, documents in the Project
Gutenberg collection (see http://nzdl.org) have a flat structure with a new
section for each page (so a section level index in this case is actually a
page level index). The various humanitarian collections have a more
hierarchical structure with nested sections representing chapters,
sub-sections etc.
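
The two layouts can be pictured with a short Python sketch. The dictionary
structure, the iter_sections helper, and the dotted section numbers are all
hypothetical, chosen only to illustrate the point; they are not Greenstone's
archive format. The sketch walks a document tree and lists the units that a
section level index would treat as separately searchable.

```python
def iter_sections(section, path=()):
    """Yield (section_number, title) for a section and all of its descendants."""
    number = ".".join(map(str, path)) or "top"
    yield number, section["title"]
    for n, child in enumerate(section.get("children", []), start=1):
        yield from iter_sections(child, path + (n,))


# Flat layout: one section per page, so section level is really page level.
flat_book = {
    "title": "Flat book",
    "children": [{"title": "Page 1"}, {"title": "Page 2"}],
}

# Nested layout: chapters that contain sub-sections.
nested_report = {
    "title": "Report",
    "children": [
        {"title": "Chapter 1", "children": [{"title": "Sub-section 1.1"}]},
        {"title": "Chapter 2"},
    ],
}

flat_units = [number for number, _ in iter_sections(flat_book)]
nested_units = [number for number, _ in iter_sections(nested_report)]

print(flat_units)    # ['top', '1', '2']
print(nested_units)  # ['top', '1', '1.1', '2']
```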

Anyway, I'm not sure if any of this answers your question. Try looking again
at the Greenstone Developer's Guide, particularly the sections on "Tagging
document files" and "Inside Greenstone archive documents".

Stefan.


> This might be a naive question, but I've scanned
> through both the develop.pdf and the book and I cannot
> find anything.
>
> Thank you
>
> Melina Kalatzi
> Lecturer
> Middlesex University
> UK
>