Re: [greenstone-users] joint indexes with mgpp

From: jens wille
Date: Tue, 11 May 2004 20:01:54 +0200
Subject: Re: [greenstone-users] joint indexes with mgpp
In-Reply-To: (40A05D7C-6050701-cs-waikato-ac-nz)

hi katherine,

Katherine Don wrote:
> Yes. The field searching functionality arises from the fact that
> everything is in one physical index. We don't allow by default
> more than one physical index for mgpp.
>
> For what you want to do, you don't need to have separate physical
> indexes. The mgpp user guide
> (http://www.greenstone.org/docs/mgpp_user.pdf if you can't find
> it in your greenstone download) talks a little about the mgpp
> document format.
>
> The format is like <Document> <Section> <Tag>text content</Tag>
> other non field content </Section> </Document>
>
> You just need to modify the output from the perl into mgpp so
> that several fields get combined into one tag. Currently each
> metadata element gets its own tag when it's passed to mgpp - you
> need to put the contents of all the metadata you want indexed
> together into a single tag.
>
> For example, if you have 4 metadata elements, A1, A2, T1, T2 and
> you want individual searching over each, and combined searching
> over the Ts and over the As, then what you want to pass to mgpp
> is something like the following
>
> <T1>my first title</T1> <T2>my second title</T2> <TAll>my first
> title my second title</TAll> <A1>author 1</A1> <A2>author 2</A2>
> <AAll>author 1 author 2</AAll>
>
> Then in your index list you will get T1, T2, TAll, A1, A2, AAll.

when i read the relevant passage in mgpp_user a few days ago, this
possibility occurred to me, too, but it's a bit too much overhead,
isn't it...
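
A minimal sketch of the combined-tag idea from the quoted example, written in Python only for illustration: it takes the individual fields plus a grouping of field names and emits both the per-field tags and the combined tags (TAll, AAll) inside one <Document><Section>. The function name mgpp_section, the grouping dict and the XML escaping of text are assumptions, not part of Greenstone's perl build code, and the real mgpp input produced by Greenstone carries more structure than this.

```python
from xml.sax.saxutils import escape


def mgpp_section(metadata, groups, body=""):
    """Hypothetical helper: build one mgpp-style <Section> string.

    metadata: mapping of field name -> text, e.g. {"T1": "my first title"}
    groups:   mapping of combined tag -> list of fields to merge,
              e.g. {"TAll": ["T1", "T2"]}
    body:     optional non-field text for the section
    """
    parts = []
    # One tag per individual field, so T1, T2, A1, A2 stay searchable on their own.
    for name, value in metadata.items():
        parts.append(f"<{name}>{escape(value)}</{name}>")
    # One extra tag per group holding the concatenated text of its members,
    # so a search on TAll matches words from any title.
    for group, fields in groups.items():
        combined = " ".join(metadata[f] for f in fields if f in metadata)
        parts.append(f"<{group}>{escape(combined)}</{group}>")
    # Assumption: escaping keeps a literal "<" in metadata from looking like a tag.
    if body:
        parts.append(escape(body))
    return "<Document><Section>" + " ".join(parts) + "</Section></Document>"


if __name__ == "__main__":
    meta = {"T1": "my first title", "T2": "my second title",
            "A1": "author 1", "A2": "author 2"}
    print(mgpp_section(meta, {"TAll": ["T1", "T2"], "AAll": ["A1", "A2"]}))
```

The overhead is mostly duplicated text in the index: every word in a group is stored once under its own tag and once more under the combined tag.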

let me just try to summarize:
a) modify the perl code to allow indexes over multiple metadata
elements
b) modify the source documents' metadata.xml so that it already
contains the combined metadata

ad a) this seems a bit hard, especially considering that mgpp is
designed to work with _one_ physical index. this might mean losing
the advantages over mg or - even worse ;-) - working through the
whole code to implement a "multiple indexes" feature, if that is
possible at all.
(i have to admit that i'm not very confident about getting it done
on my own, but it's really interesting nonetheless and i will at
least give it a try - in fact i'm already on it, i just have to
learn some perl first ;-). additionally, it may give some deeper
insight into how indexes work - and that is what i'm mostly after!)

ad b) well, this is kind of "quick 'n' dirty" (although possibly
not so quick) and may not always work. in my case it's not a big
problem, because i already have a script that runs over my source
documents to create the metadata.xml, so at the moment this seems
to be the best way.
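
For option b), a script that already writes metadata.xml could add the combined element itself. A minimal sketch in Python, assuming the usual Greenstone layout of DirectoryMetadata / FileSet / Description / Metadata name="..."; the function name add_combined_metadata and the field names A1, A2, AAll are hypothetical, taken from the example above. Values are joined in document order, and ElementTree drops the XML declaration and DOCTYPE on output, so a real script would need to write those back.

```python
import xml.etree.ElementTree as ET


def add_combined_metadata(xml_text, sources, target):
    """Hypothetical helper: for every <Description>, append a
    <Metadata name=target> holding the joined text of all <Metadata>
    elements whose name is in `sources` (kept in document order)."""
    root = ET.fromstring(xml_text)
    for desc in root.iter("Description"):
        values = [m.text or "" for m in desc.findall("Metadata")
                  if m.get("name") in sources]
        if values:
            combined = ET.SubElement(desc, "Metadata", name=target)
            combined.text = " ".join(values)
    # Note: the XML declaration and DOCTYPE are not preserved here.
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    doc = ('<DirectoryMetadata><FileSet><FileName>a.html</FileName>'
           '<Description><Metadata name="A1">author 1</Metadata>'
           '<Metadata name="A2">author 2</Metadata></Description>'
           '</FileSet></DirectoryMetadata>')
    print(add_combined_metadata(doc, {"A1", "A2"}, "AAll"))
```

The new element is then ordinary metadata as far as the build is concerned, so it can be indexed like any other field without touching the perl code.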

on the whole it leaves me a bit disappointed that this isn't
directly possible with mgpp :-(

but now that i know the possible ways out, i will make it work
*somehow* ;-)

best regards

jens