Re: getting intersection of classifiers

From Stefan Boddie
Date Thu, 4 Jul 2002 13:14:02 +1200
Subject Re: getting intersection of classifiers
In-Reply-To (OFF2FA98F9-66D53D61-ON85256BE4-006E5CF7-altarum-org)
Hi Steve,

> We would like to present a table, with the rows and columns being the small
> set of values from two different metadata. The cells in the table would
> be the number of documents which match both metadata values, which is also
> a link to a page listing those documents.
> What I have tried so far:
> Defined metadata fields Publisher and Subject, using the metadata.xml
> file, working!
> Defined a metadata field 'PubSub' which combines an abbreviated Publisher
> and Subject.
> Defined a Hierarchy classifier for this combined field:
> classify Hierarchy -hfile pubsub.txt -metadata PubSub -sort Title
> -buttonname Publishers/Subjects
> In the pubsub.txt file, lines like:
> GAO 1 "US GAO"
> GAO-EDI 1.1 "EDI"
> GAO-LOG 1.2 "Logistics Support"
> RAND-EDI 2.1 "EDI"
> RAND-LOG 2.2 "Logistics Support"
> This is working almost like we want -- we can browse down through the
> classifier, to get to a list of the intersecting documents. And by using
> the [numleafdocs] can also list the number of documents under each
> classification combination. So we could hard-code a table which jumped to
> these sub-level classifier results and be close. But would need to hard
> code the number, as the [numleafdocs] would not be available until getting
> down into a specific sublevel.
> Question, is this on the right track, or is there an easier way to do it?
> If we set up indexes on both of these metadata fields, is there some way
> to do a search rather than a browse and get back all documents with these
> two specific meta-data values? Also is there a way to pull back the
> number of documents meeting these conditions?
> So far I've been trying the formats and classifier types, but maybe I
> should instead look into the runtime arguments that are available.
> Thanks for any ideas. If this isn't supported easily, the normal
> classifier browsing should work ok to discover the documents. But want to
> come as close as possible to the requested design before telling them it
> won't work that way.
> Steve Brophy
> Altarum, Ann Arbor, MI

There's no easy way I can think of to do this with a standard Greenstone
installation. Using mgpp (an experimental search engine included with
Greenstone but not used by default), you could create an index for each
metadata field and then search for documents with specific values in both
fields. I'm not sure how much work that would involve, though.
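
If the table ends up being hard-coded, the per-cell counts could at least be
generated rather than typed in by hand. The following is only a rough sketch
in Python, run outside Greenstone: it assumes you can get each document's
Publisher and Subject values into a list of records by some means (the record
layout and the function names cross_tabulate and format_table are made up for
illustration and are not part of Greenstone).

```python
from collections import Counter


def cross_tabulate(docs):
    """Count documents for each (Publisher, Subject) combination.

    `docs` is a hypothetical list of dicts with "Publisher" and "Subject"
    keys; how they are exported from the collection is left open.
    """
    counts = Counter((d["Publisher"], d["Subject"]) for d in docs)
    publishers = sorted({p for p, _ in counts})
    subjects = sorted({s for _, s in counts})
    return publishers, subjects, counts


def format_table(publishers, subjects, counts):
    """Render the counts as tab-separated rows, one row per publisher."""
    rows = ["\t".join([""] + subjects)]
    for p in publishers:
        # A Counter returns 0 for combinations that never occurred.
        cells = [str(counts[(p, s)]) for s in subjects]
        rows.append("\t".join([p] + cells))
    return "\n".join(rows)


if __name__ == "__main__":
    # Invented sample records, for illustration only.
    sample = [
        {"Publisher": "GAO", "Subject": "EDI"},
        {"Publisher": "GAO", "Subject": "EDI"},
        {"Publisher": "RAND", "Subject": "Logistics Support"},
    ]
    print(format_table(*cross_tabulate(sample)))
```

Each cell could then be linked by hand to the matching sub-level of the
PubSub classifier. The counts would need regenerating whenever the collection
is rebuilt.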