Re: [greenstone-users] Hierarchical Metadata Precedence

From: Michael Dewsnip
Date: Wed, 22 Aug 2007 09:23:13 +1200
Subject: Re: [greenstone-users] Hierarchical Metadata Precedence
In-Reply-To: (aa483ce60708180657p52ef0c1dhf26d68a706a30bee-mail-gmail-com)

Hi Murithi,

I think you will need to change the code of the Hierarchy classifier to allow commas to be top-level separators; I don't think you can do this with the separator regular expression.

A much easier way to achieve what you want is to add two metadata values for the same metadata element:

    Striga/Stem borers/Maize/Biology
    Entomology

Make sure you have specified the "-allvalues" option for the Hierarchy classifier.
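To show why two separate values solve the problem, here is a small Python model of a hierarchy classifier that uses every value of a metadata element. It is an illustrative assumption about the behaviour, not Greenstone's Perl classifier code. The helpers `add_path` and `classify` are hypothetical, and the "/" separator is hard-coded for simplicity.

```python
# Toy model (NOT Greenstone's Hierarchy classifier code) of a classifier
# that files one document under every value of a metadata element.

def add_path(tree, levels, doc_id):
    """Walk or create nested nodes for `levels` and record doc_id at the leaf."""
    node = tree
    for level in levels:
        node = node.setdefault(level, {})
    node.setdefault("_docs", []).append(doc_id)


def classify(records):
    """records: mapping of document id -> list of metadata values."""
    tree = {}
    for doc_id, values in records.items():
        for value in values:  # use every value, in the spirit of -allvalues
            levels = [part.strip() for part in value.split("/") if part.strip()]
            add_path(tree, levels, doc_id)
    return tree


records = {
    "doc1": ["Striga/Stem borers/Maize/Biology", "Entomology"],
}
tree = classify(records)
print(sorted(tree))                # ['Entomology', 'Striga']
print(tree["Entomology"]["_docs"])  # ['doc1']
```

Because each value is split on its own, Entomology never ends up underneath Biology. It becomes a second top-level node alongside Striga.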

Regards,

Michael



Murithi Ones wrote:

Hi

I'm a relatively new Greenstone user from Kenya.

In the short time I've used Greenstone (around three weeks), I have been very impressed by the indexing and classification it offers.

At the outset of the project I am undertaking, I had a few hiccups with the Procite plugin, but I solved them by exporting the Procite data to comma-separated values (CSV). The CSV plugin comfortably processed more than five thousand records, and the classifiers worked as expected.

All was well until I started dealing with hierarchical classification, not from an hfile but from structured metadata.

The separator regular expression worked like a charm in extracting individual values from the metadata.

However, I have tried, to no avail, to modify the classifier so that it detects the hierarchical nature of the metadata itself from the type and sequence of the separators.

Though I haven't given up, I would appreciate any comments regarding this problem. Maybe I have overlooked something really basic. I have used regular expressions before and got the idea of the separator structure immediately.

Example

Separator regular expression

[\,|\/]

 

Metadata Example

Striga/Stem borers/Maize/Biology, Entomology
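The following Python snippet applies the example separator to the example value. It shows why every separator carries equal weight. Python's `re` is used here as a stand-in for Greenstone's Perl regular expressions; for this simple character class the two behave the same. Inside a character class, "|" is a literal pipe, so the class matches a comma, a pipe, or a slash.

```python
import re

# The separator regular expression from the example above.
# Inside [...] the "|" is literal, so this matches ",", "|" or "/".
separator = re.compile(r"[\,|\/]")

value = "Striga/Stem borers/Maize/Biology, Entomology"
parts = [part.strip() for part in separator.split(value)]
print(parts)
# ['Striga', 'Stem borers', 'Maize', 'Biology', 'Entomology']

# Read as successive levels, the flat list becomes a single chain.
print(" > ".join(parts))
# Striga > Stem borers > Maize > Biology > Entomology
```

A single split can only produce one flat list. The comma and the slash therefore end up as interchangeable level boundaries.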

In short, this means that the record under discussion can fall under Entomology or under Biology/Maize/Stem Borers/Striga. The comma should act as a top-level separator, while the forward slash should act as a hierarchical separator within each group extracted from the comma-separated values.

My problem is how to instruct the Hierarchy classifier to treat the comma as a higher-precedence separator than the forward slash.
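One way to express this precedence is to use two passes instead of one regular expression: split on the comma first, then split each group on the slash. The Python sketch below illustrates the idea. `split_with_precedence` is a hypothetical helper, not part of Greenstone. Applying it inside Greenstone would still mean either changing the classifier code or supplying each group as a separate metadata value, as the reply above suggests.

```python
import re


def split_with_precedence(value, top=r",", inner=r"/"):
    """Split on the top-level separator first, then split each group on the
    inner separator, returning one list of hierarchy levels per group."""
    paths = []
    for group in re.split(top, value):
        levels = [level.strip() for level in re.split(inner, group) if level.strip()]
        if levels:  # skip empty groups such as a trailing comma
            paths.append(levels)
    return paths


value = "Striga/Stem borers/Maize/Biology, Entomology"
for path in split_with_precedence(value):
    print(path)
# ['Striga', 'Stem borers', 'Maize', 'Biology']
# ['Entomology']
```

Each returned list is an independent path, so Striga and Entomology both start at the top level.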

In this case, Entomology and Striga should be hierarchically superior to Stem Borers.

Presently, the classifier treats the hierarchy as a single chain running in order of precedence from Striga down to Entomology, which is not correct. Clicking on Entomology should return a list of articles with Entomology as a keyword at the top level, and other levels falling under Entomology in other records.

My suspicion is that a change in the structure of the regular expression should do the trick, and this is the path I'm currently pursuing. My next step will be to try nesting the regular expressions.

Could you be of help? A pointer in the right direction would be highly appreciated.

Kindly reply,

Murithi Borona

Nairobi, Kenya


_______________________________________________
greenstone-users mailing list
greenstone-users@list.scms.waikato.ac.nz
https://list.scms.waikato.ac.nz/mailman/listinfo/greenstone-users


-- 
DL Consulting
Greenstone Digital Library and Digitisation Specialists
contact@dlconsulting.com
www.dlconsulting.com