|Date||Sat, 18 Aug 2007 16:57:23 +0300|
|Subject||[greenstone-users] Hierarchical Metadata Precedence|
I'm a relatively new Greenstone user from Kenya.
In the short time I've used Greenstone (around three weeks), I have been very impressed by the indexing and classification that it offers.
At the outset of the project I am undertaking, I had a few hiccups with the Procite plugin, but I solved that by exporting the Procite data to comma-separated values. The CSV plugin comfortably processed the more than five thousand records and the classifiers worked as expected.
All was well until I started dealing with hierarchical classification, not from an hfile but from structured metadata.
The separator regular expression worked like a charm in extracting individual values from the metadata.
However, I have tried, to no avail, to modify the classifier so that it detects the hierarchical nature of the metadata itself, based on the type and sequence of separators.
Though I haven't given up, I would appreciate any comments regarding this problem. Maybe I have overlooked something really basic. I have used regular expressions before and got the idea of the separator structure immediately.
Separator regular expression
Striga/Stem borers/Maize/Biology, Entomology
In short, this means that the record under discussion can fall under Entomology or under Biology/Maize/Stem Borers/Striga. The comma should evaluate to a top-level separator, while the forward slash should act as a hierarchical separator within the groups extracted from the comma-separated values.
My problem is how to instruct the Hierarchical classifier to treat the comma as a higher precedence separator than a forward slash.
In this case Entomology and Striga should be hierarchically superior to Stem Borers.
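The precedence described above can be sketched outside Greenstone as a two-pass split: commas first, then forward slashes within each comma-separated group. This is only an illustration in plain Python, not Greenstone classifier code; the function name `split_hierarchical` is hypothetical, and it assumes that keywords never contain a literal comma or slash.

```python
import re


def split_hierarchical(value):
    """Split a keyword string into paths.

    Pass 1: the comma separates independent top-level groups.
    Pass 2: the forward slash separates levels inside one group.
    (Hypothetical helper for illustration; not part of Greenstone.)
    """
    paths = []
    for group in re.split(r"\s*,\s*", value.strip()):
        # Drop empty levels produced by stray or doubled separators.
        levels = [level.strip() for level in group.split("/") if level.strip()]
        if levels:
            paths.append(levels)
    return paths


paths = split_hierarchical("Striga/Stem borers/Maize/Biology, Entomology")
# Two independent paths: one starting at "Striga", one consisting only of "Entomology".
for path in paths:
    print(" > ".join(path))
```

Splitting in two passes keeps the precedence explicit, which a single flat separator expression such as "comma or slash" cannot do, because it discards the information about which separator produced each value.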
Presently, the classifier treats the hierarchy as running in order of precedence from Striga to Entomology, which is not correct. Clicking on Entomology should return the articles that have Entomology as a top-level keyword, with any lower levels that fall under Entomology in other records listed beneath it.
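To make the expected browsing behaviour concrete, here is a minimal sketch that builds a nested tree from several records. It is plain Python rather than Greenstone code, and it rests on assumptions: `build_tree`, the record ids, and the second record's keyword string are all hypothetical, and each record is attached only to the last level of each of its paths.

```python
def build_tree(records):
    """Build {keyword: {"records": [...], "children": {...}}} from raw keyword strings.

    The comma separates top-level groups; the slash separates levels in a group.
    (Hypothetical helper for illustration; not part of Greenstone.)
    """
    tree = {}
    for rec_id, raw in records.items():
        for group in raw.split(","):
            levels = [part.strip() for part in group.split("/") if part.strip()]
            children, entry = tree, None
            for level in levels:
                entry = children.setdefault(level, {"records": [], "children": {}})
                children = entry["children"]
            if entry is not None:
                # Attach the record to the deepest level of this path only.
                entry["records"].append(rec_id)
    return tree


records = {
    "rec1": "Striga/Stem borers/Maize/Biology, Entomology",
    "rec2": "Entomology/Stem borers",  # hypothetical second record
}
tree = build_tree(records)

print(list(tree))                               # top-level entries
print(tree["Entomology"]["records"])            # records with Entomology at the top level
print(list(tree["Entomology"]["children"]))     # levels under Entomology from other records
```

With this layout, Entomology and Striga both appear at the top level, and Stem borers appears beneath whichever top-level keyword introduced it.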
My suspicion is that a change in the structure of the regular expression should do the trick, and this is the path I'm currently pursuing. I think my next step will be to try nesting the regular expressions.
Could you be of help? A pointer in the right direction would be highly appreciated.