Thanks for pointing this out. I guess we have never noticed before that this
document doesn't appear in the hierarchy. The problem is with the way the
hierarchy files work. Each line has the format: key, hierarchy position,
display value. When you are classifying documents based on say Subject, the
Subject metadata of each document is compared to the key values in the
hierarchy file (in this case sub.txt).
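To make the lookup concrete, here is a minimal Python sketch of the idea, not Greenstone's actual code. The line syntax (whitespace-separated fields with optional double quotes), the sample entries, and the function names are all assumptions for illustration, as is the use of an exact string comparison between the Subject value and the key.

```python
import re

# Assumed line shape: key, hierarchy position, display value, separated by
# whitespace, with optional double quotes around key and display value.
LINE_RE = re.compile(
    r'^\s*(?:"([^"]*)"|(\S+))\s+(\S+)\s+(?:"([^"]*)"|(\S+))\s*$'
)

def parse_hierarchy(text):
    """Return {key: (position, display_value)} for each well-formed line."""
    entries = {}
    for line in text.splitlines():
        m = LINE_RE.match(line)
        if not m:
            continue  # skip blank or malformed lines
        key = m.group(1) if m.group(1) is not None else m.group(2)
        display = m.group(4) if m.group(4) is not None else m.group(5)
        entries[key] = (m.group(3), display)
    return entries

def classify(subject, entries):
    """Look up a document's Subject value; None means no key matched."""
    return entries.get(subject)

# Hypothetical sample content, not taken from the demo collection's sub.txt.
SAMPLE = '''"Animal husbandry" 1 "Animal husbandry"
"Poultry" 1.1 "Poultry keeping"
'''
entries = parse_hierarchy(SAMPLE)
```

A document whose Subject is "Poultry" would be filed at position 1.1 under the display value "Poultry keeping", while a Subject with no matching key gets no position at all.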
The problem with the document mentioned below is that the key for the
classification has an apostrophe (') in it, as does the value in the
metadata.xml file. However, the ' in the metadata.xml file gets turned into
an escaped character entity in the document archive file, and therefore no
longer matches the value in sub.txt. If you put the escaped entity into the
key in sub.txt instead, then the classification fails silently and you don't
get any documents in the classification. Helpful, isn't it?
We'll have to have a look at this, but for now the best thing to do is remove
the ' from the key in sub.txt and from the metadata.xml file. The apostrophe
is fine in the display value in sub.txt, so the displayed string will still
have it.
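The mismatch and the workaround can be sketched in a few lines of Python. This is only an illustration: html.escape stands in for whatever escaping happens when the archive file is written (the real entity may differ), and the helper names are made up for the example.

```python
import html

# The category name from the original question, used as the sub.txt key.
KEY = "Women, gender and development, women's organizations"

# Stand-in for the import step: html.escape rewrites the apostrophe as an
# entity. The escaping used for the archive file may produce a different one.
archived = html.escape(KEY, quote=True)

def matches(metadata_value, key):
    """Exact string comparison, as assumed for the classifier lookup."""
    return metadata_value == key

def strip_apostrophes(text):
    """Workaround: drop apostrophes from both the key and the metadata."""
    return text.replace("'", "")

# With the apostrophe removed on both sides, escaping changes nothing.
safe_key = strip_apostrophes(KEY)
safe_archived = html.escape(strip_apostrophes(KEY), quote=True)
```

Here matches(archived, KEY) is false because the archived value carries the entity, while matches(safe_archived, safe_key) is true. The display value is never compared, so it can keep its apostrophe.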
I'm not sure how the GLI handles hierarchical metadata and sub.txt-type files,
so if you are using the GLI to create hierarchical classifications, and the
values have apostrophes, and you are not getting what you intended, then check the
metadata.xml and hierarchy files by hand.
We'll make sure that the GLI works properly for the next release.
Hope this helps,
Jong Hann wrote:
> Hi John,
> You're right about the source documents included for the demo collection.
> It's a very useful starting point. We've gone over it for a few days now.
> Built a few test collections with the same structure.
> Question regarding the Subject organization in the demo collection...
> A particular source document doesn't seem to have been successfully filed
> into a designated Subject category.
> The source document in question is `wb34te.htm'.
> In fact, we can't even find the Subject category `Women, gender and
> development, women's organizations' when browsing via the Subject interface.
> Is there something we're missing?