Re: [greenstone-users] Extracted Subject terms from HTML

From Katherine Don
Date Thu, 25 Nov 2004 09:43:30 +1300
Subject Re: [greenstone-users] Extracted Subject terms from HTML
In-Reply-To (41A3B8FF-5050107-telus-net)
Hi Jenn

Can you check the archive files (doc.xml) - is all the Subject metadata
actually making it into the archive? If not, then it is a plugin
problem. If it is, then it is a classifier problem.
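
A minimal sketch of that check, for anyone who would rather script it than
read doc.xml by eye. It assumes the archive stores each value as a
<Metadata name="Subject"> element inside a section's <Description>; verify
this against your own doc.xml. The function name and the sample document are
made up for illustration and are not part of Greenstone.

```python
import xml.etree.ElementTree as ET


def subjects_per_section(xml_text, field="Subject"):
    """Return one list of metadata values per <Section>, in document order.

    Assumes the layout Section > Description > Metadata[name=...];
    check this against a real doc.xml before relying on it.
    """
    root = ET.fromstring(xml_text)
    result = []
    for section in root.iter("Section"):
        # Only look at this section's own Description, not nested ones.
        desc = section.find("Description")
        values = []
        if desc is not None:
            for meta in desc.findall("Metadata"):
                if meta.get("name") == field:
                    values.append((meta.text or "").strip())
        result.append(values)
    return result


# Hypothetical sample standing in for a real archive file.
SAMPLE = """<Archive>
  <Section>
    <Description>
      <Metadata name="Title">Example</Metadata>
    </Description>
    <Section>
      <Description>
        <Metadata name="Subject">Fishing</Metadata>
        <Metadata name="Subject">Land claims</Metadata>
      </Description>
      <Content>text</Content>
    </Section>
  </Section>
</Archive>"""

if __name__ == "__main__":
    for index, values in enumerate(subjects_per_section(SAMPLE)):
        print(index, values)
```

If a section lists every subject here but the browse list shows only one,
the plugin has done its job and the classifier is the place to look.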

In your subject list, if a section has 2 subjects, does it get included
twice, but with both entries showing the last subject, or does it get
included only once?

If you like, you can send me your configuration file and one or two
documents (off the list) and I can take a look for you.


Jenn Cole wrote:
> Thanks Katherine,
> For some reason neither of the options you gave is working. It is still only
> displaying the last subject term. If I manually type in the subject
> metadata in the Greenstone interface it will display all; however, it
> doesn't with the extracted metadata from the source. I can send a
> snippet of the HTML code if that would help.
> Jenn Cole
> Katherine Don wrote:
> Hi Jenn
> Each classifier has a slightly different way of handling metadata - one
> of these days we will get around to standardising them. AZList and
> AZSectionList only use the first value of the metadata. But the
> CompactLists can use all.
> Try the following to see if they do what you want:
> classify AZCompactList -metadata Subject -allvalues -doclevel section
> or
> classify AZCompactSectionList -metadata Subject -allvalues
> I'm not sure what the difference between AZCompactList -doclevel section
> and AZCompactSectionList is.
> This will group items with common subjects into a subfolder. Use the
> -mingroup option to control this. E.g. set mingroup to 1 to make the first
> vertical list all subfolders, or if you don't want any subfolders, set
> it to something large.
> Cheers,
> Katherine Don
> Jenn Cole wrote:
>> Hello,
>> I am trying to get Greenstone to extract section subject metadata that
>> is coded in the source HTML document while using the AZSectionList
>> browsing classifier. I have several subject headings for each
>> section, however, when I build and preview the database only the last
>> subject term for each section appears. How do I tell Greenstone to
>> display all of the extracted subject terms?
>> Thanks,
>> Jenn Cole
>> UBCIC Library Technician
>> _______________________________________________
>> greenstone-users mailing list