Ok, I'll try sending that again, in plain text this time. Is there a way
to prevent the list from scrubbing email sent in HTML format?
Stefan Boddie wrote:
You're right that assigning the same content as both metadata and text
is a little wasteful. It's sometimes the best way to do things in
Greenstone though, and in any case it doesn't have any significant
effect on the size or speed of the collection, unless you're creating
something huge (and even if you are, there are ways to reduce the
effects of content being duplicated as both metadata and text).
Having said that, you can probably do what you want without any
duplication of content in your GA files. One solution is to only assign
the body of your document as text, and assign everything else as
metadata. So your GA files would look something like the following.
<Metadata name="abstract">Here is my wonderful abstract.</Metadata>
this is the body of my text
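As a rough illustration of that layout, here is a minimal Python sketch that emits the two parts for one section: the abstract as a Metadata element and the body as plain text, with nothing duplicated. The helper name ga_section_fragment is hypothetical, the XML escaping is an assumption, and the output mirrors only the two lines above, not the full structure of a real GA file.

```python
from xml.sax.saxutils import escape


def ga_section_fragment(abstract, body):
    """Return the abstract as metadata and the body as text, with no duplication.

    Hypothetical helper: it reproduces only the two-line fragment from the
    example above, not the wrapper elements a complete GA file would need.
    """
    # Escape &, < and > so the values are safe inside XML (an assumption).
    metadata = '<Metadata name="abstract">{}</Metadata>'.format(escape(abstract))
    return metadata + "\n" + escape(body)


print(ga_section_fragment("Here is my wonderful abstract.",
                          "this is the body of my text"))
```

The abstract appears once, as metadata, and the body appears once, as text, which is the point of the suggestion.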
And your collect.cfg would have an indexes line something like this.
indexes section:text section:abstract section:text,abstract
This would create three indexes, one for searching just the body text,
one for searching just the abstracts, and one for searching both together.
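If more fields are added later, the same pattern (one index per field, plus one combined index) can be generated rather than typed by hand. This is only a sketch: the function name indexes_line and its parameters are hypothetical, and it does nothing beyond assembling the text of the line shown above.

```python
def indexes_line(fields, level="section"):
    """Build an indexes line: one index per field, plus one covering all fields."""
    # One single-field index per entry, e.g. "section:text".
    singles = ["{}:{}".format(level, field) for field in fields]
    # One combined index listing every field, e.g. "section:text,abstract".
    combined = "{}:{}".format(level, ",".join(fields))
    return "indexes " + " ".join(singles + [combined])


print(indexes_line(["text", "abstract"]))
# prints: indexes section:text section:abstract section:text,abstract
```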
Greenstone Digital Library and Digitisation Specialists
Jonathan Gorman wrote:
>>What you need to do here is search within a particular
>>metadata element I believe.
>This was a solution I was thinking of, but it's not a very
>satisfactory one. It means I have to duplicate content.
>Let's use the article/abstract/body analogy.
>If I have it converted via plugin I could duplicate the
>abstract into a metadata element and into the contents, so the
>resulting GAF file would probably be something like
><Metadata name="abstract">Here is my wonderful abstract.</Metadata>
>Here is my wonderful abstract.
><Metadata name="body">this is the body of my text</Metadata>
>this is the body of my text
>It seems that this would be a frequent enough need that there
>would be a better way to do it. Ah well, back to trying to
>figure out the XMLPlug class. (Or to figure out more of the
>oddities that have happened when directly generating GAF files).
>The fact the mechanism is there to sort it out at the document
>level but not the section level just seems a little