Re: [greenstone-users] Extracted Greenstone Metadata from Adobe

From David Robley
Date Thu, 20 Oct 2005 17:54:56 +0930
Subject Re: [greenstone-users] Extracted Greenstone Metadata from Adobe
In-Reply-To (4356DACC-3050409-cs-waikato-ac-nz)
On Thu, 20 Oct 2005 09:16, you wrote:
> Hi David
>
> > I was about to pose the same question; I can say that it works for me
> > now.
>
> great!
>
> > My PDF documents have comma separated lists of both author and
> > keyword and as a result all the keywords, or author listings for any
> > particular document are grouped together in the listing. Is it
> > possible to explode the comma separated list to provide separate
> > keyword and author listings?
>
> We don't do this at the moment. If you know Perl you could add it in.
> Otherwise we'll add it to our TODO list.
>
> > I guess the way author names are stored would need to be revised at our
> > end, as currently we use eg "Alan Ralph, John Winston Toumbourou,
> > Morgen Grigg, Rhiannon Mulcahy, Michael Carr-Gregg and Matthew R.
> > Sanders". Presumably this would need to be like "Ralph Alan,
> > Toumbourou John Winston, ..." ??
>
> You should put the author names the way you want them to be displayed.
> The default sorting can handle both 'John Smith' and 'Smith, John'
> formats (as long as the documents are in English, or at least aren't
> recognised to be not English). The second format obviously wouldn't be
> good if you were exploding a comma separated list.
>
> > Another request that has been made to me is to be able to list all
> > the documents by the "issue" of the journal that they appear in. The
> > journal has issues like 'Vol 1 Issue 1' 'Vol 1 Issue 2' etc with a
> > number of articles in each; each article is a separate PDF doc. My
> > first thought would be to put the issue in the Title property of the
> > PDF doc, and the actual document title in the subject; as I
> > understand it I should then be able to use the ex.Title to group by
> > issue number, and still use ex.Subject to create a title group.
>
> Personally, I would put the Title of the PDF in Title, and Volume and
> Issue numbers in separate fields. This is easy to do if you are using
> GLI to add metadata - I guess you may be more restricted if you are
> using Adobe fields?
> If you had say Title, Volume and Issue metadata, you could do a
> browsing hierarchy using GenericList, with -metadata Volume/Issue/Title
>
> Hope this helps,
> Katherine

Thanks Katherine, that is helpful. I decided to go with an external
metadata.xml file for all the metadata, as that gives a lot more
flexibility, together with the ability to use multiple elements of the
same name. This way, if the client decides to, say, change the way
volumes are described, add keywords or change author names (eg to add
Professor or Dr), it's a simple case of editing one text file rather than
mucking with a potentially large number of files in Acrobat, which would
also require a Windows environment, whereas all the publishing is done in
a *nix environment.
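
For anyone wanting to try the same approach, here is a rough sketch
(not Greenstone code) of how one entry of such a metadata.xml file might
be generated, with the comma separated author list exploded into
separate values. The file name and the metadata names Author, Volume
and Issue are assumptions for illustration only, and the FileSet /
FileName / Description / Metadata layout and the mode="accumulate"
attribute are my understanding of the metadata.xml format, so check
them against a file written by GLI before relying on this.

```python
import re
import xml.etree.ElementTree as ET


def split_names(raw):
    """Split 'A, B and C' into ['A', 'B', 'C'].

    Only suits 'John Smith' style names; 'Smith, John' would be
    broken apart at its comma.
    """
    parts = re.split(r"\s*,\s*(?:and\s+)?|\s+and\s+", raw)
    return [p.strip() for p in parts if p.strip()]


def build_fileset(filename, title, volume, issue, authors):
    """Build one <FileSet> element describing a single PDF."""
    fileset = ET.Element("FileSet")
    # Assumption: FileName is treated as a pattern, so the dot is escaped
    # by the caller.
    ET.SubElement(fileset, "FileName").text = filename
    desc = ET.SubElement(fileset, "Description")
    for name, value in (("Title", title), ("Volume", volume), ("Issue", issue)):
        ET.SubElement(desc, "Metadata", name=name).text = value
    # One Metadata element per author, accumulated rather than overridden.
    for author in split_names(authors):
        ET.SubElement(desc, "Metadata", name="Author", mode="accumulate").text = author
    return fileset


if __name__ == "__main__":
    root = ET.Element("DirectoryMetadata")
    root.append(
        build_fileset(
            r"article1\.pdf",   # hypothetical file name
            "Example article",  # hypothetical title
            "Vol 1",
            "Issue 2",
            "Alan Ralph, John Winston Toumbourou and Matthew R. Sanders",
        )
    )
    print(ET.tostring(root, encoding="unicode"))
```

With separate Volume, Issue and Title values in place, the
"-metadata Volume/Issue/Title" hierarchy Katherine suggested should have
what it needs to group on.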

Cheers
--
David Robley

I float like an anchor and sting like a moth.