[greenstone-users] Re: Fwd: Tests with Luigi

From Anupama of Greenstone Team
Date Mon Jul 18 19:30:06 2011
Subject [greenstone-users] Re: Fwd: Tests with Luigi
In-Reply-To (4E1F984A-7010800-cs-waikato-ac-nz)
Hello,

We've looked at points 1 to 3b of your email. The responses follow
below, after each of your points.

Would you like to test tomorrow's caveat to see if the changes that have
now been made work correctly for you? It will provide some external
confirmation for us that the bugs have indeed been fixed. (Or perhaps
you'd prefer to wait with testing until all 9 points of your message
have been covered.)

Good luck,
Anupama


> Luigi and I worked today on testing OAI and PDF using the candidate 2.85
> Windows version given to Luigi by Sam
> (Greenstone-2.85-candidate-2011.06.08-windows.exe).
>
> For OAI, there is indeed the function of putting the document urls
> automatically in the OAI records and harvesting with the OAI verbs
> indeed works fine (independently of which server is hosting the
> collection). We do, however, find apparent problems:
>

> 1. If we build a collection in which some of the documents already have
> a url in the dc.Resource Identifier field and some do not, then the OAI
> server seems to provide two values for dc.Resource Identifier (the
> manually provided one and the internally generated) for those documents
> already having a specified url. You had said that if there is already a
> url, the internal one will not be generated. Is there a problem here?
>

There are two factors influencing the results here:

1. In order to provide a fixed URL, the user should set a
(Greenstone-specific) metadata field called "gs.OAI Resource URL", which
is also visible in GLI. Having set this, the Greenstone server will use
this manually assigned URL to override any that Greenstone generated
automatically, so that the custom URL rather than the automatically
generated one is what is displayed in the "Resource Identifier" field
when performing a GetRecord operation.

Therefore, what you said above is true, but it's true for the gs.OAI
Resource URL metadata field, not for the "dc.Resource Identifier"
metadata field.
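
To check what a server returns in the "Resource Identifier" field, a
GetRecord request can be put together as in the minimal Python sketch
below. The verb and argument names are those of the OAI-PMH protocol;
the base URL and the record identifier are made-up placeholders, not
values taken from a real Greenstone installation.

```python
from urllib.parse import urlencode

def get_record_url(base_url, identifier, metadata_prefix="oai_dc"):
    """Build an OAI-PMH GetRecord request URL for one record."""
    query = urlencode({
        "verb": "GetRecord",
        "identifier": identifier,
        "metadataPrefix": metadata_prefix,
    })
    return base_url + "?" + query

# Placeholder server address and record identifier, for illustration only.
url = get_record_url("http://example.org/oai", "oai:example:demo:D0")
print(url)
```

Opening the printed URL in a browser (with your own server address and
a real record identifier substituted) shows the record as XML, so the
identifier values can be inspected directly.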


2. The other part of your question that still remains is whether any
multiple metadata for dc.Resource Identifier should all be retained.

Just now, when trying out a few things, we found that the current
behaviour is as follows:
- if you have a manually assigned value in gs.OAI Resource URL, then any
automatically generated one is ignored.
- if you have a manually assigned value in gs.OAI Resource URL AND you
have one or more manually assigned values in dc.Resource Identifier,
then ALL of them are visible.
- if you have one or more manually assigned values in the "dc.Resource
Identifier" field, but none in "gs.OAI Resource URL", then the
automatically generated Greenstone metadata is not discarded, since this
is a URL to the original document (or it is the URL to the
Greenstone-generated HTML version of the document, if there was no
original document to link to).
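
The three cases above can be summarised in a small Python model. This
is only a sketch of the observed behaviour, not Greenstone's actual
code: the function name, the parameter names and the order of the
returned values are assumptions made for illustration.

```python
def resource_identifiers(gs_oai_resource_url, dc_resource_identifiers, generated_url):
    """Return the identifier values exposed for one document.

    gs_oai_resource_url     -- manually assigned gs.OAI Resource URL, or None
    dc_resource_identifiers -- manually assigned dc.Resource Identifier values
    generated_url           -- URL that Greenstone generates automatically
    """
    # A manually assigned gs.OAI Resource URL overrides the generated URL;
    # without one, the generated URL is kept.
    url = gs_oai_resource_url if gs_oai_resource_url else generated_url
    # Manually assigned dc.Resource Identifier values are always retained.
    return [url] + list(dc_resource_identifiers)

# Hypothetical values, for illustration only.
print(resource_identifiers(None, ["ISBN 0-000-00000-0"], "http://example.org/generated"))
```

In this model the generated URL disappears only when gs.OAI Resource
URL is set, which matches the first two cases; the third case keeps
the generated URL alongside the manually entered identifiers.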

The second and third behaviours, which keep additional user-entered
metadata relating to dc.Identifier instead of suppressing it, may be
desirable because a document may exist at a URL *and*, for example,
also have an ISBN, which is likewise a resource identifier.
The former may be specifically assigned by the collection designer in
the gs.OAI Resource URL (or have been automatically generated correctly
by Greenstone itself, which the designer may wish to retain), while the
latter can be added by the designer in the dc.Resource Identifier field,
and Greenstone will preserve both of them.


In your other recent email on the matter of the OAI server, you wrote in
response to a message of mine:
"If the collection has document urls in the dc.Identifier field, then
the OAI server will carry these over into the OAI record. These will be
treated as absolute urls."

In light of the discoveries detailed above, it is the "gs.OAI Resource
URL" rather than the dc.Identifier field for which your statement holds
true. The "gs.OAI Resource URL" will be used for the "dc.Resource
Identifier" and treated as an absolute URL.


> 2. We cannot succeed in downloading in GLI documents hosted on a
> Greenstone OAI server (using the candidate 2.85 version of Greenstone
> for GLI but testing a few varied internal and external servers). It
> seems that GLI is taking the value it puts into ex.dc.Identifier of the new
> collection (supposed to be the url) from the OAI Identifier field of the
> remote OAI served collection instead of from Resource Identifier of this
> collection which contains the real url. Could you please advise?
>

You and Luigi have very helpfully identified a bug, which I think is
fixed now. (We tried out your test as described: a Greenstone OAI Server
on one machine, a GLI running on another machine, downloading documents
over OAI from that first server.) Fortunately, the error was not quite
so intelligent as you reasoned it to be: it was merely parsing the
document's OID incorrectly and so ended up with the wrong collection
name; as a consequence, a wrong URL was built for the Resource Identifier.

Here's the commit statement for the bug fix, in case it explains things
more:
"John Rose and Luigi had identified a bug in the Resource Identifier
URLs served upon a GetRecord action (the same metadata field served by a
ListRecords action was not faulty). The URL contained the incorrect name
for the collection. As it happens, some previous code committed in
commented out form with this file did have the necessary fix, however,
as it was embedded in a OAI ID test that need not get executed in this
stage of the code, the code crucially correcting the URL's collection
name was also commented out as a result. Reinstating only the necessary
line that adjusts the URL's collection name."
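
The kind of mistake described in the commit statement can be
illustrated with a short Python sketch. It assumes, purely for
illustration, an identifier of the form "collection:OID" and a made-up
URL layout; the identifier format and the URLs that Greenstone really
uses may differ, and none of the names below come from its code.

```python
def split_identifier(identifier):
    """Split a hypothetical 'collection:OID' identifier into its two parts."""
    collection, sep, oid = identifier.partition(":")
    if not sep or not collection or not oid:
        raise ValueError("expected 'collection:OID', got %r" % identifier)
    return collection, oid

def resource_url(base_url, identifier):
    """Build a document URL from the collection name and OID (made-up layout)."""
    collection, oid = split_identifier(identifier)
    return "%s/%s/%s" % (base_url, collection, oid)

# Placeholder values, for illustration only.
print(resource_url("http://example.org/collect", "demo:D0"))
```

If the split is done at the wrong place, the collection part comes out
wrong and every URL built from it points at the wrong collection,
which is the symptom that was reported.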

Thanks for discovering the bug and telling us about it.


> 3. Minor presentation problems:
> a. Why is the normal Identifier field in GLI called dc.Resource
> Identifier and not dc.Identifier?

"dc.Resource Identifier" is the official label in English for what is
dc.Identifier. This way, Dublin Core metadata has descriptive display
strings for the same piece of metadata in other languages as well, and
these will appear when you run GLI in another language setting.

On the page
http://dublincore.org/documents/usageguide/elements.shtml#identifier
see Section 4.14 and compare that with Subject in Section 4.2, whose
label is "Subject and Keywords" ("dc.Subject and Keywords" in GLI).

> b. In the Download view, the buttons on the bottom line (Clear Cache,
> Download, etc.) are eliminated from view on the right hand side if the
> window width is reduced. This means that some of the buttons may be out
> of sight. We suggest that the buttons be presented as in the Create view
> (where their width varies with window width to ensure that all of the
> buttons are always visible).

This has now been done.