[greenstone-users] Re: OAI server update - Re: Fwd: Tests with Luigi

From Greenstone Team
Date Tue Aug 23 14:26:18 2011
Subject [greenstone-users] Re: OAI server update - Re: Fwd: Tests with Luigi
In-Reply-To (4E5233FB-5020308-free-fr)
Hi,

> 3. Since the value of baseServerURL in oai.cfg (at least the port)
seems to be overridden by those in File/Settings, the guidance on this in
oai.cfg is confusing - the user should be advised NOT to set this value
except under conditions which should be clearly explained (is this
parameter needed at all?).

I've changed the order of the statements in the instructive comments for
the baseServerURL property. It now mentions at the very start that the
URL will be generated automatically. It then explains the circumstances
under which users may want to set this themselves, before telling them
that a port number is compulsory if other than 80.

The comment is now:
# The base URL of the web server. Will be automatically generated by the
# oaiserver program if not specified here, but may use 'xxx.xxx.xxx.xxx'
# IP address, not a nice 'www.mylibrary.com' human-readable domain name.
# If you edit this, it must include the port number if not using port 80.
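
To illustrate the port rule in that comment, here is a minimal Python
sketch. It is not the oaiserver's own code: the function name
build_base_url and the example host and port values are made up for
illustration. It only shows the idea that the port is written into the
URL whenever it is something other than 80.

```python
def build_base_url(host, port=80):
    """Return an http base URL, appending the port only when it is not 80."""
    if port == 80:
        # Port 80 is the HTTP default, so it can be left out of the URL.
        return f"http://{host}"
    return f"http://{host}:{port}"


# Hypothetical values, for illustration only.
print(build_base_url("www.mylibrary.com"))
print(build_base_url("192.168.1.10", 8282))
```

A value set by hand for baseServerURL would need to follow the second
form whenever the server is not on port 80.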


> 2. Apparently, when you change the File/Settings and open Greenstone,
these settings are remembered when Greenstone is opened alone the next
time. However, GLI always opens with the same settings in File/Settings,
and if these are changed it does not affect the settings for opening
Greenstone alone (which are remembered from last time it was opened
alone without GLI). This is a bit confusing and I would suggest that the
settings be remembered irrespective of whether GLI was active or not
when they were set.

That's because there are 2 server configuration files/settings: one used
by GLI when it launches the server and one used when you launch the
server on its own. It wasn't my decision. This has always been the case
and is deliberate. I'd had long discussions about why the two should
have separate configurations at all, but there are reasons (which I
can't remember, but which seemed to have convinced me back then). Dr
Bainbridge will recall the reasons.


> Concerning OAI validation, it seems to be just as I thought: Unless
you are in an institution like Waikato which has real IP numbers or
unless you pay for an IP number from your provider, your internal IP
numbers are normally unavailable from the outside,

The IP for localhost (127.0.0.1) is indeed always going to be
inaccessible from the outside. However, if ticking "Get local IP" gives
the same correct and unique IP address for your machine as running
"ipconfig /all" in a Windows command prompt does, then setting up port
forwarding on your router, together with the firewall settings on your
machine, may make your served pages accessible from the outside. Lots of
websites explain how to make a locally hosted web server accessible to
the outside world, so this should be possible unless your ISP is
specifically blocking it. Sam here just confirmed this once more.
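
As a rough way of telling which kind of address "Get local IP" has given
you, the following Python sketch uses the standard ipaddress module. The
function name describe_address is made up for illustration, and the
addresses are the ones mentioned in this thread plus a sample
192.168.x.x address. Note that "public" here only means the address is
not in a reserved loopback or private range; it does not prove that
inbound connections can reach the machine.

```python
import ipaddress


def describe_address(text):
    """Classify an IP address string as loopback, private or public."""
    addr = ipaddress.ip_address(text)
    # Loopback must be checked first: loopback addresses also count as private.
    if addr.is_loopback:
        return "loopback"
    if addr.is_private:
        return "private"
    return "public"


for candidate in ("127.0.0.1", "127.0.1.1", "192.168.1.10", "12.104.55.154"):
    print(candidate, describe_address(candidate))
```

A loopback result can never be validated from outside, and a private
result needs port forwarding on the router before an outside validator
can reach it.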


> 1. I can't see the advantage of having the Apache port changing
seemingly arbitrarily; it causes complications for using the local OAI
server and also for setting the library url for Preview. It would seem
better (unless this is likely to cause problems) to have the box in
File/Settings called "Do not modify port" checked by default.

Dr Bainbridge wanted this new option in the Settings as a checkbox, but
did not want it as the default. He wanted to use the feature for
development and testing purposes, when we may be constantly updating the
Settings yet want the port number to stay the same, or to be warned when
it cannot be kept.
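
The general idea of "keep the port or warn" can be sketched in Python as
below. This is not how the Greenstone server is implemented; the
function names port_is_free and choose_port, the keep_port flag and the
port number 8282 are all assumptions made for illustration.

```python
import socket


def port_is_free(port, host="127.0.0.1"):
    """Return True if a TCP socket can currently be bound to host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        try:
            probe.bind((host, port))
        except OSError:
            return False
        return True


def choose_port(preferred, keep_port):
    """Return the port to use, or None if keep_port is set and it is taken."""
    if port_is_free(preferred):
        return preferred
    if keep_port:
        # The caller is expected to warn the user rather than switch ports.
        return None
    # Otherwise try the next hundred ports and take the first free one.
    for candidate in range(preferred + 1, min(preferred + 101, 65536)):
        if port_is_free(candidate):
            return candidate
    return None


# Hypothetical port number, for illustration only.
print(choose_port(8282, keep_port=True))
```

With keep_port set, a None result corresponds to the warning case
described above; without it, the port may silently move to a
neighbouring free one.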

> P.S. One annoyance is that the files served by OAI server have a hash
code as a name rather than the original name. I guess that this can be
fixed by either the build parameters and/or the OAI mapping - I will
experiment but would appreciate a hint from your side.

I don't know how this may be done. I'll try to remember to ask Dr
Bainbridge about it. He may know.

In the meantime, try using GLI's Design panel to configure each of the
plugins in your plugin pipeline, setting their "OIDType" properties to
"dirname". Be aware that the accompanying message says that, for the
dirname (preceded by "J") to be used, each directory must contain only a
single document, so that its OID is unique. Then check whether the GS
OAI server honours the doc IDs thus generated. (I suspect it will.)
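
Before trying this, it may help to check that every directory really
does hold only one document. Here is a small Python sketch of such a
check. It is not part of Greenstone: the function name
dirs_with_multiple_files and the folder name "import" are assumptions,
and it naively treats every file as a document, so metadata files and
the like would be counted too.

```python
import os


def dirs_with_multiple_files(import_root):
    """Map each directory under import_root holding more than one file to its files."""
    crowded = {}
    for dirpath, _dirnames, filenames in os.walk(import_root):
        if len(filenames) > 1:
            crowded[dirpath] = sorted(filenames)
    return crowded


# Hypothetical folder name, for illustration only.
for folder, files in dirs_with_multiple_files("import").items():
    print(folder, files)
```

Any directory it lists would need its documents split into separate
folders before "dirname" can give each one a unique OID.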


Regards,
Anupama


John Rose wrote:
> OK Anupama, I have done a bit of testing with the Caveat-Emptor of
> 2011-08-20.
>
> I don't know what happens if your computer is assigned a real IP
> number (or if it can be resolved to one by any process that I don't
> understand), but in the case of ordinary users of commercial ISPs like
> me who are connected in a local network with internal IP numbers like
> 192.168.x.x, you need not set either "Allow external connections"
> or "Get local IP" to access the oaiserver on the same machine through
> GLI.
>
> All that you have to do is one of the following:
> 1) Launch GLI
> 2) When the Greenstone 2 Digital Library box comes up, either:
> a) * Open File/Settings, note the values and Cancel;
> * In GLI open the new collection (or collection to be modified),
> and in Download/OAI set the url (using 127.0.1.1 if this had been set,
> or localhost if this had been set) and the port to the Greenstone
> settings, specify the set to collect from (which has to be listed in
> oai.cfg) and make sure Server Information works.
> OR
> b) * Change the Greenstone server Settings as you want (port, external
> access, url mode), say OK and again OK when it says the parameters
> will be used when the server is next launched (not true, see below),
> wait until the Settings box disappears (takes several seconds);
> * Open File/Settings again to see if the port has changed, note it
> and Cancel;
> * In GLI open the new collection (or collection to be modified),
> and in Download/OAI set the url (using 127.0.1.1 if this or resolve to
> IP number had been set, localhost if this had been set, and your
> computer name if resolve to name had been set) and the port to the
> Greenstone settings, specify the set to collect from (provided that it
> is listed in oai.cfg) and make sure Server Information works.
>
> One does NOT have to launch the Greenstone server since it is already
> configured when the Greenstone 2 Digital Library box comes up (either
> initially or after changing parameters). Thus the message "The
> settings will be available when you press Enter Library/Restart
> Library." should be corrected to "The settings are now available to
> GLI or will be used when you press Enter Library/Restart Library."
>
> Here are some more suggestions/comments:
> 1. I can't see the advantage of having the Apache port changing
> seemingly arbitrarily; it causes complications for using the local OAI
> server and also for setting the library url for Preview. It would seem
> better (unless this is likely to cause problems) to have the box in
> File/Settings called "Do not modify port" checked by default.
> 2. Apparently, when you change the File/Settings and open Greenstone,
> these settings are remembered when Greenstone is opened alone the next
> time. However, GLI always opens with the same settings in
> File/Settings, and if these are changed it does not affect the
> settings for opening Greenstone alone (which are remembered from last
> time it was opened alone without GLI). This is a bit confusing and I
> would suggest that the settings be remembered irrespective of whether
> GLI was active or not when they were set.
> 3. Since the value of baseServerURL in oai.cfg (at least the port)
> seems to be overridden by those in File/Settings, the guidance on this
> in oai.cfg is confusing - the user should be advised NOT to set this
> value except under conditions which should be clearly explained (is
> this parameter needed at all?).
>
> Concerning OAI validation, it seems to be just as I thought: Unless
> you are in an institution like Waikato which has real IP numbers or
> unless you pay for an IP number from your provider, your internal IP
> numbers are normally unavailable from the outside, and thus you cannot
> validate your local OAI server. For example, when I say "Get local IP"
> for the Greenstone server, it gives me 127.0.1.1 which is the same as
> local host; although this allows me to correctly get access in my
> internal network with GLI, it is not visible from the outside (error
> 500 for the validation request). On the other hand, if I use for the
> validation request the real IP number of my ISP (in this case
> 12.104.55.154), it also fails from the outside (error 500 for the
> validation request), since this number is only for outbound
> connections and cannot be used to reach me.
>
> What do you think of the above? Best regards, John
>
> P.S. One annoyance is that the files served by OAI server have a hash
> code as a name rather than the original name. I guess that this can be
> fixed by either the build parameters and/or the OAI mapping - I will
> experiment but would appreciate a hint from your side.
>
> On 19/08/2011 04:47, Greenstone Team wrote:
>> Hi John,
>>
>> > P.S. I really don't see how you can validate from a machine without
>> real IP address, but I will try.
>>
>> I described how to do this in one of my recent messages, which is
>> likely to have got drowned in all the other stuff I wrote.
>>
>> To use your real IP address or hostname, you will need to go to File >
>> Settings of your Greenstone Server Interface application (the little
>> white Greenstone server dialog that you see when you run GLI or run
>> gs2-server.sh). There, in the section "Address Resolution Method" choose
>> one of the first two options: "get local IP and resolve to a name" or "get
>> local IP". Don't forget to also tick the "Allow external connections"
>> option. Once you restart your GS server after that, it will use your
>> machine's host name or host IP.
>>
>> For a checklist of things you need to do to get the GS2 OAI server
>> working, refer to
>> http://wiki.greenstone.org/wiki/index.php/2.85_Release_Notes#Setting_up_your_Greenstone_OAI_Server_and_using_GLI_to_download_over_OAI_from_a_Greenstone_server
>>
>>
>>
>>
>> > I understand that harvesting from a collection on the same Ubuntu
>> machine should work, I will try.
>>
>> This need not be true. You need to make sure that any (non-Greenstone
>> specific) apache server running on such a machine is accessible to the
>> outside world, for which your machine's firewall and possibly your
>> router's port-forwarding should be set up correctly.
>>
>> If that is working, then your *Greenstone* apache server, including the
>> Greenstone oaiserver, should be accessible to the outside world as well.
>>
>> Also, since I've already successfully tested the GS OAI server against
>> the GLI OAI downloading feature on Linux, this may not be a particularly
>> crucial test that you need to perform.
>>
>> > but don't let this slow the 2.85 release (when will it be?)
>>
>> The OAI server for 2.85 is fixed and completely validates again now, and
>> does not require further attention from me. But the GS3 OAI server
>> validation may fail again, now that there is the new validation test.
>>
>> I am awaiting Diego's response to a question on a unicode bug he
>> detected in a classifier which I couldn't reproduce when I tried it out
>> here. Now that the OAI stuff is done at last, I will move on to the
>> remaining items on my list: GS3 XSLT changes, additions and fixes and
>> how the default GS3 format statements should produce the look of what
>> Sam has generated for the GS3 interface. The GS3 installer also needs
>> some changes, which may take a while since I'm not familiar with the
>> installer code. Then comes the testing of GS2.85 binaries on all 3 OS:
>> (1) going over all the tutorials on all OS. From previous experience,
>> this takes about 3 days per OS.
>> (2) testing other aspects of 2.85 like the Remote GS and the various
>> indexer and database combinations on each OS.
>> Then step 2 needs to be repeated for all source distributions of GS2.85.
>> Then step 2 and additional GS3 tests need to be performed for GS3.
>>
>> Assuming I find no bugs, it will be off to generating the release
>> binaries and source distributions and uploading them to SourceForge,
>> updating the release notes and finally, informing the mailing list.
>>
>> One more thing. While adding in your request regarding the PDFPlugin
>> default metadata fields option, we discovered that the plugin currently
>> mysteriously skips extracting the Creator or Author field, one of the
>> two. Is this step very necessary though, now that the
>> EmbeddedMetadataPlugin is compulsory and since it retrieves all metadata
>> embedded in a PDF document anyway?
>>
>> Regards,
>> Anupama
>>
>>
>>
>> John Rose wrote:
>>> Dear Anupama,
>>>
>>> I have been busy, will go through all of your messages and try to
>>> test, but don't let this slow the 2.85 release (when will it be?). I
>>> understand that harvesting from a collection on the same Ubuntu
>>> machine should work, I will try. Thanks and best regards, John
>>>
>>> P.S. I really don't see how you can validate from a machine without
>>> real IP address, but I will try.
>>
>>
>>
>