[greenstone-users] Re: OAI server update - Re: Fwd: Tests with Luigi

From John Rose
Date Sat Aug 27 10:16:09 2011
Subject [greenstone-users] Re: OAI server update - Re: Fwd: Tests with Luigi
In-Reply-To (4E530FB3-9080007-cs-waikato-ac-nz)
Fine Anupama, but please see below:

On 23/08/2011 04:25, Greenstone Team wrote:
> Hi,
>
> > 3. Since the value of baseServerURL in oai.cfg (at least the port)
> seems to be overridden by those in File/Settings, the guidance on this in
> oai.cfg is confusing - the user should be advised NOT to set this value
> except under conditions which should be clearly explained (is this
> parameter needed at all?).
>
> I've changed the order of the statements in the instructive comments for
> the baseServerURL property; it now mentions at the very start that the
> URL will be generated automatically. Then it proceeds to explain the
> circumstances under which the user may want to set this themselves
> before telling them that a port number is compulsory, if other than 80.
>
> The comment is now:
> # The base URL of the web server. Will be automatically generated by the
> # oaiserver program if not specified here, but may use 'xxx.xxx.xxx.xxx'
> # IP address, not a nice 'www.mylibrary.com' human-readable domain name.
> # If you edit this, it must include the port number if not using port 80.

I suggest you replace "not a nice 'www.mylibrary.com' human-readable
domain name." by "but NOT a human-readable domain name like
'www.mylibrary.com'."
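
To illustrate the port rule in the comment above, here is a minimal Python
sketch. It is not Greenstone code: the helper name base_server_url and the
example host and port values are hypothetical. It only shows that the port
has to appear in the URL whenever it is not 80.

```python
def base_server_url(host: str, port: int = 80) -> str:
    """Build an http base URL, adding the port only when it is not 80."""
    if port == 80:
        return f"http://{host}"
    return f"http://{host}:{port}"


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    print(base_server_url("www.mylibrary.com"))
    print(base_server_url("192.168.1.10", 8383))
```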
>
>
> > 2. Apparently, when you change the File/Settings and open Greenstone,
> these settings are remembered when Greenstone is opened alone the next
> time. However, GLI always opens with the same settings in File/Settings,
> and if these are changed it does not affect the settings for opening
> Greenstone alone (which are remembered from last time it was opened
> alone without GLI). This is a bit confusing and I would suggest that the
> settings be remembered irrespective of whether GLI was active or not
> when they were set.
>
> That's because there are 2 server configuration files/settings: one used
> by GLI when it launches the server and one used when you launch the
> server on its own. It wasn't my decision. This has always been the case
> and is deliberate. I'd had long discussions about why the two should
> have separate configurations at all, but there are reasons (which I
> can't remember, but which seemed to have convinced me back then). Dr
> Bainbridge will recall the reasons.

It would be interesting to learn the reasons, but if it will stay as is
there should be a better effort to document this behaviour, perhaps
through a warning message at the top of the files/settings box like:
"ATTENTION: if you have entered this box through GLI, the settings will
be set back to default when you leave GLI".
>
>
> > Concerning OAI validation, it seems to be just as I thought: Unless
> you are in an institution like Waikato which has real IP numbers or
> unless you pay for an IP number from your provider, your internal IP
> numbers are normally unavailable from the outside,
>
> The IP for localhost (127.0.0.1) is indeed always going to be
> inaccessible from the outside, but if ticking "Get local IP" gives the
> same correct and unique IP address of your machine as if you ran
> "ipconfig /all" in a Windows DOS prompt, then setting up port forwarding
> in conjunction with firewall settings on your machine may perhaps make
> your served pages accessible from the outside. Lots of websites instruct
> people on how to make their locally-hosted web server accessible to the
> outside world, so this may be possible somehow unless your ISP is
> specifically blocking it somehow. Sam here just confirmed this once more.

My commercial ISP (like the others that I know) provides space for an
accessible website on its own server, but does not provide external access
to my computer through a real IP number.
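
For anyone unsure which kind of address their machine reports, here is a
small Python sketch using the standard ipaddress module. The function name
classify is my own, and 192.168.1.10 is just a sample of the 192.168.x.x
kind; the other two addresses are the ones mentioned in this thread.

```python
import ipaddress


def classify(address: str) -> str:
    """Label an IP address as loopback, private, or public."""
    ip = ipaddress.ip_address(address)
    # Check loopback first: 127.x.x.x also counts as private in this module.
    if ip.is_loopback:
        return "loopback"
    if ip.is_private:
        return "private"
    return "public"


if __name__ == "__main__":
    for addr in ("127.0.1.1", "192.168.1.10", "12.104.55.154"):
        print(addr, classify(addr))
```

Note that "public" here only describes the address range. It does not mean
that inbound connections to that address will reach your machine.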
>
>
> > 1. I can't see the advantage of having the Apache port changing
> seemingly arbitrarily; it causes complications for using the local OAI
> server and also for setting the library url for Preview. It would seem
> better (unless this is likely to cause problems) to have the box in
> File/Settings called "Do not modify port" checked by default.
>
> Dr Bainbridge wanted this new option in the Settings as a checkbox, but
> did not want it as the default. He wanted to use the feature for
> development and testing purposes, which is when we may be constantly
> updating the Settings yet wish the port number to remain consistent or
> warn us when that fails to succeed.

Sorry, you have not replied as to why anyone would want the port to
change. It certainly complicates life - why can't we say no change as
the default?
>
> > P.S. One annoyance is that the files served by OAI server have a hash
> code as a name rather than the original name. I guess that this can be
> fixed by either the build parameters and/or the OAI mapping - I will
> experiment but would appreciate a hint from your side.
>
> I don't know how this may be done. I'll try to remember to ask Dr
> Bainbridge about it. He may know.

OK, waiting.
>
> In the meantime, try using GLI's Design panel to Configure the various
> plugins you use in your plugin pipeline to set all their "OIDType"
> properties to "dirname". Be aware that, according to the message shown
> there, in order for the dirname (preceded by "J") to be used, each
> directory must contain only a single document, so as to make its OID
> unique. Then check to see
> whether the GS OAI server honours the doc IDs thus generated. (I suspect
> it will.)

I have my files in the import folder without directories for the files.
Your method may well work fine, but many users may find it complicated
and tedious? Thanks and best regards, John
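
To make the uniqueness condition concrete, here is a rough Python sketch.
It is not Greenstone's own code: the function names and sample paths are
made up, and it assumes the ID is simply "J" followed by the name of the
directory that contains the file.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def dirname_ids(paths):
    """Group files by a hypothetical dirname-based ID ("J" + parent directory)."""
    groups = defaultdict(list)
    for p in paths:
        parent = PurePosixPath(p).parent.name
        groups["J" + parent].append(p)
    return dict(groups)


def clashes(paths):
    """Return only the IDs that would be shared by more than one file."""
    return {oid: files for oid, files in dirname_ids(paths).items() if len(files) > 1}


if __name__ == "__main__":
    # Files placed directly in the import folder all share one directory name.
    print(clashes(["import/a.pdf", "import/b.pdf"]))
    # One document per directory gives each file its own ID.
    print(clashes(["import/report1/a.pdf", "import/report2/b.pdf"]))
```

Under that assumption, files sitting directly in the import folder, as
mine do, would all end up with the same ID.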
>
>
> Regards,
> Anupama
>
>
> John Rose wrote:
>> OK Anupama, I have done a bit of testing with the Caveat-Emptor of
>> 2011-08-20.
>>
>> I don't know what happens if your computer is assigned a real IP
>> number (or if it can be resolved to one by any process that I don't
>> understand), but in the case of ordinary users of commercial ISPs like
>> me who are connected in a local network with internal IP numbers like
>> 192.168.x.x, you need not set either "Allow external connections"
>> or "Get local IP" to access the oaiserver on the same machine through
>> GLI.
>>
>> All that you have to do is the following:
>> 1) Launch GLI
>> 2) When the Greenstone 2 Digital Library box comes up, either:
>> a) * Open File/Settings, note the values and Cancel;
>> * In GLI open the new collection (or collection to be modified), and
>> in Download/OAI set the url (using 127.0.1.1 if this had been set, or
>> localhost if this had been set) and the port to the Greenstone
>> settings, specify the set to collect from (which has to be listed in
>> oai.cfg) and make sure Server Information works.
>> OR
>> b) * Change the Greenstone server Settings as you want (port, external
>> access, url mode), say OK and again OK when it says the parameters
>> will be used when the server is next launched (not true, see below),
>> wait until the Settings box disappears (takes several seconds);
>> * Open File/Settings again to see if the port has changed, note it and
>> Cancel;
>> * In GLI open the new collection (or collection to be modified), and
>> in Download/OAI set the url (using 127.0.1.1 if this or resolve to IP
>> number had been set, localhost if this had been set, and your computer
>> name if resolve to name had been set) and the port to the Greenstone
>> settings, specify the set to collect from (provided that it is listed
>> in oai.cfg) and make sure Server Information works.
>>
>> One does NOT have to launch the Greenstone server since it is already
>> configured when the Greenstone 2 Digital Library box comes up (either
>> initially or after changing parameters). Thus the message "The
>> settings will be available when you press Enter Library/Restart
>> Library." should be corrected to "The settings are now available to
>> GLI or will be used when you press Enter Library/Restart Library."
>>
>> Here are some more suggestions/comments:
>> 1. I can't see the advantage of having the Apache port changing
>> seemingly arbitrarily; it causes complications for using the local OAI
>> server and also for setting the library url for Preview. It would seem
>> better (unless this is likely to cause problems) to have the box in
>> File/Settings called "Do not modify port" checked by default.
>> 2. Apparently, when you change the File/Settings and open Greenstone,
>> these settings are remembered when Greenstone is opened alone the next
>> time. However, GLI always opens with the same settings in
>> File/Settings, and if these are changed it does not affect the
>> settings for opening Greenstone alone (which are remembered from last
>> time it was opened alone without GLI). This is a bit confusing and I
>> would suggest that the settings be remembered irrespective of whether
>> GLI was active or not when they were set.
>> 3. Since the value of baseServerURL in oai.cfg (at least the port)
>> seems to be overridden by those in File/Settings, the guidance on this
>> in oai.cfg is confusing - the user should be advised NOT to set this
>> value except under conditions which should be clearly explained (is
>> this parameter needed at all?).
>>
>> Concerning OAI validation, it seems to be just as I thought: Unless
>> you are in an institution like Waikato which has real IP numbers or
>> unless you pay for an IP number from your provider, your internal IP
>> numbers are normally unavailable from the outside, and thus you cannot
>> validate your local OAI server. For example, when I say "Get local IP"
>> for the Greenstone server, it gives me 127.0.1.1 which is the same as
>> local host; although this allows me to correctly get access in my
>> internal network with GLI, it is not visible from the outside (error
>> 500 for the validation request). On the other hand, if I use for the
>> validation request the real IP number of my ISP (in this case
>> 12.104.55.154), it also fails from the outside (error 500 for the
>> validation request), since this number is only for outbound
>> connections and cannot be used to reach me.
>>
>> What do you think of the above? Best regards, John
>>
>> P.S. One annoyance is that the files served by OAI server have a hash
>> code as a name rather than the original name. I guess that this can be
>> fixed by either the build parameters and/or the OAI mapping - I will
>> experiment but would appreciate a hint from your side.
>>
>> On 19/08/2011 04:47, Greenstone Team wrote:
>>> Hi John,
>>>
>>> > P.S. I really don't see how you can validate from a machine without
>>> real IP address, but I will try.
>>>
>>> I described how to do this in one of my recent messages, which is
>>> likely to have got drowned in all the other stuff I wrote.
>>>
>>> To use your real IP address or hostname, you will need to go to File >
>>> Settings of your Greenstone Server Interface application (the little
>>> white Greenstone server dialog that you see when you run GLI or run
>>> gs2-server.sh). There, in the section "Address Resolution Method" choose
>>> one of the first two options: "get local IP and resolve to a name" or "get
>>> local IP". Don't forget to also tick the "Allow external connections"
>>> option. Once you restart your GS server after that, it will use your
>>> machine's host name or host IP.
>>>
>>> For a checklist of things you need to do to get the GS2 OAI server
>>> working, refer to
>>> http://wiki.greenstone.org/wiki/index.php/2.85_Release_Notes#Setting_up_your_Greenstone_OAI_Server_and_using_GLI_to_download_over_OAI_from_a_Greenstone_server
>>>
>>>
>>>
>>>
>>> > I understand that harvesting from a collection on the same Ubuntu
>>> machine should work, I will try.
>>>
>>> This need not be true. You need to make sure that any (non-Greenstone
>>> specific) apache server running on such a machine is accessible to the
>>> outside world, for which your machine's firewall and possibly your
>>> router's port-forwarding should be set up correctly.
>>>
>>> If that is working, then your *Greenstone* apache server, including the
>>> Greenstone oaiserver, should be accessible to the outside world as well.
>>>
>>> Also, since I've already successfully tested the GS OAI server against
>>> the GLI OAI downloading feature on Linux, this may not be a particularly
>>> crucial test that you need to perform.
>>>
>>> > but don't let this slow the 2.85 release (when will it be?)
>>>
>>> The OAI server for 2.85 is fixed and completely validates again now, and
>>> does not require further attention from me. But the GS3 OAI server
>>> validation may fail again, now that there is the new validation test.
>>>
>>> I am awaiting Diego's response to a question on a unicode bug he
>>> detected in a classifier which I couldn't reproduce when I tried it out
>>> here. Now that the OAI stuff is done at last, I will move on to the
>>> remaining items on my list: GS3 XSLT changes, additions and fixes and
>>> how the default GS3 format statements should produce the look of what
>>> Sam has generated for the GS3 interface. The GS3 installer also needs
>>> some changes, which may take a while since I'm not familiar with the
>>> installer code. Then comes the testing of GS2.85 binaries on all 3 OS:
>>> (1) going over all the tutorials on all OS. From previous experience,
>>> this takes about 3 days per OS.
>>> (2) testing other aspects of 2.85 like the Remote GS and the various
>>> indexer and database combinations on each OS.
>>> Then step 2 needs to be repeated for all source distributions of GS2.85.
>>> Then step 2 and additional GS3 tests need to be performed for GS3.
>>>
>>> Assuming I find no bugs, it will be off to generating the release
>>> binaries and source distributions and uploading them to SourceForge,
>>> updating the release notes and finally, informing the mailing list.
>>>
>>> One more thing. While adding in your request regarding the PDFPlugin
>>> default metadata fields option, we discovered that the plugin currently
>>> mysteriously skips extracting the Creator or Author field, one of the
>>> two. Is this step very necessary though, now that the
>>> EmbeddedMetadataPlugin is compulsory and since it retrieves all metadata
>>> embedded in a PDF document anyway?
>>>
>>> Regards,
>>> Anupama
>>>
>>>
>>>
>>> John Rose wrote:
>>>> Dear Anupama,
>>>>
>>>> I have been busy, will go through all of your messages and try to
>>>> test, but don't let this slow the 2.85 release (when will it be?). I
>>>> understand that harvesting from a collection on the same Ubuntu
>>>> machine should work, I will try. Thanks and best regards, John
>>>>
>>>> P.S. I really don't see how you can validate from a machine without
>>>> real IP address, but I will try.
>>>
>>>
>>>
>>
>
>
>

--
************

John B. Rose
1 Bis rue des Châtre-Sacs
92310 Sèvres, France

Email: john.rose1@free.fr
Alternate email: johnrose@alumni.caltech.edu