[greenstone-users] Version 2.72 and CDS/ISIS

From John Rose
Date Thu, 14 Dec 2006 23:11:19 +0100
Subject [greenstone-users] Version 2.72 and CDS/ISIS
Dear Greenstone users/developers,

I have been working with the Greenstone team to
ensure liaison with CDS/ISIS users, and am taking
this opportunity to list (in more detail than in
the release announcements) the improvements for
CDS/ISIS database conversion in Greenstone
version 2.72 relative to version 2.70 (most of
these functions were available in 2.71 but some
had bugs or have been further improved, so
CDS/ISIS users wishing to benefit from them are advised to upgrade to 2.72):

1. The ^* metadata element is available to access
the first subfield of a field with subfields
(even if it is the main field without a delimiting prefix).

2. Backslashes in a CDS/ISIS field (e.g. Windows
file paths) will display correctly with the Greenstone formatting language.

3. Support for DOS 852 coding (needed for
DOS-based CDS/ISIS databases in Eastern European languages).

4. Logically deleted records will not be imported
(with prior versions, you had to export to an ISO
file and re-import into CDS/ISIS before converting to Greenstone).

5. A "-records_per_folder" option has been added
to the explode function. This puts the records
from exploding a metadata database into multiple
subdirectories, which means that the GLI should
use less memory and edit the metadata more
quickly. This option has not yet been tested for
its usefulness in real conversion situations; it
may be tried for large databases in which the
time for explosion seems inordinately long. The
default value is 100, so you can try a lower value, say 10.

6. A bug under Linux, by which the CDS/ISIS files
with filenames in capital letters were not
handled correctly, has been fixed (previously the
filenames had to be changed manually to small
letters before dragging them into GLI).

7. '&' characters and spaces in filenames now
work in the "document_field" parameter of the
explode function (previously, the corresponding documents were not imported).

8. When the "document_field" CDS/ISIS field is
repeatable, each occurrence will yield a separate
Greenstone document, each with the same metadata
(previously only the first occurrence was imported).

9. Building a CDS/ISIS collection (either "as
is", i.e. metadata only, or by exploding) should
be significantly faster in Greenstone v2.71, as
it no longer tries to determine the encoding of the CDS/ISIS file.
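
To illustrate items 5 and 8, here is a minimal Python sketch of the
behaviour described. It is not Greenstone's own code: the record layout,
the function names, the field tag "v98" and the zero-padded folder names
are all assumptions made for the example, as is the choice to copy the
whole record as each document's metadata.

```python
from typing import Dict, List, Tuple

# Hypothetical record layout: field tag -> list of occurrences.
Record = Dict[str, List[str]]


def explode_record(record: Record, document_field: str) -> List[Tuple[str, Record]]:
    """Item 8: one (document path, metadata) pair per occurrence of the
    document field, each pair carrying the same metadata."""
    return [(path, record) for path in record.get(document_field, [])]


def folder_for(record_index: int, records_per_folder: int = 100) -> str:
    """Item 5: name of the subdirectory that receives the record at a
    0-based index. The naming scheme is invented for this sketch."""
    if records_per_folder < 1:
        raise ValueError("records_per_folder must be at least 1")
    return f"{record_index // records_per_folder + 1:04d}"


if __name__ == "__main__":
    record = {
        "v24": ["Annual report"],
        # Repeatable document field; the first filename contains '&' and spaces (item 7).
        "v98": ["docs/report a&b.pdf", "docs/annex.pdf"],
    }
    for path, metadata in explode_record(record, "v98"):
        print(path, metadata["v24"])

    # With 25 records and 10 records per folder, three folders are used.
    print(sorted({folder_for(i, 10) for i in range(25)}))
```

With the default of 100, the first hundred records share one folder; a
lower value such as 10 spreads the same records over more, smaller folders.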

All reported problems with the "as is" conversion
of large CDS/ISIS databases with GLI seem to have
been resolved with v2.72 - one user has
successfully converted a database of 38,000
records. On the other hand, GLI may fail at the
explode step because it wasn't designed to handle
huge amounts of metadata (typically when
approaching 15,000 CDS/ISIS records, but possibly
fewer or more depending on the size of the
records); in this case, the command line may be
used, and I will shortly be posting to the Wiki a
summary of this process for basic Greenstone
users. Please do report to the discussion lists
any problems encountered in CDS/ISIS conversions.

With best regards, John Rose

John B. Rose
Honorary Research Associate, University of Waikato
Sèvres, France
Email: <johnrose@alumni.caltech.edu>