[greenstone-users] Re: Problem of Slow collection loading in GLI

From: Anupama of Greenstone Team
Date: Fri Feb 19 15:52:02 2010
Subject: [greenstone-users] Re: Problem of Slow collection loading in GLI
In-Reply-To: (4B726B85-6090003-iway-na)
Hi Renate,

Katherine has probably replied already, but here is what I found.

> What are the parameters for the command line to import and build
> incrementally?
I just ran "source setup.bash" in the Greenstone folder, then ran
"import.pl" and "buildcol.pl" on their own to see the list of options they
take. (On Windows you can do the same in a DOS prompt: first run
"setup.bat" in your Greenstone folder, then type "perl -S import.pl" to
see its options, followed by "perl -S buildcol.pl" to view its option list.)

1. The part of import.pl's output that I think is relevant is:

-incremental: Only import documents which are newer (by timestamp) than
the current archives files. Implies -keepold.

So "-incremental" is a new option flag that you additionally pass to
import.pl when you run import from the command line. As the description
says, it implies -keepold as well. Incremental importing does not depend
on which indexer you have chosen for building.
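If you want to script this, here is a minimal Python sketch that just builds
(and optionally runs) that command line. It assumes the Greenstone
environment has already been set up in the same shell (setup.bash or
setup.bat), so that "perl -S" can find import.pl on the PATH. The function
names and the collection name "mycollection" are my own, purely for
illustration.

```python
import subprocess
from typing import List


def import_command(collection: str, incremental: bool = True) -> List[str]:
    """Build the argument list for Greenstone's import.pl.

    "perl -S" is used so the same form works on Linux (after
    "source setup.bash") and on Windows (after running setup.bat).
    """
    cmd = ["perl", "-S", "import.pl"]
    if incremental:
        # -incremental implies -keepold, so existing archives are kept.
        cmd.append("-incremental")
    cmd.append(collection)
    return cmd


def run_import(collection: str) -> None:
    # Assumes the Greenstone setup script has already been run in this shell.
    subprocess.run(import_command(collection), check=True)


if __name__ == "__main__":
    # "mycollection" is a hypothetical collection name.
    print(" ".join(import_command("mycollection")))
```

Run as a script, it prints "perl -S import.pl -incremental mycollection",
which is the same command you would type by hand.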

2. The relevant-looking part of buildcol.pl's option list is similar to
the above, but for building to be incremental your collection must use
the Lucene indexer:

Only index documents which have not been previously indexed. Implies
-keepold. Relies on the lucene indexer.
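A similar minimal sketch for buildcol.pl, assuming you tell it which
indexer your collection uses (the "indexer" parameter and function name
are hypothetical, not part of Greenstone). Since incremental building
relies on Lucene, it only adds -incremental for Lucene collections and
otherwise falls back to a normal full build.

```python
from typing import List


def buildcol_command(collection: str, indexer: str, incremental: bool = True) -> List[str]:
    """Build the argument list for Greenstone's buildcol.pl.

    Incremental building relies on the Lucene indexer, so for mg or mgpp
    collections this returns a normal (full) build command instead.
    """
    cmd = ["perl", "-S", "buildcol.pl"]
    if incremental and indexer.lower() == "lucene":
        # -incremental implies -keepold; only documents not yet indexed are indexed.
        cmd.append("-incremental")
    cmd.append(collection)
    return cmd


if __name__ == "__main__":
    # "mycollection" is a hypothetical collection name.
    print(" ".join(buildcol_command("mycollection", "lucene")))
```

Falling back to a full build rather than raising an error matches the
point above: you can still import incrementally with mg/mgpp, you just
have to rebuild the whole index afterwards.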

> Is this not possible for 2.80?
I believe incremental import and build are only available from version 2.82 onwards.
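For the dummy-collection workflow Katherine describes in the quoted
message below, the step of moving the subfolders across can also be
scripted. This is only a sketch: it assumes you pass it the two import
folders (typically something like collect/<dummy>/import and
collect/<main>/import under your Greenstone folder; adjust to your setup),
and the function name is my own. It moves only subfolders, so each
subfolder's metadata.xml travels with its documents, and it refuses to
overwrite a subfolder that already exists in the main collection.

```python
import shutil
from pathlib import Path
from typing import List


def move_subfolders(dummy_import: Path, main_import: Path) -> List[str]:
    """Move every subfolder of the dummy collection's import folder into
    the main collection's import folder and return the moved names.

    Loose top-level files (such as the dummy collection's own
    metadata.xml) are left behind, so they cannot overwrite the main
    collection's top-level metadata.xml.
    """
    subfolders = sorted(p for p in dummy_import.iterdir() if p.is_dir())
    # Check for name clashes first so nothing is moved if any would collide.
    clashes = [p.name for p in subfolders if (main_import / p.name).exists()]
    if clashes:
        raise FileExistsError(f"already in main import folder: {clashes}")
    for p in subfolders:
        shutil.move(str(p), str(main_import / p.name))
    return [p.name for p in subfolders]
```

After moving, you would import (with -incremental on 2.82+) and build the
main collection from the command line as described above.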


Renate Morgenstern wrote:
> Hi Katherine,
> What are the parameters for the command line to import and build
> incrementally? Is this not possible for 2.80?
> Thanks and regards
> Renate
>> Hi
>> Yes, GLI can be slow in loading very large collections.
>> I suggest you use command line building.
>> To use GLI to add documents and metadata, create a new empty dummy
>> collection. Add documents to this collection, but make sure they are
>> in a subfolder in the collection. Add metadata to the documents as usual.
>> Then, on the command line, move the folders from the dummy
>> collection's import folder into your main collection's import folder.
>> They need to be in a subfolder so that their metadata.xml file doesn't
>> overwrite any metadata.xml that is already there.
>> Then, you can import and build the collection on the command line.
>> If you have a later version of Greenstone (e.g. 2.82 or 2.83), you can
>> do an incremental import of the new documents so you don't need to
>> reimport everything. And if your collection uses Lucene (instead
>> of mg/mgpp), you can do incremental indexing too.
>> Regards,
>> Katherine
>> Cao Minh Kiem wrote:
>>> Dear GSDL users,
>>> We would like to ask for your help. We have created a digital document
>>> collection using GLI. The collection is now about 10 GB. Every time we
>>> open GLI to add new documents and metadata, it takes a very long time
>>> (more than 40 minutes) to load the collection.
>>> Is this a limitation of GLI when loading a collection? If not, could you
>>> help us solve this issue?
>>> Best regards
>>> Cao Minh Kiem
>>> Deputy Director
>>> National Centre for S&T Information
>>> 24 Ly Thuong Kiet, Hanoi, VIETNAM
>>> Email: kiemcm@vista.gov.vn
>> _______________________________________________
>> greenstone-users mailing list
>> greenstone-users@list.scms.waikato.ac.nz
>> https://list.scms.waikato.ac.nz/mailman/listinfo/greenstone-users
> --
> Renate Morgenstern
> P O Box 30664, Windhoek, Namibia
> Tel/Fax: 242124
> Email: rmorgenstern@iway.na