[greenstone-users] Disk space

From Diego Spano
Date Thu Oct 14 04:34:44 2010
Subject [greenstone-users] Disk space
In-Reply-To (4CB57703-9040409-comune-belluno-it)

This is how Greenstone stores objects: if you import 27 GB of JPG files, you
will have 27 GB in the "import" folder, 27 GB in the "archives" folder (after
the import process), and 27 GB in the "building" folder (after the buildcol
process). The last step of creating a collection from GLI is to rename the
building folder to index, so you no longer have a building folder but an
index folder (with 27 GB).

The problem you faced arises because you built the collection again from GLI,
so now you have 85 GB in the import folder, 85 GB in the archives folder (the
previous content is deleted by default), and Greenstone needs another 85 GB
for the "building" folder. But you still have the old index folder (27 GB),
so Greenstone is asking for 85 GB (building) plus 27 GB (index).
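
As a rough check of that arithmetic, here is a minimal sketch using only the
figures quoted above. The variable names are made up for illustration, and it
assumes each stage takes about as much space as the source files.

```shell
# Rough space estimate for a rebuild through GLI, using the figures above (in GB).
new_collection=85   # size of the new image set
old_index=27        # index folder left over from the 27 GB test build

# The new "building" folder and the old "index" folder exist at the same time.
needed_for_build=$((new_collection + old_index))
echo "building + old index: ${needed_for_build} GB"

# Total across import, archives, building and the old index.
total=$((3 * new_collection + old_index))
echo "import + archives + building + old index: ${total} GB"
```

So the build step alone wants about 112 GB on the disk that holds the
collection, unless the old index is removed first.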

You have two choices.

Option 1 - If you want to create the collection from GLI, first delete the
index folder.

Option 2 - Run the process from the command line (this is what I recommend).
It gives you more control over what you need.
The import folder can reside anywhere. You can even take the files from
another computer, mount a Windows share on a Linux server, use two different
disks, and so on. You do not need to mount the file system as the import
folder, and you can use different filesystems too. Assume your collection
name is "pictures" and that you have the files on a filesystem mounted on
/my_docs.

What you need to do is redirect the import folder by passing the "importdir"
option to the import.pl process (this cannot be done through GLI).

perl -S import.pl -importdir /my_docs pictures

The archives folder can be redirected too, to any other folder or shared
resource. In this case, you need to specify the location to both processes,
the import and the build.

perl -S import.pl -importdir /my_docs -archivedir /archives pictures
perl -S buildcol.pl -archivedir /archives pictures
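
When you build from the command line, the final GLI step described earlier
(renaming building to index) has to be done by hand. A minimal sketch follows;
the function name is hypothetical, and the collection path in the usage
comment assumes the usual collect/<name> layout under $GSDLHOME, so adjust it
to your installation.

```shell
# Hypothetical helper: replace a collection's index folder with its fresh
# building folder. $1 is the collection directory.
activate_build() {
    coldir="$1"
    if [ ! -d "$coldir/building" ]; then
        echo "no building folder in $coldir" >&2
        return 1
    fi
    rm -rf "$coldir/index"                   # drop the old index to free its space
    mv "$coldir/building" "$coldir/index"    # the new build becomes the live index
}

# Usage (assumed path): activate_build "$GSDLHOME/collect/pictures"
```

Note that this only frees the old index after the build has finished. To keep
the old index from counting against the build itself, delete it before running
buildcol.pl.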

The index folder must reside in the same location where you have Greenstone
installed. But there is another way to save space. You may notice that the
contents of the archives folder are "almost" the same as those of the
"index/assoc" folder. So, if you are on Linux, you can make a link like
index/assoc --> /archives and recover that space. On Windows this is not
possible, as far as I know.
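
The link could be made along these lines. This is only a sketch: the function
name is hypothetical, and it deletes the existing index/assoc folder, so try
it on a copy first, since the two folders are only "almost" the same.

```shell
# Hypothetical helper: point index/assoc at the archives folder.
# $1 is the collection directory, $2 is the archives location (e.g. /archives).
link_assoc() {
    coldir="$1"
    archives="$2"
    if [ ! -d "$archives" ]; then
        echo "archives folder $archives not found" >&2
        return 1
    fi
    rm -rf "$coldir/index/assoc"              # WARNING: removes the existing copy
    ln -s "$archives" "$coldir/index/assoc"   # index/assoc --> archives
}

# Usage (assumed path): link_assoc "$GSDLHOME/collect/pictures" /archives
```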

It is like a puzzle!

Hope this helps.


Diego Spano
Prodigio Consultores
Bernardo de Irigoyen N° 1114 2°B
Capital Federal - Argentina
Tel: (54 11) 5093-5313

-----Original Message-----
From: greenstone-users-bounces@list.scms.waikato.ac.nz
[mailto:greenstone-users-bounces@list.scms.waikato.ac.nz] On behalf of
William Mann
Sent: Wednesday, October 13, 2010 06:08 a.m.
To: greenstone-users@list.scms.waikato.ac.nz
Subject: [greenstone-users] Disk space


I've been using Greenstone with a very small (27 GB) test collection and now
I have to build the final collection that consists of 40200+ images (about
85 GB). Since the first attempt to build ended with a disk space error, I
then added a second disk to my computer and mounted the partitions as import
and archives, also disabling the cache so the cached dir wouldn't fill up.
This time all the files got processed but the build ended the same with a
disk space error. After searching my drive I found that there was now a
building and index directory: the first with 62GB of data and the second
with 27GB of data. Since I've found that to process the files and start the
building of the collection takes a little more than 2 days (at least using
gli), I was wondering exactly what do I need to put on external disks to
keep my drive from running out of space? Why is it making so many copies of
my data in different places?

Rag. William Mann
Comune di Belluno
Servizio Sistemi Informativi
Piazza Castello, 14
32100 Belluno
Tel. 0437-913156
e-mail: wtmann@comune.belluno.it