[greenstone-users] Collection size anomaly

From Diego Spano
Date Thu Aug 26 04:07:25 2010
Subject [greenstone-users] Collection size anomaly
In-Reply-To (20100825154818-CC8C3AAD614-fx405-security-mail-net)
Pier, you are right. You only need the "index" folder to serve the collection. I
was talking about the worst-case scenario, where you have 3 copies of each
object. But this can be changed in several ways:

1- The import folder can reside anywhere. You can take the files from
another computer. You can mount a Windows share on a Linux server. You can
use two different disks, etc. What you need to do is redirect the import
folder by passing the "importdir" option to the import.pl process (this can't
be done through GLI). Some examples:

perl -S import.pl -importdir e:\docs\my_pdf
perl -S import.pl -importdir /windows_mounted_share/docs

2- The archive folder can be redirected too, to any other folder or shared
resource. In this case, you need to specify the location to both processes,
the import and the build:

perl -S import.pl -importdir e:\docs\my_pdf -archivedir e:\archives
perl -S buildcol.pl -archivedir e:\archives
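
The two redirections above can be combined in a small wrapper. What follows is only a sketch in Python, not part of Greenstone: the helper names, the collection name "mycol" and the path /mnt/archives are made up for illustration, and it assumes the collection name is passed as the last argument to import.pl and buildcol.pl. Only the -importdir and -archivedir options come from the message above.

```python
def import_command(collection, importdir=None, archivedir=None):
    """Build the argument list for import.pl with optionally redirected folders."""
    cmd = ["perl", "-S", "import.pl"]
    if importdir:
        cmd += ["-importdir", str(importdir)]
    if archivedir:
        cmd += ["-archivedir", str(archivedir)]
    cmd.append(collection)  # assumption: the collection name goes last
    return cmd


def buildcol_command(collection, archivedir=None):
    """Build the argument list for buildcol.pl, reusing the same archive folder."""
    cmd = ["perl", "-S", "buildcol.pl"]
    if archivedir:
        cmd += ["-archivedir", str(archivedir)]
    cmd.append(collection)  # assumption: the collection name goes last
    return cmd


if __name__ == "__main__":
    # "mycol" and "/mnt/archives" are hypothetical values for the demonstration.
    archives = "/mnt/archives"
    print(import_command("mycol", "/windows_mounted_share/docs", archives))
    print(buildcol_command("mycol", archives))
```

The returned lists can be handed to subprocess.run(cmd, check=True) from a shell where the Greenstone environment has already been set up. Building both commands from one archive path keeps the import and the build pointing at the same location.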

3- The index must reside in the same location where you have GS installed.
But there is another way to save space. You may notice that the contents
of the archive folder are "almost" the same as those of the "index/assoc"
folder. So, if you are on Linux, you can make a link like
index/assoc --> /archives and free up that space. On Windows this is not
possible, as far as I know.

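On Linux, the link described in point 3 could be created along these lines. This is a sketch in Python under assumptions: the function name is hypothetical, the existing assoc folder is renamed to assoc.bak rather than deleted, and whether Greenstone serves every document correctly through such a link has not been verified here (the contents are only "almost" the same).

```python
import tempfile
from pathlib import Path


def link_assoc_to_archives(collection_dir, archives_dir):
    """Point <collection_dir>/index/assoc at archives_dir with a symbolic link."""
    assoc = Path(collection_dir) / "index" / "assoc"
    if assoc.is_symlink():
        return assoc  # already a link, nothing to do
    if assoc.exists():
        # Keep the original folder as a backup instead of deleting it.
        assoc.rename(assoc.with_name("assoc.bak"))
    assoc.parent.mkdir(parents=True, exist_ok=True)
    assoc.symlink_to(Path(archives_dir).resolve(), target_is_directory=True)
    return assoc


if __name__ == "__main__":
    # Demonstration on throwaway folders, not on a real collection.
    with tempfile.TemporaryDirectory() as tmp:
        collection = Path(tmp) / "collect" / "mycol"  # hypothetical collection
        archives = Path(tmp) / "archives"
        (collection / "index" / "assoc").mkdir(parents=True)
        archives.mkdir()
        (archives / "doc.xml").write_text("<doc/>")
        link = link_assoc_to_archives(collection, archives)
        print(link.is_symlink(), (link / "doc.xml").exists())
```

Once the link has been checked against a rebuilt collection, the assoc.bak folder can be removed by hand to actually recover the space.
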
It is like a puzzle!

Hope this helps.
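
To see where the space actually goes, the folders of a collection can be measured one by one. A rough sketch in Python follows. The function names are hypothetical, sizes are plain sums of file sizes (so they may differ from what the operating system reports as space in use), and the factor of 3 is simply the worst case described above. With the 46 gig of import files mentioned in the original question quoted further down, three copies would come to 138 gig, which is in the same range as the 145 gig reported for the collection folder. That comparison is an inference, not something confirmed in this thread.

```python
import os
from pathlib import Path


def folder_size(path):
    """Total size in bytes of the regular files under path."""
    total = 0
    for root, _dirs, files in os.walk(path):  # linked sub-folders are not entered
        for name in files:
            full = os.path.join(root, name)
            if not os.path.islink(full):
                total += os.path.getsize(full)
    return total


def collection_breakdown(collection_dir):
    """Map each top-level folder of a collection to its size in bytes."""
    return {
        entry.name: folder_size(entry)
        for entry in Path(collection_dir).iterdir()
        if entry.is_dir() and not entry.is_symlink()
    }


def worst_case_size(import_size, copies=3):
    """Worst case from the thread: one copy each in import, archive and index."""
    return import_size * copies


if __name__ == "__main__":
    # 46 gig of import files, three copies in the worst case.
    print(worst_case_size(46))
```

Comparing the output of collection_breakdown() with the figure shown by the file manager gives a quick idea of which folder accounts for the difference.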


-----Original Message-----
From: Pier Luigi ROSSI [mailto:rossi@ird.fr]
Sent: Wednesday, August 25, 2010 12:48 PM
To: Jay Clark
CC: Diego Spano; greenstone-users@list.scms.waikato.ac.nz
Subject: Re: [greenstone-users] Collection size anomaly

Hi Jay and Diego,

A Greenstone collection contains these 3 parts (import, archive and index),
and all of them take up space on the hard disk.
The tmp folder also holds the text extracted from the files (I
generally work with PDF files).
But, in general, you build a collection on one computer and then put
it on the server. In that case you do not have to copy the whole
collection folder, only part of it: when you copy the collection to
the server you do not need tmp, import and archive!
(I work that way for our server, and it is OK.)
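
The copy to the server described above could be scripted roughly like this. It is a sketch in Python, not a Greenstone tool: the function name is hypothetical, and the folder names "tmp", "import" and "archives" are assumed to match what is on disk (adjust the skip list if your folders are named differently).

```python
import shutil
import tempfile
from pathlib import Path


def copy_for_serving(collection_dir, target_dir, skip=("tmp", "import", "archives")):
    """Copy a collection folder, leaving out the top-level folders in skip."""
    collection_dir = Path(collection_dir)

    def ignore(directory, names):
        # Filter only the top level, so same-named folders deeper down are kept.
        if Path(directory) == collection_dir:
            return [name for name in names if name in skip]
        return []

    shutil.copytree(collection_dir, target_dir, ignore=ignore)
    return Path(target_dir)


if __name__ == "__main__":
    # Throwaway folders; the names below are only for the demonstration.
    with tempfile.TemporaryDirectory() as tmp:
        source = Path(tmp) / "mycol"
        for name in ("import", "archives", "tmp", "index", "etc"):
            (source / name).mkdir(parents=True)
        copied = copy_for_serving(source, Path(tmp) / "server" / "mycol")
        print(sorted(p.name for p in copied.iterdir()))
```

Filtering by exact name at the top level only is a deliberate choice here, so that a sub-folder that happens to be called "import" somewhere inside index is not dropped by accident.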

Sorry for my English!


On 25/08/2010 17:32, Jay Clark wrote:
> Hi Diego,
> So, if I understand you correctly, Greenstone stores 3 separate copies of
> each digital resource in its original form and another in plain text? Are my
> collections 3 times larger than they need to be?
> I see this is a huge issue for hard drive space requirements when creating
> a library - to have 3 copies of a resource in the library, plus the original
> if it was on your hard drive in the first place. Is this issue going to be
> addressed in subsequent versions of Greenstone? Is there a way to safely
> pare this down to just one copy of the resource in the library (not
> including plain text version)?
> Thank you,
> Jay Clark
> MAF Learning Technologies
> www.maflt.org
> www.maf.org
> -----Original Message-----
> From: greenstone-users-bounces@list.scms.waikato.ac.nz
> [mailto:greenstone-users-bounces@list.scms.waikato.ac.nz] On Behalf Of Diego
> Sent: Wednesday, August 25, 2010 7:48 AM
> To: steve@arlis.org; greenstone-users@list.scms.waikato.ac.nz
> Subject: RE: [greenstone-users] Collection size anomaly
> Steve, the way Greenstone stores all objects and indexes is the same as
> other applications do. If you import a 2 MB PDF, then you will have 2 MB in
> the import folder, 2 MB in archive (after the import process), 2 MB in
> index/assoc (after the buildcol process), and a few more KB (the full text
> index).
> Greenstone relies on the operating system. Sometimes Windows is not good at
> retrieving file sizes and free space (in other features too!!!). Perhaps you
> can install third-party software to get the correct values for disk space.
> Try this: http://www.glenn.delahoy.com/software/files/DiskAnalyser201.zip
> It is very easy to install.
> Hope this helps.
> Diego Spano
> Prodigio Consultores
> Bernardo de Irigoyen Nº 1114 2ºB
> Capital Federal - Argentina
> Tel: (54 11) 5093-5313
> www.prodigioconsultores.com
> -----Original Message-----
> From: greenstone-users-bounces@list.scms.waikato.ac.nz
> [mailto:greenstone-users-bounces@list.scms.waikato.ac.nz] On Behalf Of
> Steve Johnson
> Sent: Tuesday, August 24, 2010 8:45 PM
> To: greenstone-users@list.scms.waikato.ac.nz
> Subject: [greenstone-users] Collection size anomaly
> Should one expect anomalies when checking the total size of a Greenstone
> collection, perhaps particularly in a Windows environment? I have a
> Greenstone 2.82 collection which, according to Windows file manager, is
> larger than the total space-in-use on the drive on which Greenstone resides.
> I built the collection on a Windows XP desktop machine. The size of the
> collection displays as 145 gig (on the property sheet for the named
> collection folder, inside the collect folder). The computer contains one
> 160 gig disk, which shows 90 gig as being free and 70 gig in use.
> The 4982 import files, in 780 folders, in this Greenstone collection
> occupy 46 gig. I am getting ready to move this collection to a Linux
> development server for testing prior to deployment on a live Linux server.
> An accurate take on collection size is obviously important. Am I missing
> something about the way Greenstone organizes files on Windows systems?
> I would appreciate any pointers or comments, on or off list. I did not
> find this subject in the mailing list archives.
> Steve
> --
> Steve Johnson
> Systems Coordinator/Management Team
> Alaska Resources Library & Information Services (ARLIS) steve@arlis.org
> _______________________________________________
> greenstone-users mailing list
> greenstone-users@list.scms.waikato.ac.nz
> https://list.scms.waikato.ac.nz/mailman/listinfo/greenstone-users

Pier Luigi ROSSI
32, avenue Henri Varagnat
93140 Bondy

Tel : 33 (0)1 48 02 56 96
Fax : 33 (0)1 48 47 30 88