[greenstone-users] Disk space

From Diego Spano
Date Fri Oct 15 02:36:40 2010
Subject [greenstone-users] Disk space
In-Reply-To (201010140008-17272-wtmann-comune-belluno-it)

The build process cannot resume from where it stopped because the last run
ended abnormally. But can you clarify what kind of collection you are
creating? It is very strange that the build process takes 2 days... The
import process is the more time-consuming step; the build process is faster.
What kind of images are you managing? Are they TIFF files with OCR?

I have a big collection with more than 700,000 TIFFs, and each one has a text
file from OCR. I can't remember how long the import takes because I did
it in many steps, but the build process takes only a few hours (no more than
5 or 6 hours).

If you like, you can send me (off list) your collect.cfg and some sample
images and I will take a look at them.


-----Original Message-----
From: William T. Mann [mailto:wtmann@comune.belluno.it]
Sent: Wednesday, October 13, 2010 07:08 p.m.
To: Diego Spano
CC: greenstone-users@list.scms.waikato.ac.nz
Subject: Re: [greenstone-users] Disk space


Thanks for the quick reply! This is a big help in setting up my build.

Just one more thing: now that I've gone through the import process and my
archives directory is populated (and I've freed up the necessary space), is
it possible to start the build process where it left off? That is, can the
process be started with the creation of the indexes (the building folder)
without having to go through another 2 days of processing? I'm using the
PagedImage component (without the cache for space reasons) and as I stated
before there are over 40200 images!

Thanks again!

William Mann

On Wednesday 13 October 2010 17:39:05 Diego Spano wrote:
> William,
> the way Greenstone stores all objects is the following: if you import 27 GB
> of JPG files, then you will have 27 GB in the "import" folder, 27 GB in the
> "archives" folder (after the import process), and 27 GB in the "building"
> folder (after the buildcol process). The last step of creating a collection
> from GLI is to rename the building folder to index, so now you don't have a
> building folder but you have an index folder (with 27 GB).
> The problem you faced arose because you created the collection again from
> GLI, so now you have 85 GB in the import folder, 85 GB in the archives folder
> (the previous content is deleted by default), and finally Greenstone needs
> another 85 GB for the "building" folder. But you also have the old index
> folder (27 GB), so Greenstone is asking for 85 GB (building) plus 27 GB
> (index).
> You have 2 choices.
> Option 1- If you want to create the collection from GLI, first delete the
> index folder!
> Option 2- Run the process from the command line (this is what I recommend).
> You can achieve more control over what you need.
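
The space arithmetic described above can be sketched in a few lines of shell. This is only an illustration using the sizes quoted in this thread (85 GB for the new building folder, 27 GB for the old index folder); the function name is a hypothetical helper, not part of Greenstone.

```shell
#!/bin/sh
# Sketch only: peak space (in GB) needed on the disk that holds the
# building and index folders while a collection is rebuilt.
# peak_build_space_gb is a hypothetical helper, not a Greenstone command.

peak_build_space_gb() {
    new_building_gb=$1   # size of the new "building" folder
    old_index_gb=$2      # size of the "index" folder left by the previous build
    echo $(( new_building_gb + old_index_gb ))
}

peak_build_space_gb 85 27   # old index still present: prints 112
peak_build_space_gb 85 0    # old index deleted first (Option 1): prints 85
```

The second call shows why deleting the old index folder before rebuilding lowers the peak requirement by the size of that folder.
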
> The import folder can reside anywhere. You can even take the files from
> another computer. You can mount a Windows share on a Linux server. You can
> have two different disks, etc. You don't need to mount the file system as
> the import folder. You can have different filesystems too.
> Assume your collection name is "pictures" and that you have the files on a
> filesystem mounted on /my_docs.
> What you need is to redirect the import folder by passing the "importdir"
> option to the import.pl process (this cannot be done through GLI).
> perl -S import.pl -importdir /my_docs pictures
> The archives folder can be redirected too, to any other folder or shared
> resource. In this case, you need to specify the location to both processes,
> the import and the build.
> perl -S import.pl -importdir /my_docs -archivedir /archives pictures
> perl -S buildcol.pl -archivedir /archives pictures
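
The two commands above could be wrapped in a small script so that both steps always receive the same archives location. What follows is a minimal sketch, not a tested Greenstone procedure: the `run` and `build_collection` helpers and the `DRY_RUN` switch are hypothetical names introduced here, and only the import.pl and buildcol.pl invocations come from the message itself. By default it only prints the commands.

```shell
#!/bin/sh
# Sketch only: run (or just print) the import and build steps with
# redirected folders. run, build_collection and DRY_RUN are hypothetical
# helpers for illustration; they are not part of Greenstone.

DRY_RUN=${DRY_RUN:-1}   # 1 = only print the commands, 0 = actually run them

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

build_collection() {
    collection=$1   # e.g. pictures
    importdir=$2    # e.g. /my_docs
    archivedir=$3   # e.g. /archives
    # The build step is attempted only if the import step succeeded.
    run perl -S import.pl -importdir "$importdir" -archivedir "$archivedir" "$collection" &&
    run perl -S buildcol.pl -archivedir "$archivedir" "$collection"
}

build_collection pictures /my_docs /archives
```

Because the import and the build are separate commands, it seems reasonable to expect that the buildcol.pl line can be run on its own once the archives folder is populated, although that is an assumption and not something confirmed in this thread.
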
> The index folder must reside in the same location where you have GS
> installed. But there is another way to reduce space. You may notice that
> the contents of the archives folder are "almost" the same as those of the
> "index/assoc" folder. So, if you are on Linux, you can make a link like
> index/assoc --> /archives and save that space. On Windows this is not
> possible, as far as I know...
> It is like a puzzle!
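
The link trick can be sketched as follows. This is a hedged illustration demonstrated on scratch directories so that nothing real is touched; `link_assoc` is a hypothetical helper, the example paths in the comments are placeholders, and whether Greenstone serves documents correctly through such a link is not verified here.

```shell
#!/bin/sh
# Sketch only: replace a collection's index/assoc folder with a symbolic
# link to the archives folder. link_assoc is a hypothetical helper, not a
# Greenstone tool.

link_assoc() {
    index_dir=$1      # the collection's index folder
    archives_dir=$2   # e.g. /archives
    link=$index_dir/assoc

    if [ -L "$link" ]; then
        rm "$link"               # drop a link left by an earlier run
    elif [ -d "$link" ]; then
        mv "$link" "$link.bak"   # keep the real folder until the link is checked
    fi
    ln -s "$archives_dir" "$link"
}

# Demonstration on scratch directories only.
scratch=$(mktemp -d)
mkdir -p "$scratch/index/assoc" "$scratch/archives"
link_assoc "$scratch/index" "$scratch/archives"
ls -l "$scratch/index"
rm -rf "$scratch"
```

Space is only recovered once the renamed assoc.bak folder is deleted, which is best done after checking that the collection still works through the link.
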
> Hope this helps.
> Diego
> Diego Spano
> Prodigio Consultores
> Bernardo de Irigoyen N° 1114 2°B
> Capital Federal - Argentina
> Tel: (54 11) 5093-5313
> www.prodigioconsultores.com
> -----Original Message-----
> From: greenstone-users-bounces@list.scms.waikato.ac.nz
> [mailto:greenstone-users-bounces@list.scms.waikato.ac.nz] On behalf of
> William Mann
> Sent: Wednesday, October 13, 2010 06:08 a.m.
> To: greenstone-users@list.scms.waikato.ac.nz
> Subject: [greenstone-users] Disk space
> Hi,
> I've been using Greenstone with a very small (27 GB) test collection and
> now I have to build the final collection that consists of 40200+ images
> (about 85 GB). Since the first attempt to build ended with a disk space
> error, I then added a second disk to my computer and mounted the
> partitions as import and archives, also disabling the cache so the cached
> dir wouldn't fill up. This time all the files got processed but the build
> ended the same with a disk space error. After searching my drive I found
> that there was now a building and index directory: the first with 62GB of
> data and the second with 27GB of data. Since I've found that to process
> the files and start the building of the collection takes a little more
> than 2 days (at least using gli), I was wondering exactly what do I need
> to put on external disks to keep my drive from running out of space? Why
> is it making so many copies of my data in different places?
> --
> Rag. William Mann
> Comune di Belluno
> Servizio Sistemi Informativi
> Piazza Castello, 14
> 32100 Belluno
> Tel. 0437-913156
> e-mail: wtmann@comune.belluno.it

William T. Mann
Comune di Belluno