[greenstone-users] RE: [greenstone-devel] distributed storage for objects?

From Diego Spano
Date Sat Aug 15 05:59:13 2009
Subject [greenstone-users] RE: [greenstone-devel] distributed storage for objects?
In-Reply-To (6291FA526EA64B42B54BEF1A889C184E4005CFDF-NEUBOS3ES816CLS-nunet-neu-edu)

Which GS version do you have? I'm using GS 2.81 on Linux. I wrote a little
script to do the incremental import/build on a daily basis.

The script is as follows:


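# Timestamp for the log names; %H is the zero-padded hour, so it has no space.
fecha=$(date +%Y%m%d%H%M)

# Folder checked for new files (see step 1 below).
DIR=/windows/auditoria/import

# Only import and build when the folder contains something.
if [ "$(ls -A "$DIR")" ]; then

    cd /gsdl || exit 1
    . ./setup.bash

    cd ./collect/auditoria || exit 1

    # Incremental import; the archives go to the separate disk.
    # The last argument is the collection name, as in the buildcol.pl call.
    perl -S import.pl -out "./log/[${fecha}]_import.log" -keepold \
        -OIDtype assigned -OIDmetadata orsna.Carpeta \
        -archivedir /archives/auditoria auditoria

    # Build from those archives; the indexes stay inside the collection.
    perl -S buildcol.pl -out "./log/[${fecha}]_build.log" -no_text \
        -archivedir /archives/auditoria -builddir /gsdl/collect/auditoria/index \
        -sections_index_document_metadata unless_section_metadata_exists auditoria

    # Point index/assoc at the archives instead of keeping a copy.
    cd ./index || exit 1
    rm -rf ./assoc
    ln -s /archives/auditoria ./assoc

    # Clear the import folder for the next run.
    rm -rf /gsdl/collect/auditoria/import/*

else
    echo "Sin documentos"   # "No documents"
fi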


The script does the following:

1- Check whether there are any files to import in
/windows/auditoria/import/ (/windows is a Windows filesystem mounted on
the Linux server).

2- Run the import process for the collection "auditoria", taking the files
from /windows/auditoria/import and saving the archive files in
/archives/auditoria (/archives is a file system mounted on a separate disk).

3- Run the build process, reading the archive files from /archives/auditoria
and saving the indexes in /gsdl/collect/auditoria/index (this is needed for
incremental building).

4- Because I don't want to keep a copy of the archive files in
/index/assoc, I modified basebuildproc.pm to skip the copy from archives
to assoc. To get access to the source documents I simply make a link from
/index/assoc to /archives/auditoria, and that's all.
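
Step 1 relies on "ls -A" printing nothing for an empty folder. A minimal sketch of that check on its own, assuming a hypothetical helper called has_files and a throwaway temporary folder rather than the real import path:

```shell
#!/bin/sh
# has_files DIR: succeed when DIR contains at least one entry
# (hidden files included; "." and ".." are left out by ls -A).
# The function name is an assumption for illustration only.
has_files() {
    [ -n "$(ls -A "$1" 2>/dev/null)" ]
}

# Demonstration on a temporary folder, not on the real import folder.
demo_dir=$(mktemp -d)

if has_files "$demo_dir"; then first="has files"; else first="empty"; fi
touch "$demo_dir/report.pdf"
if has_files "$demo_dir"; then second="has files"; else second="empty"; fi

echo "before: $first, after: $second"
rm -rf "$demo_dir"
```

Quoting the folder name keeps the check working when the path contains spaces.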

I know that on Windows I had an error like the one you describe, but I can't
remember in which GS version. At that time the error was fixed by Waikato,
so it should be OK in newer versions.

After your message, I ran a quick test on Windows with 2.82 and it seems to
be working OK, but I did not dig too deep.
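
As background: a hard link can only be created within a single volume, which would explain why the message appears only when the folders sit on different drives. The message itself says that a copy is attempted next. A minimal sketch of that link-then-copy idea in shell, which is not Greenstone's actual code; the function name link_or_copy and the temporary files are assumptions for illustration:

```shell
#!/bin/sh
# link_or_copy SRC DST: try a hard link first, fall back to a plain copy.
# Hard links cannot cross filesystems, so the fallback is what runs
# when SRC and DST are on different disks.
link_or_copy() {
    ln "$1" "$2" 2>/dev/null || cp "$1" "$2"
}

work=$(mktemp -d)
echo "sample" > "$work/source.gif"
link_or_copy "$work/source.gif" "$work/target.gif"

# Either way the target ends up with the same content as the source.
if cmp -s "$work/source.gif" "$work/target.gif"; then
    result="identical"
else
    result="different"
fi
echo "$result"
rm -rf "$work"
```

The only cost of the fallback is the extra disk space and time taken by the copy.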



From: Mehrling, Martin [mailto:m.mehrling@neu.edu]
Sent: Friday, August 14, 2009 01:02 p.m.
To: Diego Spano
Cc: greenstone-users@list.scms.waikato.ac.nz
Subject: RE: [greenstone-devel] distributed storage for objects?


Very helpful! Thanks!

It works well on the same drive but when I try different drives (Windows
Server 2003) it kind of works but not as well. I'm getting errors like

import.pl> util::hard_link: unable to create hard link. Attempting to copy
file: C:\Program Files\Greenstone2\tmp\F53.gif ->

Should I not use drive letters? I also want to read/write to different
machines. Do you know why this is raising an error? Could you let me know
what values you are using for "importdir" and "archivedir"?

Thanks a lot,



Martin Mehrling

Digital Systems Specialist




301 Snell Library

Northeastern University

Boston, Massachusetts 02115

From: Diego Spano [mailto:dspano@orsna.gov.ar]
Sent: Thursday, August 13, 2009 11:07 AM
To: Mehrling, Martin
Cc: greenstone-users@list.scms.waikato.ac.nz
Subject: RE: [greenstone-devel] distributed storage for objects?


In GS2 you have three folders that require disk space: import, archives and
index. The index folder MUST be inside the collection's root folder; you
can't move it elsewhere.

The other two can be moved, though. The import process has two important options:

-importdir: the folder to take the source documents from. It can be on
another disk or another machine.

-archivedir: the path to the archives folder where the converted documents
are saved. It can be on another disk or another machine.

The build process has the -archivedir option too, and it should be set to the
same path that you use in the import process.
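
As a sketch of how these options fit together, the two commands might look as follows. The collection name mycol and all the paths are placeholders, not values from a real installation, and the commands are only printed here rather than executed:

```shell
#!/bin/sh
# Placeholder values; adjust to the real collection and mount points.
collection="mycol"
import_dir="/mnt/source/mycol"      # source documents, other disk or machine
archive_dir="/mnt/archives/mycol"   # converted documents, other disk or machine

# import.pl reads from -importdir and writes to -archivedir.
import_cmd="perl -S import.pl -importdir $import_dir -archivedir $archive_dir $collection"

# buildcol.pl is given the same -archivedir that import.pl used.
build_cmd="perl -S buildcol.pl -archivedir $archive_dir $collection"

echo "$import_cmd"
echo "$build_cmd"
```

Once the printed commands look right, they can be run from the Greenstone folder after sourcing setup.bash.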

Let me tell you about a digital library I set up here:

GS was installed on a Linux server, and I mounted a Windows file system
on the Linux filesystem.

The server has two disks (not two partitions but two separate disks). When I
run the import process, the source documents are read from the Windows file
system mounted locally (so no space is wasted on the Linux disk) and the
archives are generated on the second disk.

Then the build process takes the documents from that disk and generates the
indexes on the main hard disk. This way you have separated the three parts of
the process, and what's more, having the archives and indexes on different
disks is a way to get better performance.
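
To confirm that two folders really live on different disks, one option is to compare the mount point that df reports for each. A minimal sketch, assuming a POSIX df and mount points without spaces; the helper names mount_of and same_fs are made up for illustration:

```shell
#!/bin/sh
# mount_of PATH: print the mount point of the filesystem that holds PATH,
# taken from the last column of the POSIX-format df output.
mount_of() {
    df -P "$1" | awk 'NR == 2 { print $NF }'
}

# same_fs A B: succeed when A and B are under the same mount point.
same_fs() {
    [ "$(mount_of "$1")" = "$(mount_of "$2")" ]
}

# A folder is trivially on the same filesystem as itself.
if same_fs / /; then self_check="same"; else self_check="different"; fi
echo "$self_check"
```

With the layout described above, same_fs /archives/auditoria /gsdl/collect/auditoria/index would be expected to fail, since the archives sit on the second disk.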

Hope this helps you.

Diego Spano

From: greenstone-devel-bounces@list.scms.waikato.ac.nz
[mailto:greenstone-devel-bounces@list.scms.waikato.ac.nz] On behalf of
Mehrling, Martin
Sent: Thursday, August 13, 2009 10:49 a.m.
To: greenstone-users@list.scms.waikato.ac.nz;
Subject: [greenstone-devel] distributed storage for objects?

Hi Folks,

We are running out of space on our Greenstone server and were wondering how
other Greenstone sites deal with this problem. I know Greenstone3 has a
more distributed model, but we aren't ready to move to that version just
yet. Have any of you figured out a way to store objects in multiple

Thanks a lot for any ideas!



Martin Mehrling

Digital Systems Specialist




301 Snell Library

Northeastern University

Boston, Massachusetts 02115
