[greenstone-users] RE: [greenstone-devel] distributed storage for objects?

From Mehrling, Martin
Date Mon Aug 31 22:52:14 2009
Subject [greenstone-users] RE: [greenstone-devel] distributed storage for objects?
In-Reply-To (007a01ca1d21$bb2375b0$316a6110$-gov-ar)
Thanks Diego! I got it working, but the objects aren't pulled into the GLI so metadata can't be added that way. I will try the command line and also the web interface to see if that changes anything...

By the way, I upgraded to 2.82 and the hard_link error went away!

Martin

************************************
Martin Mehrling
Digital Systems Specialist
m.mehrling@neu.edu<mailto:m.mehrling@neu.edu>
617.373.5885
========================
301 Snell Library
Northeastern University
Boston, Massachusetts 02115

From: Diego Spano [mailto:dspano@orsna.gov.ar]
Sent: Friday, August 14, 2009 4:57 PM
To: Mehrling, Martin
Cc: greenstone-users@list.scms.waikato.ac.nz
Subject: RE: [greenstone-devel] distributed storage for objects?

Martin, one suggestion: try the command line, not the GLI!

Diego

From: Mehrling, Martin [mailto:m.mehrling@neu.edu]
Sent: Friday, August 14, 2009 4:47 PM
To: Diego Spano
Cc: greenstone-users@list.scms.waikato.ac.nz
Subject: RE: [greenstone-devel] distributed storage for objects?

Diego,

Thanks again for your help! I'm using version 2.81 so perhaps that's an issue. I also tried using the UNC path instead of the drive letter and that didn't work at all. The objects aren't being pulled into the GLI, so I think that must have something to do with it... I will keep trying things and will send the solution once I figure it out.

Martin


From: Diego Spano [mailto:dspano@orsna.gov.ar]
Sent: Friday, August 14, 2009 2:02 PM
To: Mehrling, Martin
Cc: greenstone-users@list.scms.waikato.ac.nz
Subject: RE: [greenstone-devel] distributed storage for objects?

Martin,

Which GS version do you have? I'm using GS2.81 on Linux, and I wrote a little script to do the incremental import/build on a daily basis.

The script is the following:

**************************************************************
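# Timestamp for the log file names (%H is zero-padded; %k pads the hour with a space)
fecha=$(date +%Y%m%d%H%M)
DIR="/gsdl/collect/auditoria/import/"
# Only run when the import folder contains files
if [ "$(ls -A "$DIR")" ]; then
    cd /gsdl || exit 1
    # Load the Greenstone environment
    . ./setup.bash
    cd ./collect/auditoria || exit 1
    # Incremental import: keep the existing archives and write to the separate archive disk
    perl -S import.pl -out "./log/[$fecha]_import.log" -keepold -OIDtype assigned -OIDmetadata orsna.Carpeta -archivedir /archives/auditoria auditoria
    # Build from those archives into the collection's index folder
    perl -S buildcol.pl -out "./log/[$fecha]_build.log" -no_text -archivedir /archives/auditoria -builddir /gsdl/collect/auditoria/index -sections_index_document_metadata unless_section_metadata_exists auditoria
    cd ./index || exit 1
    # Replace index/assoc with a link to the archives instead of a copy
    rm -rf ./assoc
    ln -s /archives/auditoria ./assoc
    # Clear the import folder once its files have been processed
    rm -rf "${DIR:?}"*
else
    echo "No documents to import"
fi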
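
The guard at the top of the script can be tried on its own. The following is only a sketch: has_files is a hypothetical helper name, and a temporary directory stands in for the real import folder.

```shell
#!/bin/sh
# has_files DIR: succeed if DIR contains at least one entry (hidden files included).
# Hypothetical helper for illustration; the name is not part of Greenstone.
has_files() {
    [ -n "$(ls -A "$1" 2>/dev/null)" ]
}

# A temporary directory stands in for the collection's import folder.
work=$(mktemp -d)

if has_files "$work"; then echo "has files"; else echo "empty"; fi
touch "$work/report.pdf"
if has_files "$work"; then echo "has files"; else echo "empty"; fi
```

Quoting the directory name keeps the check working when the path contains spaces.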

The script does the following tasks:

1- Checks whether there are any files to import in "/windows/auditoria/import/" (/windows is a Windows file system mounted on Linux).

2- Runs the import process for the collection "auditoria", taking the files from /windows/auditoria/import and saving the archive files in /archives/auditoria (/archives is a file system mounted on a separate disk).

3- Runs the build process, reading the archive files from /archives/auditoria and saving the indexes in /gsdl/collect/auditoria/index (this is needed for incremental indexing).

4- Because I don't want a copy of the archive files in /index/assoc, I modified basebuildproc.pm to skip the copy from archives to assoc. To give access to the source documents I simply make a link from /index/assoc to /archives/auditoria, and that's all.
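
The link in step 4 can be tried safely outside a real collection. This is a minimal sketch under assumptions: temporary directories stand in for /archives/auditoria and the collection's index folder, and it only shows the link itself, not the basebuildproc.pm change that stops the build from copying files into assoc.

```shell
#!/bin/sh
# Sketch only: temporary directories stand in for the real archive and index locations.
work=$(mktemp -d)
archives="$work/archives"    # stands in for /archives/auditoria
index="$work/index"          # stands in for the collection's index folder
mkdir -p "$archives" "$index/assoc"
echo "sample" > "$archives/doc.txt"

# Replace the assoc directory with a symbolic link to the archives.
rm -rf "$index/assoc"
ln -s "$archives" "$index/assoc"

# The archived file is now reachable through index/assoc without a second copy.
cat "$index/assoc/doc.txt"
```

Because assoc is only a link, removing it later with rm (without a trailing slash) leaves the archives untouched.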

I know that on Windows I had an error like the one you describe, but I can't remember in which GS version. At the time the error was fixed by Waikato, so it should be OK in newer versions.

After your message, I ran a quick test on Windows with 2.82 and it seems to be working OK, but I did not dig too deep.

Regards!

Diego


From: Mehrling, Martin [mailto:m.mehrling@neu.edu]
Sent: Friday, August 14, 2009 1:02 PM
To: Diego Spano
Cc: greenstone-users@list.scms.waikato.ac.nz
Subject: RE: [greenstone-devel] distributed storage for objects?

Diego,

Very helpful! Thanks!

It works well on the same drive but when I try different drives (Windows Server 2003) it kind of works but not as well. I'm getting errors like this:

import.pl> util::hard_link: unable to create hard link. Attempting to copy file: C:\Program Files\Greenstone2\tmp\F53.gif -> D:\GstoneColls2\archives\HASHfb50.dir\moonflag_thumb.gif

Should I not use drive letters? I also want to read/write to different machines. Do you know why this is raising an error? Could you let me know what values you are using for 'importdir' and 'archivedir'?
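
For context, a hard link cannot span two volumes, which fits the message appearing only when the source and target are on different drives. The sketch below shows the general "link first, copy on failure" pattern that the message describes. link_or_copy is a hypothetical helper written for illustration; it is not Greenstone's util::hard_link, and the file names are made up.

```shell
#!/bin/sh
# link_or_copy SRC DST: try a hard link first and fall back to a plain copy.
# Hypothetical helper for illustration; it is not Greenstone's util::hard_link.
link_or_copy() {
    if ln "$1" "$2" 2>/dev/null; then
        echo "linked"
    else
        cp "$1" "$2" && echo "copied"
    fi
}

# Both files live in one temporary directory here, so the link would normally succeed.
# With source and target on different volumes, ln fails and the copy branch runs.
work=$(mktemp -d)
echo "data" > "$work/src.gif"
link_or_copy "$work/src.gif" "$work/dst.gif"
```

Either way the target file ends up with the same content, so a fallback of this kind costs disk space and time rather than correctness.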

Thanks a lot,
Martin


From: Diego Spano [mailto:dspano@orsna.gov.ar]
Sent: Thursday, August 13, 2009 11:07 AM
To: Mehrling, Martin
Cc: greenstone-users@list.scms.waikato.ac.nz
Subject: RE: [greenstone-devel] distributed storage for objects?

Martin,

In GS2 you have 3 folders that require disk space: import, archives and index. The index folder MUST be inside the collection's root folder. You can't move it elsewhere.

The other two can be moved. The import process has two important options:

-importdir: indicates where to take the source documents from. It can be on another disk or another machine.
-archivedir: indicates the path to the archive folder where the converted documents are saved. It can be on another disk or another machine.

The build process has the -archivedir option too, and it should be set to the same path that you use in the import process.
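
As a rough sketch of how the two options fit together, the script below only prints the import and build commands so the paths can be checked before anything runs. The collection name and both directories are placeholders, not values from a real installation.

```shell
#!/bin/sh
# Placeholder values; substitute your own collection name and locations.
COLLECTION="mycollection"            # assumed collection name
IMPORT_DIR="/mnt/source/import"      # assumed: source documents on another disk or machine
ARCHIVE_DIR="/mnt/disk2/archives"    # assumed: archives on a second disk

# Print the two commands instead of running them (dry run).
print_commands() {
    echo "perl -S import.pl -importdir $IMPORT_DIR -archivedir $ARCHIVE_DIR $COLLECTION"
    echo "perl -S buildcol.pl -archivedir $ARCHIVE_DIR $COLLECTION"
}

print_commands
```

The point to check in the output is that both commands carry the same -archivedir value.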

Let me tell you about a digital library I set up here:

GS was installed on a Linux server, and I mounted a Windows file system on the Linux file system.

The server has 2 disks (not two partitions but two separate disks). When I run the import process, the source documents are read from the Windows file system mounted locally (so no space is wasted on the Linux disk) and the archives are generated on the second disk.

Then the build process takes the documents from that disk and generates the indexes on the main hard disk. This way you have separated the 3 parts of the process, and what's more, having the archives and the indexes on different disks is a way to get better performance.
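
To confirm that the folders really sit on different disks, one option is to ask df which device holds each path. This is a small sketch: device_of is a hypothetical helper name, and / and /tmp are used only as example directories.

```shell
#!/bin/sh
# device_of PATH: print the device (first column of df -P) that holds PATH.
# Hypothetical helper for illustration.
device_of() {
    df -P "$1" | awk 'NR==2 {print $1}'
}

# Example directories; replace them with your import, archive and index locations.
for dir in / /tmp; do
    echo "$dir is on $(device_of "$dir")"
done
```

If two folders report the same device, they share a disk or partition and do not spread the load.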

Hope this helps you.

Diego Spano

From: greenstone-devel-bounces@list.scms.waikato.ac.nz [mailto:greenstone-devel-bounces@list.scms.waikato.ac.nz] On behalf of Mehrling, Martin
Sent: Thursday, August 13, 2009 10:49 AM
To: greenstone-users@list.scms.waikato.ac.nz; greenstone-devel@list.scms.waikato.ac.nz
Subject: [greenstone-devel] distributed storage for objects?

Hi Folks,

We are running out of space on our Greenstone server and were wondering how other Greenstone sites deal with this problem. I know Greenstone3 has a more distributed model, but we aren't ready to move to that version just yet. Have any of you figured out a way to store objects in multiple locations?

Thanks a lot for any ideas!

Martin


