[greenstone-users] Disk space

From Diego Spano
Date Fri Oct 15 03:47:18 2010
Subject [greenstone-users] Disk space
In-Reply-To (4CB70FC9-6040004-comune-belluno-it)
William,

Don't use metadata names with spaces. You have <Subject and Keywords>,
replace it with <Subject_and_Keywords> or simply <Subject>.

I think that your problem is with a hierarchy classifier. Am I right? This
classifier uses "/" and "|" as level separators. So, with the Subject
metadata you have, Greenstone will create a hierarchy like this:

Narrativa e diari di viaggio
----> La gioventù perduta - Rifacimento definitivo del testo (C/
--------> cassetto)

What you have to do is to configure the hierarchy classifier to accept only
"|". You can do this with the following parameter:

classify Hierarchy -metadata Subject -separator [|]

And this will create a hierarchy like this:

Narrativa e diari di viaggio
----> La gioventù perduta - Rifacimento definitivo del testo (C/cassetto)
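As a rough illustration of the behaviour described above, here is a minimal Python sketch of how a separator pattern decides where the levels break. The `split_levels` helper is hypothetical (Greenstone implements this internally in the Hierarchy classifier); the regex character classes mirror the `-separator` parameter shown above:

```python
import re

def split_levels(value, separator=r"[/|]"):
    """Split a metadata value into hierarchy levels.

    The default pattern treats both "/" and "|" as level
    separators; pass r"[|]" to restrict splitting to "|".
    """
    return re.split(separator, value)

subject = ("Narrativa e diari di viaggio|"
           "La gioventù perduta - Rifacimento definitivo del testo (C/cassetto)")

# Default separators: the "/" inside "(C/cassetto)" creates a spurious level.
print(split_levels(subject))

# Restricted to "|": the parenthesised "/" survives intact.
print(split_levels(subject, r"[|]"))
```

With the default pattern the value breaks into three levels, truncating the title at "(C"; with `[|]` it breaks into the intended two.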


Test it please.

Diego

-----Original Message-----
From: William Mann [mailto:wtmann@comune.belluno.it]
Sent: Thursday, October 14, 2010 11:12 a.m.
To: Diego Spano
CC: greenstone-users@list.scms.waikato.ac.nz
Subject: Re: [greenstone-users] Disk space

Diego,

The process that's taking about 2 days is the import process. I ran
the buildcol.pl script this morning and it took less than an hour to
complete. For the import I have 40200+ jpegs with no text. I've created
a series of .item files for the PagedImagePlugin, which contain two
lines of metadata used to organize the images. The PagedImagePlugin does
create a smaller image for the web pages though. Other than that, I'm
not doing anything complicated.

May I ask you another question? In my .item files that are being
processed, the metadata is indicated like so:

<Title>La gioventù perduta - Rifacimento definitivo del testo (C/cassetto)
<Subject and Keywords>Narrativa e diari di viaggio|La gioventù perduta -
Rifacimento definitivo del testo (C/cassetto)

My problem is that where there is something like '(C/cassetto)' the text
is being truncated and becomes '(C'. Is there an escape sequence to
allow the use of special characters (and accented ones also)?

Thanks for the time you've dedicated to me, I am very grateful.

--
Rag. William Mann
Comune di Belluno
Servizio Sistemi Informativi
Piazza Castello, 14
32100 Belluno
Tel. 0437-913156
e-mail: wtmann@comune.belluno.it


On 14/10/2010 15:41, Diego Spano wrote:
> William,
>
> The build process cannot start from where it stops because the last run
> ends abnormally. But, can you clarify for me what kind of collection you
> are creating? It is very strange that the build process takes 2 days...
> The import process is more time consuming, but the build process is
> faster. What kind of images are you managing? Are there tiff files with
> ocr?
>
> I have a big collection with more than 700,000 tiffs and each one has a
> text file from ocr. Can't remember how much time the import took because
> I did it in many steps, but the build process takes only a few hours (no
> more than 5 or 6 hours).
>
> If you like, you can send me (off list) your collect.cfg and some sample
> images and I will take a look at it.
>
> Diego
>
> -----Original Message-----
> From: William T. Mann [mailto:wtmann@comune.belluno.it]
> Sent: Wednesday, October 13, 2010 07:08 p.m.
> To: Diego Spano
> CC: greenstone-users@list.scms.waikato.ac.nz
> Subject: Re: [greenstone-users] Disk space
>
> Diego,
>
> Thanks for the quick reply! This is a big help in setting up my build
> machine properly.
>
> Just one more thing: now that I've gone through the import process and my
> archives directory is populated (and I've freed up the necessary space),
> is it possible to start the build process where it left off? That is, can
> the build process be started with the creation of the indexes (the
> building folder) without having to go through another 2 days of
> processing? I'm using the PagedImage component (without the cache for
> space reasons) and as I stated before there are over 40200 images!
>
> Thanks again!
>
>