[greenstone-users] Disk space

From William Mann
Date Fri Oct 15 04:10:31 2010
Subject [greenstone-users] Disk space
In-Reply-To (003a01cb6baf$510d3520$f3279f60$-gov-ar)
Ciao Diego,

I used the metadata with the spaces because that's the way it was in the
Greenstone tutorial that I followed, but I'll change it to take out the
spaces.

You're right, my problem is with the hierarchy classifier. Thanks for
explaining this to me - I'll be correcting it all now and regenerating
the collection (this time I'll be using the command line so I can get
some accurate timing).
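
To make the separator issue concrete, here is a minimal Python sketch of the
splitting behaviour you describe. It is not Greenstone's code: the
split_levels helper and the two regular expressions are assumptions used only
for illustration, applied to the Subject value from my .item file.

```python
import re


def split_levels(value, separator_pattern):
    """Split a metadata value into hierarchy levels on a regex separator.

    Hypothetical helper: it only imitates the idea of a hierarchy
    classifier cutting a value into levels, nothing more.
    """
    parts = re.split(separator_pattern, value)
    return [part.strip() for part in parts if part.strip()]


subject = (
    "Narrativa e diari di viaggio|La gioventù perduta - "
    "Rifacimento definitivo del testo (C/cassetto)"
)

# Both "/" and "|" act as separators: "(C/cassetto)" is cut in two.
both = split_levels(subject, r"[/|]")

# Only "|" acts as a separator: "(C/cassetto)" stays in one piece.
pipe_only = split_levels(subject, r"[|]")

for label, levels in (("/ and |", both), ("| only", pipe_only)):
    print(label)
    for depth, level in enumerate(levels):
        print("    " * depth + level)
```

With both separators the value yields three levels, the second ending in
"(C"; with "|" alone it yields two, and the title keeps "(C/cassetto)".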

Thanks again for all your help!

Rag. William Mann
Comune di Belluno
Servizio Sistemi Informativi
Piazza Castello, 14
32100 Belluno
Tel. 0437-913156
e-mail: wtmann@comune.belluno.it

On 14/10/2010 16:51, Diego Spano wrote:
> William,
> Don't use metadata names with spaces. You have <Subject and Keywords>;
> replace it with <Subject_and_Keywords> or simply <Subject>.
> I think that your problem is with a hierarchy classifier. Am I right? This
> classifier uses "/" and "|" as level separators. So, with the Subject
> metadata you have, Greenstone will create a hierarchy like this:
> Narrativa e diari di viaggio
> ----> La gioventù perduta - Rifacimento definitivo del testo (C/
> --------> cassetto)
> What you have to do is to configure the hierarchy classifier to accept only
> "|". You can do this with the following parameter:
> classify Hierarchy -metadata Subject -separator [|]
> And this will create a hierarchy like this:
> Narrativa e diari di viaggio
> ----> La gioventù perduta - Rifacimento definitivo del testo (C/cassetto)
> Test it please.
> Diego
> -----Original Message-----
> From: William Mann [mailto:wtmann@comune.belluno.it]
> Sent: Thursday, October 14, 2010 11:12 a.m.
> To: Diego Spano
> CC: greenstone-users@list.scms.waikato.ac.nz
> Subject: Re: [greenstone-users] Disk space
> Diego,
> The process that's taking 2 days (circa) is the import process. I ran
> the buildcol.pl script this morning and it took less than an hour to
> complete. For the import I have 40200+ jpegs with no text. I've created
> a series of .item files for the PagedImagePlugin, each containing two
> lines of metadata used to organize the images. The PagedImagePlugin does
> create a smaller image for the web pages though. Other than that, I'm
> not doing anything complicated.
> May I ask you another question? In my .item files that are being
> processed, the metadata is indicated as so:
> <Title>La gioventù perduta - Rifacimento definitivo del testo (C/cassetto)
> <Subject and Keywords>Narrativa e diari di viaggio|La gioventù perduta -
> Rifacimento definitivo del testo (C/cassetto)
> My problem is that where there is something like '(C/cassetto)' the text
> is being truncated and becomes '(C'. Is there an escape sequence to
> allow the use of special characters (and accented ones also)?
> Thanks for the time you've dedicated to me, I am very grateful.