Great! Just one last thing. When you are defining and testing a new
collection, you can use the "-maxdocs" option for the import process to
process only a small subset of documents, say 20 or 30. Then, if
everything goes OK, you remove that option and run a full import.
From: William Mann [mailto:firstname.lastname@example.org]
Sent: Thursday, October 14, 2010 12:10 p.m.
To: Diego Spano
Subject: Re: [greenstone-users] Disk space
I used the metadata with the spaces because that's the way it was in the
Greenstone tutorial that I followed but I'll change it to take out the
spaces.
You're right, my problem is with the hierarchy classifier. Thanks for
explaining this to me - I'll be correcting it all now and regenerating
the collection (this time I'll be using the command line so I can get
some accurate timing).
Thanks again for all your help!
Rag. William Mann
Comune di Belluno
Servizio Sistemi Informativi
Piazza Castello, 14
On 14/10/2010 16:51, Diego Spano wrote:
> Don't use metadata names with spaces. You have <Subject and Keywords>,
> replace it with <Subject_and_Keywords> or simply <Subject>.
> I think that your problem is with a hierarchy classifier. Am I right?
> The classifier uses "/" and "|" as level separators. So, with the Subject
> metadata you have, Greenstone will create a hierarchy like this:
> Narrativa e diari di viaggio
> ----> La gioventù perduta - Rifacimento definitivo del testo (C/
> --------> cassetto)
> What you have to do is configure the hierarchy classifier to accept
> only "|" as a separator. You can do this with the following parameter:
> classify Hierarchy -metadata Subject -separator [|]
> And this will create a hierarchy like this:
> Narrativa e diari di viaggio
> ----> La gioventù perduta - Rifacimento definitivo del testo (C/cassetto)
> Test it please.
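
The effect of the separator setting described above can be checked outside Greenstone with a small Python sketch. This is only an illustration, not Greenstone's own code: the function name hierarchy_levels is hypothetical, and the default pattern [/|] is an assumption modelled on the "/" and "|" behaviour described in this message.

```python
import re


def hierarchy_levels(value, separator=r"[/|]"):
    """Split a metadata value into hierarchy levels on a separator regex.

    Hypothetical helper: the default pattern only mimics the "/" and "|"
    splitting described in the message; it is not Greenstone's code.
    """
    return [part.strip() for part in re.split(separator, value) if part.strip()]


subject = ("Narrativa e diari di viaggio|La gioventù perduta - "
           "Rifacimento definitivo del testo (C/cassetto)")

# Splitting on both "/" and "|" cuts the title at "(C/" and adds a third level.
default_levels = hierarchy_levels(subject)

# Restricting the separator to "|" keeps "(C/cassetto)" in one piece.
pipe_levels = hierarchy_levels(subject, r"[|]")

for level in default_levels:
    print("default:", level)
for level in pipe_levels:
    print("pipe only:", level)
```

With the pipe-only pattern the value yields two levels, which matches the second hierarchy shown above.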
> -----Original Message-----
> From: William Mann [mailto:email@example.com]
> Sent: Thursday, October 14, 2010 11:12 a.m.
> To: Diego Spano
> CC: firstname.lastname@example.org
> Subject: Re: [greenstone-users] Disk space
> The process that's taking about 2 days is the import process. I ran
> the buildcol.pl script this morning and it took less than an hour to
> complete. For the import I have 40200+ jpegs with no text. I've created
> a series of .item files for the PagedImagePlugin which contain two
> lines of metadata used to organize the images. The PagedImagePlugin does
> create a smaller image for the web pages though. Other than that, I'm
> not doing anything complicated.
> May I ask you another question? In my .item files that are being
> processed, the metadata is indicated as so:
> <Title>La gioventù perduta - Rifacimento definitivo del testo (C/cassetto)
> <Subject and Keywords>Narrativa e diari di viaggio|La gioventù perduta -
> Rifacimento definitivo del testo (C/cassetto)
> My problem is that where there is something like '(C/cassetto)' the text
> is being truncated and becomes '(C'. Is there an escape sequence to
> allow the use of special characters (and accented ones also)?
> Thanks for the time you've dedicated to me, I am very grateful.