Re: [greenstone-users] PDFPlug

From Katherine Don
Date Tue, 08 Jun 2004 15:55:08 +1200
Subject Re: [greenstone-users] PDFPlug
In-Reply-To (000501c44ca6$e08437a0$aa5d010c-DIEGOS)
Hi Diego

Are you just trying to rebuild the old collection under 2.50? Or is it a
new collection?

It may be that all the PDFs contain images of text rather than
computer-readable text, so they all get the same hash ID and all end up
in the same folder, overwriting each other.
Try running import.pl with the '-OIDtype incremental' option - this will
assign sequential numbers as the OID instead of calculating a hash
value, so each document should get a new ID.
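
To illustrate why identical hash IDs would leave only the last document,
here is a rough Python sketch. It is not Greenstone's actual code: it
assumes the ID is derived from the extracted text (as the explanation
above suggests), and hash_oid, import_docs, the MD5 digest and the "D"
prefix are all made up for the example.

```python
import hashlib
from itertools import count


def hash_oid(extracted_text):
    # Hypothetical content-based ID; Greenstone's real hash differs.
    digest = hashlib.md5(extracted_text.encode("utf-8")).hexdigest()
    return "HASH" + digest[:16]


def import_docs(docs, oid_type="hash"):
    # Maps each ID (standing in for an archive folder) to one file.
    counter = count(1)
    archive = {}
    for filename, text in docs:
        if oid_type == "incremental":
            oid = "D" + str(next(counter))  # hypothetical ID format
        else:
            oid = hash_oid(text)
        archive[oid] = filename  # same ID overwrites the earlier file
    return archive


# Image-only PDFs yield no extracted text, so the text is empty for each.
scanned = [("a.pdf", ""), ("b.pdf", ""), ("c.pdf", "")]

by_hash = import_docs(scanned, "hash")
by_number = import_docs(scanned, "incremental")

print(len(by_hash), list(by_hash.values()))  # 1 ['c.pdf']
print(len(by_number))                        # 3
```

With hash IDs the three files collapse into one entry holding only the
last file, which matches what you are seeing; with sequential IDs each
file keeps its own entry.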

If this doesn't work, please send me a few more of your PDF files - it's
a bit hard to see what's happening with only one.

regards,
Katherine Don

Diego Spano wrote:
> Does the PDFPlug included in GS v2.5 have any bugs? I have a collection
> that was created with 2.40a and it has a lot of indexed PDF files. Now I
> have GS 2.5, and when I tried to index the documents, the import process
> only generated one hash folder, and inside it only the last document
> listed in metadata.xml was created.
>
> Any help? Please find attached the metadata.xml and collect.cfg files.
>
> Thanks in advance.
>
> Lic. Diego Spano
> Archivo Digital
> Secretaria de DD. HH.
> Ministerio de Justicia, Seguridad y DD. HH
> djspano@jus.gov.ar <mailto:djspano@jus.gov.ar>