Re: [greenstone-users] PDFPlug

From Azael Barrera, Ph.D.
Date Wed, 16 Jun 2004 01:12:32 -0500
Subject Re: [greenstone-users] PDFPlug
Hi Katherine, Diego and others,

I have a somewhat related problem. I attempted to build two collections
today. The first has 153 PDF documents. I created a VList of dls.Titles
instead of the title extracted by the PDF plugin. After a while, it worked.

Then I followed the same process with another collection of about 24 PDF
files. Only 3 documents survived; they are not necessarily in sequence in
the metadata.xml file, and they are not the last ones in the file. The rest
are missing in action. The build.cfg file clearly shows numdocs 3. What
happened?

Is this the same as, or similar to, the problem Diego faced? Should I try a
similar solution? In a third, disappointing attempt with a collection of
12 PDFs, the result was zero.

I am using the GLI from GSDL 2.50 under RH Linux 9, in librarian mode.
Should I use the advanced expert mode to trace the problem? I never had
this problem with 2.40a and 2.41.

Thanks in advance,


At 03:55 PM 6/8/2004 +1200, Katherine Don wrote:
>Hi Diego
>Are you just trying to rebuild the old collection under 2.50? Or is it a
>new collection?
>It may be that all the PDFs contain images of text rather than computer
>readable text, and therefore they all get the same HASH ID, and therefore
>all end up in the same folder, overwriting each other.
>Try running with the '-OIDtype incremental' option - this will
>assign sequential numbers as the OID instead of calculating a hash value,
>and therefore each document should get a new ID.
>If this doesn't work, please send me a few more of your PDF files - it's a
>bit hard to see what's happening with only one.
>Katherine Don
>Diego Spano wrote:
>>Does the PDFPlug included in GS v2.5 have any bugs? I have a collection
>>that was created with 2.40a and it has a lot of indexed PDF files. Now I
>>have GS 2.5, and when I tried to index the documents, the import process
>>generated only one hash folder, and inside it only the last document
>>listed in metadata.xml was created.
>>Any help? Please find attached the metadata.xml and collect.cfg files.
>>Thanks in advance.
>>Lic. Diego Spano
>>Archivo Digital
>>Secretaria de DD. HH.
>>Ministerio de Justicia, Seguridad y DD. HH

Azael Barrera, Ph.D.
Professor - Consultant
ICT Technology Transfer and Capacity Building
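
The explanation in Katherine's quoted reply (identical extracted text gives identical hash IDs, so documents overwrite each other) can be illustrated with a small Python sketch. This is only an illustration under stated assumptions: hash_oid() and assign_oids() are hypothetical helpers, MD5 is a stand-in and not Greenstone's actual hashing scheme, and the "D1", "D2" numbering only mimics the idea of incremental IDs.

```python
import hashlib


def hash_oid(extracted_text):
    # Hypothetical stand-in for a content-derived ID; NOT Greenstone's real algorithm.
    digest = hashlib.md5(extracted_text.encode("utf-8")).hexdigest()
    return "HASH" + digest[:16]


def assign_oids(docs, oid_type="hash"):
    """Map each OID to the file stored under it; a repeated OID overwrites."""
    archive = {}
    for number, (filename, text) in enumerate(docs, start=1):
        if oid_type == "hash":
            oid = hash_oid(text)
        else:  # "incremental": one new ID per document, regardless of content
            oid = "D" + str(number)
        archive[oid] = filename
    return archive


# Three scanned PDFs: image-only pages, so the extracted text is assumed empty.
scanned = [("a.pdf", ""), ("b.pdf", ""), ("c.pdf", "")]

by_hash = assign_oids(scanned, "hash")
by_number = assign_oids(scanned, "incremental")

print(len(by_hash), list(by_hash.values()))  # 1 ['c.pdf']
print(len(by_number))                        # 3
```

With hash-style IDs only the last file survives, which matches the symptom Diego described; with incremental IDs every document keeps its own entry.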