[greenstone-users] RE: How to process a document after conversion fails

From: Diego Spano
Date: Tue Sep 30 11:19:51 2008
Subject: [greenstone-users] RE: How to process a document after conversion fails

Hi list,

I have a collection of almost 10,000 PDFs. Some of them have security
restrictions, so their text can't be extracted. In those cases the plugin
rejects the document with the message "PDFplug failed to convert to html. No
plugin could process this file".

Because they were rejected, those documents can't be reached even through
the classifiers.

I think PDFPlug.pm could be modified as follows:

If the conversion fails, process the file without conversion (as is done for
an image) but still handle its metadata and classifiers. That way the
document will appear in the classifiers whether or not its text is indexed.
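To make the proposed fallback concrete, here is a minimal, language-neutral sketch in Python (not Perl, and not Greenstone's actual plugin API). Everything in it is hypothetical: `ConversionError`, `DocRecord`, `process_document` and the `convert` callback are illustrative names only. It shows the idea of catching a failed conversion and keeping a metadata-only record instead of rejecting the file.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional


class ConversionError(Exception):
    """Hypothetical error raised when text cannot be extracted."""


@dataclass
class DocRecord:
    source: str
    metadata: Dict[str, str] = field(default_factory=dict)
    # None means nothing from this file goes into the full-text index.
    text: Optional[str] = None

    @property
    def has_text_index(self) -> bool:
        return self.text is not None


def process_document(path: str,
                     metadata: Dict[str, str],
                     convert: Callable[[str], str]) -> DocRecord:
    """Try to convert the file; on failure keep it with metadata only.

    The returned record always carries its metadata, so metadata-based
    classifiers (Title, Author, ...) could still list the document.
    """
    try:
        html = convert(path)
    except ConversionError:
        # Fallback: store the file as-is (like an image) instead of
        # rejecting it; only the full-text index is missing.
        return DocRecord(source=path, metadata=dict(metadata), text=None)
    return DocRecord(source=path, metadata=dict(metadata), text=html)


if __name__ == "__main__":
    def locked_pdf(path: str) -> str:
        # Simulates a PDF whose security settings block text extraction.
        raise ConversionError("security restrictions")

    rec = process_document("secure.pdf", {"Title": "Annual report"}, locked_pdf)
    print(rec.has_text_index, rec.metadata["Title"])
```

In this sketch the failure is signalled by an exception and handled in one place, so the normal conversion path stays unchanged and only the error branch gains the metadata-only behaviour.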

Could anyone make this modification to PDFPlug.pm?

TIA

Diego Spano