[greenstone-users] How to process a document after conversion fails

From Katherine Don
Date Fri Oct 3 06:29:43 2008
Subject [greenstone-users] How to process a document after conversion fails
In-Reply-To (8EF166C797DB45A8939ED451F6CB7F37-orsna-gov-ar)
Hi Diego

You can achieve this yourself using the plugin list, without modifying
the plugin code.
The first PDF plugin in the list will process the regular PDFs.
Then you need to add either a second PDF plugin with different options
(try convert_to pagedimg_jpg, see
http://wiki.greenstone.org/wiki/gsdoc/tutorial/en/enhanced_pdf.htm for
more details), or the Unknown plugin with pdf as its process_ext.
The second plugin will catch any files that "fall through" the first one.
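
As a rough illustration of the two options above, the plugin list in the
collection's collect.cfg might look something like the sketch below. This
is an untested sketch under several assumptions: PDFPlug is the plugin
name used in the quoted message, UnknownPlug is assumed to be the
"Unknown plugin" meant here, and -process_extension is assumed to be the
full name of the option referred to above as process_ext. Running
pluginfo.pl on each plugin should list the options that your Greenstone
version actually accepts.

```
# Sketch of a collect.cfg plugin list. Order matters: a file is offered
# to each plugin in turn until one of them processes it.

# 1. Regular PDFs: converted with the default settings.
plugin PDFPlug

# 2a. Fallback, option A: a second PDF plugin that converts each page
#     to an image instead of extracting text.
plugin PDFPlug -convert_to pagedimg_jpg

# 2b. Fallback, option B (use instead of 2a): treat the PDF as an
#     opaque file. The plugin and option names here are assumptions.
# plugin UnknownPlug -process_extension pdf
```

Only one of the two fallbacks should be needed. Either way, the
documents that failed conversion are still imported, so they can carry
metadata and show up in classifiers even without a text index.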


Diego Spano wrote:
> Hi lists,
> I have a collection of almost 10,000 PDFs. Some of them have security
> restrictions, so text can't be extracted. In those cases, the plugin
> rejects the documents saying "PDFplug failed to convert to html. No
> plugin could process this file".
> Those documents can't be retrieved even by classifiers, because they
> were rejected.
> I think that PDFPlug.pm could be modified this way:
> if the conversion process fails, then process the file with no conversion
> (like an image) but still take care of metadata and classifiers. In this
> way, the document will be in classifiers whether or not it has a text
> index.
> Could anyone make this modification to PDFPlug.pm?
> Diego Spano