Diego asked: "Is Adobe Acrobat Professional the sole software that can merge the OCR data with the PDF? Does anyone know of any other software?"
The best OCR software that I have found is ABBYY FineReader. It does cost, but it is amazing how much power and control it gives the user for text, tables, and images. It also has spell-check.
FineReader does allow you to save the output as PDF.
I use it for several projects. I am currently OCRing court transcripts (which have numbered lines, weird indentations, and sometimes a black rectangle around entire pages).
I've also OCRed old conference proceedings (which have loads of tables, varying fonts, and images that need to stay images). I'm using the reference sections of the conference papers for citation analysis, and ABBYY FineReader produces good OCRed text even for this purpose.
---- Original message ----
>Date: Tue, 20 May 2008 14:05:03 -0300 (ART)
>From: Diego Nicolás Casar González <dncg64;aigrenis.com>
>Subject: [greenstone-users] PDFs - OCR metadata - Adobe Acrobat Professional
>Dear Greenstone list,
>I've been trying several OCR programs (mostly open source or
>Linux-oriented) for the past few months, as I need Greenstone to be able
>to search text in image PDFs.
>Recently I've realized that Adobe Acrobat Professional has an interesting
>implementation of OCR, in that it merges the OCR data into the PDF
>itself, allowing the user to search (with most PDF readers, e.g.
>Acrobat Reader, KPDF, etc.) and highlight the results. AFAIK, that
>functionality was available only for text PDFs.
>Greenstone extracts that metadata just fine, so that the users can search
>first inside the collection, download the PDF and then search inside the
>image PDF with the reader.
>Is Adobe Acrobat Professional the sole software that can merge the OCR
>data with the PDF? Does anyone know of any other software?
>Thanks in advance,
>Diego Nicolás Casar González
>Tel: (+54) 011 5252.0810
>Mobile: 15 4186.1334
>Peña 2056 : Floor 7 B
>Capital Federal : Argentina
>greenstone-users mailing list