[greenstone-users] Very Important

From John Rose
Date Tue Jul 21 04:42:07 2009
Subject [greenstone-users] Very Important
Dear Arabic speaking colleagues,

I'm inviting Graeme to explain his reference to "characters in
presentation format". It would seem to me that, if the OCR program can
save Arabic text in any standard character set, searching for full or
truncated words should be possible in Greenstone (the latter only if the
collection is built with the Lucene indexer, same restriction as for
other languages).
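
As a sketch of what I take this to mean (an assumption on my part until
Graeme confirms): some OCR programs may emit Arabic letters as Unicode
presentation forms, the positional glyph variants in the U+FB50-U+FDFF
and U+FE70-U+FEFF blocks, rather than as base letters in U+0600-U+06FF.
Text indexed in that form would not match queries typed with base
letters. The Python below is not part of Greenstone, and the helper name
is hypothetical; it folds presentation forms to base letters with
standard NFKC normalization before the text is imported.

```python
import unicodedata


def fold_presentation_forms(text: str) -> str:
    """Return text with compatibility characters (including Arabic
    presentation forms) replaced by their base letters, using NFKC."""
    return unicodedata.normalize("NFKC", text)


# The word "kitab" (book) as an OCR tool might save it, in presentation forms:
# KAF initial (U+FEDB), TEH medial (U+FE98), ALEF final (U+FE8E),
# BEH isolated (U+FE8F).
ocr_text = "\uFEDB\uFE98\uFE8E\uFE8F"

# The same word as a user would type it, in base letters:
# KAF (U+0643), TEH (U+062A), ALEF (U+0627), BEH (U+0628).
typed_word = "\u0643\u062A\u0627\u0628"
typed_prefix = typed_word[:2]  # a truncated query, "kt"

folded = fold_presentation_forms(ocr_text)

print(typed_prefix in ocr_text)  # the raw OCR text does not contain the prefix
print(typed_prefix in folded)    # the folded text does
print(folded == typed_word)
```

NFKC also folds other compatibility characters (ligatures, full-width
forms and so on), so it would be wise to check the result on a sample of
real OCR output before converting a whole collection.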

I have been told by some Arabic speakers that satisfactory OCR software
is not readily available for Arabic text. It would be nice if Amr and
Graeme or other colleagues could comment on this, providing information
on their experience with Arabic OCR software.

As a reminder, there is an Arabic Greenstone blog at
http://arabicgsdlblog.blogspot.com/ and also an Arabic discussion list
at http://www.freelists.org/list/greenstone4arab . We hope to promote
the establishment of an Arabic Greenstone user group, but this effort is
hampered by lack of information on users and applications. It would be
useful if users of Greenstone in Arabic could identify themselves and
provide feedback on their needs, problems and applications, either to
this list or offlist to me.

Best regards, John Rose of Greenstone Team

> From: amr hassan <amr_ftoh2008@yahoo.com>
> To: "greenstone-users@list.scms.waikato.ac.nz"
> <greenstone-users@list.scms.waikato.ac.nz>
> Date: Tue, 14 Jul 2009 03:15:40 -0700 (PDT)
> Message-ID: <739895.80061.qm@web38603.mail.mud.yahoo.com>
> Subject: [greenstone-users] Very Important
> Can Greenstone Recognize Arabic Letters Made By OCR ?
> thanks
>
> From: graeme <foster.graeme@gmail.com>
> Cc: "greenstone-users@list.scms.waikato.ac.nz"
> <greenstone-users@list.scms.waikato.ac.nz>
> To: amr hassan <amr_ftoh2008@yahoo.com>
> References: <739895.80061.qm@web38603.mail.mud.yahoo.com>
> In-Reply-To: <739895.80061.qm@web38603.mail.mud.yahoo.com>
> Date: Tue, 14 Jul 2009 18:11:43 +0700
> Message-ID: <dfb4b74f0907140411p233d8ee3rf5faeb21558fc6a9@mail.gmail.com>
> Subject: Re: [greenstone-users] Very Important
> The short answer is yes.
> A possible problem is that if the OCR saves the characters in presentation format then searching for part words would not work.
> Graeme.


John B. Rose
1 Bis, Rue des Châtre-Sacs
92310 Sèvres
Email: <john.rose1@free.fr>
(in case of bounce then send to