My understanding of your problem from the two emails is that you have
some images (of notices) and some corresponding OCR text for each image.
Greenstone does not handle this well by default, but there are several
options you can try.
1. Build a collection on only the image files, and add the text as
metadata to each image. You can do this manually using the Librarian
Interface: drag all the images into a new collection and add
Date/Subject/Text metadata to each one. This is probably not appropriate
for large collections.
Alternatively, you can put the metadata into a metadata.xml file; the
format is described in Section 2.1 of the Developer's Guide.
For this case it is possible to write a simple script that takes each
file of OCR text and adds it as metadata to the appropriate image in the
metadata.xml file.
If you want simple searching over the text as a whole, you just
need to create one metadata element, e.g. NoticeText, and create the index
on that. If you want fielded searching, e.g. by Date or Subject, you will
need to create a separate metadata element for each field.
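As a rough illustration of the script mentioned above, here is a minimal Python sketch. It assumes each image has an OCR text file with the same base name (notice1.jpg and notice1.txt), that everything sits in one import directory, and that metadata.xml uses the DirectoryMetadata/FileSet/Description layout with FileName treated as a regular expression. The function names are made up for this example, and the exact header and DTD line should be checked against the Developer's Guide before use.

```python
import re
import sys
from pathlib import Path
from xml.sax.saxutils import escape


def file_set(image_name, text, element="NoticeText"):
    """Return one <FileSet> block attaching `text` to `image_name`."""
    # Assumption: FileName is matched as a regular expression, so the
    # literal file name is escaped (notice1.jpg -> notice1\.jpg).
    pattern = re.escape(image_name)
    return (
        "  <FileSet>\n"
        f"    <FileName>{escape(pattern)}</FileName>\n"
        "    <Description>\n"
        f'      <Metadata name="{element}">{escape(text)}</Metadata>\n'
        "    </Description>\n"
        "  </FileSet>\n"
    )


def build_metadata_xml(entries, element="NoticeText"):
    """Build the whole metadata.xml text from (image_name, text) pairs."""
    body = "".join(file_set(name, text, element) for name, text in entries)
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        "<DirectoryMetadata>\n" + body + "</DirectoryMetadata>\n"
    )


def collect_entries(import_dir, image_ext=".jpg", text_ext=".txt"):
    """Pair every image with the OCR text file that shares its base name."""
    entries = []
    for image in sorted(Path(import_dir).glob(f"*{image_ext}")):
        text_file = image.with_suffix(text_ext)
        if text_file.exists():
            text = text_file.read_text(encoding="utf-8").strip()
            entries.append((image.name, text))
    return entries


if __name__ == "__main__" and len(sys.argv) == 2:
    folder = Path(sys.argv[1])
    xml = build_metadata_xml(collect_entries(folder))
    (folder / "metadata.xml").write_text(xml, encoding="utf-8")
```

Run it with the import directory as the only argument. For fielded searching, the same approach would emit one Metadata line per field (Date, Subject, and so on) instead of a single NoticeText element.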
2. You can also use the HTML files you have created. Build the
collection on the HTML files; the image will be kept as an associated
file, and the text will be indexed. When you display the document, you
then need to change the DocumentText format statement to display only
the image.
There are two ways to do this:
A. Modify the HTML plugin to create a new metadata element, e.g.
NoticeImage, whose value is the image name. Then use [NoticeImage] in
your format statement.
B. Put each HTML file and its image into a separate directory in the
import folder. Give all the images the same name, e.g. notice.jpg.
Then the format statement can use
to display the image
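The directory layout for option B could be prepared with a small script rather than by hand. The sketch below is only an illustration: it assumes each HTML page and its image share a base name (notice1.html and notice1.jpg) and live in one source folder, and the function name and arguments are invented for this example. It also rewrites the image name inside each copied page so that the page still points at the renamed image.

```python
import shutil
from pathlib import Path


def split_into_directories(source_dir, import_dir,
                           image_ext=".jpg", common_name="notice.jpg"):
    """Copy each HTML page and its image into its own import subdirectory."""
    source_dir, import_dir = Path(source_dir), Path(import_dir)
    created = []
    for html_file in sorted(source_dir.glob("*.html")):
        image = html_file.with_suffix(image_ext)
        if not image.exists():
            continue  # skip pages that have no matching image
        target = import_dir / html_file.stem
        target.mkdir(parents=True, exist_ok=True)
        # Replace the old image name in the page with the common name.
        html = html_file.read_text(encoding="utf-8")
        (target / html_file.name).write_text(
            html.replace(image.name, common_name), encoding="utf-8")
        # Store the image under the same name in every directory.
        shutil.copyfile(image, target / common_name)
        created.append(target)
    return created
```

The original files are copied, not moved, so the source folder is left untouched if something needs to be redone.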
I hope this has given you some ideas of things to try.
Rajesh Jha wrote:
> dear sir,
> I am Rajesh Jha working as a trainee in C-DOT on the
> project digital library.I want to add one feature that
> is enotice.I have made some html pages which includes
> the jpg image of the notices and the OCR text in it,
> Is it possible to make a collection of such html pages
> so that one can search the page according to the
> dates, text or subject in the OCR text and view the
> notice image in the html page without the text
> embedded in it.
> Hello all,
> I have a collection of images with descriptions for
> each is it possible to make collection how?.