Re: [greenstone-devel] two questions: ImagePlug status and, producing hierarchical collection structure

From Katherine Don
Date Mon, 23 Feb 2004 15:26:03 +1300
Subject Re: [greenstone-devel] two questions: ImagePlug status and, producing hierarchical collection structure
In-Reply-To (5-1-0-14-2-20040216165011-00b085b0-psulias-psu-edu)
Hi Michael

Firstly, ImagePlug should work on Windows as well (and probably Mac OS X), as
long as ImageMagick is installed.
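
If you want a quick check that ImageMagick is reachable before building, something like the sketch below may help. It is only an illustration: it assumes the ImageMagick command-line tools need to be on your PATH, which may not match how your Greenstone install locates them, and the function name is made up.

```python
import shutil


def find_imagemagick():
    """Return the path of an ImageMagick command found on PATH, or None.

    Assumption: the tools are installed as 'magick' (newer releases)
    or 'convert' (older releases). On Windows a system utility is also
    called 'convert', so a hit on that name alone is not proof.
    """
    for name in ("magick", "convert"):
        path = shutil.which(name)
        if path:
            return path
    return None


print(find_imagemagick() or "ImageMagick not found on PATH")
```

If this reports nothing, installing ImageMagick (or fixing PATH) is the first thing to try.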

As for the rest, I have recently created a plugin called PagedImgPlug. Here is
the blurb I sent to the list previously:

----
I have added a new plugin to Greenstone. It processes sequences of image
files (the sequence is defined by an auxiliary text file) into paged
documents - so you get prev and next page arrows, and a goto page box on
the document display.
If there is OCR text in text format for each image this can also be
added to be used for searching.

Please see
http://www.sadl.uleth.ca/gsdl/cgi-bin/library?a=p&p=about&c=curry for an
example collection
(Note: only two books have text and titles, so the Date browser is the
best way to explore this collection.)

This will be available for the next release. Please let me know if you
want this earlier and I will send you the plugin.
----

This should help you get some of the way towards your goal. I'll send you the
plugin in a separate mail. There is a README file, and some more info at the
start of the plugin file. Let me know if you need more help.

The plugin creates only a linear sequence of images, so you would need to
modify it to get a deeper hierarchy - it shouldn't be too difficult to modify
the format of the .item file and get the plugin to produce a hierarchical
document.
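
To give a rough idea of the kind of change I mean, here is a sketch of the grouping logic. Greenstone plugins are written in Perl, so this Python is illustration only. The line layout it parses (chapter:page:image:text) is invented for the example and is not the format the plugin currently reads, and the function name is hypothetical as well.

```python
from collections import OrderedDict


def parse_item_lines(lines):
    """Group page entries by chapter, keeping the order they appear in.

    Assumed (hypothetical) line layout: chapter:page:image:text
    The text field may be empty when a page has no OCR text.
    """
    chapters = OrderedDict()
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        parts = line.split(":")
        if len(parts) != 4:
            raise ValueError(f"expected 4 fields, got {len(parts)}: {line!r}")
        chapter, page, image, text = parts
        chapters.setdefault(chapter, []).append(
            {"page": page, "image": image, "text": text or None}
        )
    return chapters


sample = [
    "Chapter 1:1:p001.tif:p001.txt",
    "Chapter 1:2:p002.tif:p002.txt",
    "Chapter 2:3:p003.tif:",
]
for chapter, pages in parse_item_lines(sample).items():
    print(chapter, [p["image"] for p in pages])
```

Each chapter would then become a section of the document, with its pages as subsections. A colon separator breaks on file names that contain colons, so a real format might want a different delimiter.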

Regards,
Katherine Don

Michael Pelikan wrote:

> Hi all -
>
> The present documentation (we've got March 2003) says that the ImagePlug
> plugin only functions under UNIX.
>
> Can someone confirm this, and, if so, please give a newbie a clue as to how
> to import page images in screen-resolution tiffs along with accompanying
> page-length text files and have the two associated with one another. I've
> seen how this works with pdfs that have embedded OCR text, but want to know
> what I need to do to work from tiffs.
>
> In other words, I'd like to import page images and their associated text
> files, AND (and this is really perhaps a second question), have all of
> these organized after the build hierarchically as pages within chapters
> within works within a collection. I've read the documentation, and I've
> tried a number of builds. So far, I've built several collections containing
> combined masses of material from multiple works. There's something I'm
> hoping is patently obvious to one of you that I'm missing so far.
>
> For example, I've pre-arranged source docs in a hierarchical arrangement of
> folders and have fed the parent of those to the build process in hopes that
> it would recognize and respect the organization of works in folders within
> the parent folder. Greenstone ingested the parent folder's contents, but
> produced a single-level collection with everything massed together.
>
> I'm obviously still missing the critical "Aha!"
>
> Any suggestions will be gratefully read and incorporated into our efforts...
>
> Many thanks,
>
> Michael Pelikan
> Technology Initiatives Librarian
> Department for Information Technology (I-Tech)
> University Libraries
> Penn State University
> 102 Paterno Library
> University Park, PA 16802-1808
> (814) 865-5660
> mpp10@psulias.psu.edu
>
> _______________________________________________
> greenstone-devel mailing list
> greenstone-devel@list.scms.waikato.ac.nz
> https://list.scms.waikato.ac.nz/mailman/listinfo/greenstone-devel