About this collection
This prototype collection is based on ebooks from the International Children's Digital Library Project and Project Gutenberg. It contains about 50 DjVu books from the former and 50 plain text books from the latter. To view the DjVu documents, you will need to download a DjVu browser separately.
Each document has a metadata record, and most have either a DjVu or a plain text version as well. The metadata record pretty-prints the XML metadata downloaded from the Internet Archive site, and gives a link at the bottom to the raw XML metadata record. It also gives access to the plain text, if any exists (sometimes the DjVu file does not contain any). In the case of the DjVu books this is the raw OCR'd text; in the case of the Gutenberg books it is just the document text. In either case, this is the text that the full-text search covers.
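For readers curious about what pretty-printing a raw XML metadata record involves, here is a minimal sketch using Python's standard library. It is not the code that produces the metadata pages in this collection; the function name prettyprint_metadata and the sample record are assumptions made up for illustration.

```python
import xml.dom.minidom


def prettyprint_metadata(raw_xml: str) -> str:
    """Return an indented, human-readable rendering of a raw XML record."""
    dom = xml.dom.minidom.parseString(raw_xml)
    pretty = dom.toprettyxml(indent="  ")
    # Drop any blank lines so the output stays compact.
    return "\n".join(line for line in pretty.splitlines() if line.strip())


# Hypothetical record, not taken from the collection.
sample = (
    "<metadata><title>Example Title</title>"
    "<creator>Example Author</creator></metadata>"
)
print(prettyprint_metadata(sample))
```

Each element of the record ends up on its own indented line, which is roughly what a pretty-printed metadata view shows in place of the raw record.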
For some books, there is just the metadata record, and no DjVu or plain text. We downloaded the .zip file and used the plain text (.txt file) it contained. In some cases there was no plain text, but (e.g.) RTF or HTML instead. Greenstone can cope with these formats; we just didn't have time to include them in the demo.
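The step of taking the plain text out of a downloaded .zip file can be sketched as follows. This is only an illustration under stated assumptions, not the procedure actually used to build the collection: the helper names first_plain_text and make_zip are hypothetical, and the sketch assumes the text is UTF-8 (undecodable bytes are replaced rather than raising an error).

```python
import io
import zipfile
from typing import Dict, Optional


def first_plain_text(zip_bytes: bytes) -> Optional[str]:
    """Return the contents of the first .txt member of a zip archive,
    or None when the archive holds no plain text (e.g. only RTF or HTML)."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for name in sorted(archive.namelist()):
            if name.lower().endswith(".txt"):
                return archive.read(name).decode("utf-8", errors="replace")
    return None


def make_zip(files: Dict[str, str]) -> bytes:
    """Build a small in-memory zip archive for demonstration purposes."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        for name, content in files.items():
            archive.writestr(name, content)
    return buffer.getvalue()


# Hypothetical archives: one with plain text, one with HTML only.
with_text = make_zip({"book.txt": "Once upon a time", "book.html": "<p>Once</p>"})
html_only = make_zip({"book.html": "<p>Once</p>"})
print(first_plain_text(with_text))
print(first_plain_text(html_only))
```

An archive without a .txt member yields None, which corresponds to the books that appear here with a metadata record only.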
You may find access rather slow, particularly to the DjVu versions, since this is coming over a thin wire from New Zealand.
Here's how to access the collection. From the 'search' button you can search the full text, the titles, the authors, or all metadata. Try searching for 'the' (our standard test!). If you go to the Preferences page (far right button) you can switch from the default 'plain search' to a more conventional library-catalog-style 'form search'. (You can also switch the interface language: there are about 20 options, although the translation is sometimes incomplete.) From the 'titles a-z' button you can browse titles in alphabetic ranges; similarly for authors with the 'authors a-z' button. The 'dates' button allows browsing by year. The 'phrases' button takes you to a hierarchical phrase browser: for a seasonal example type 'christmas' and click 'search' (or press enter), then click one of the black lines to have it expanded (the red lines take you to the documents).
It is entirely trivial to put this whole collection onto a self-installing CD-ROM that runs on all Windows versions.
How to find information in the ebooks collection
There are 5 ways to find information in this collection:
- search for particular words
- access publications by title
- access publications by author
- access publications by date
- browse phrases occurring in publications
You can search for particular words that appear in the text from the "search" page. This is the first page that comes up when you begin, and can be reached from other pages by pressing the search button.
You can access publications by title by pressing the titles a-z button. This brings up a list of books in alphabetical order by title.
You can access publications by author by pressing the authors a-z button. This brings up a list of books, sorted by author name.
You can access publications by date by pressing the dates button. This brings up a list of all the publications, sorted by date.
You can browse phrases occurring in publications by pressing the phrases button. This uses the phind phrase browser.