
New Zealand Digital Library Project members have developed a range of practical software packages in the course of their research. Much of this software is available for download.

Digital libraries and indexing

  • Greenstone is the digital library system that generates most of the pages of this website. It is freely available under the GNU General Public License and has been adopted by numerous other projects. Humanitarian organisations, including Global Help Projects and United Nations organisations, use it to disseminate information.
    • Our website hosts exotic collections, humanitarian collections, and reference collections.
    • Other websites mirror these collections, and host many others.
    • Greenstone is available for download.

  • MG is an enhancement of the Managing Gigabytes full-text retrieval system. It adds flexible stemming methods, term weighting, term frequencies, merged indexes, machine-independent indexes, and a port to MS-DOS.

  • PreScript converts PostScript to plain ASCII or HTML. It detects paragraph boundaries, removes hyphenation, and interprets many ligatures.
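
The text clean-up steps mentioned for PreScript (paragraph boundaries, hyphenation, ligatures) can be illustrated on text that has already been extracted into lines. This is only a sketch under stated assumptions, not PreScript's own method: the function name `reflow` is hypothetical, a blank line is assumed to mark a paragraph boundary, and a trailing hyphen followed by a lowercase letter is assumed to be an end-of-line hyphenation.

```python
# Unicode presentation-form ligatures mapped to plain ASCII letters.
LIGATURES = str.maketrans({"\ufb00": "ff", "\ufb01": "fi", "\ufb02": "fl"})


def reflow(lines):
    """Join extracted text lines into paragraphs (illustrative heuristics only)."""
    paragraphs = []
    current = ""
    for raw in lines:
        line = raw.translate(LIGATURES).strip()
        if not line:
            # Assumption: a blank line ends the current paragraph.
            if current:
                paragraphs.append(current)
                current = ""
            continue
        if current.endswith("-") and line[0].islower():
            # Assumption: trailing hyphen + lowercase start = split word.
            current = current[:-1] + line
        elif current:
            current += " " + line
        else:
            current = line
    if current:
        paragraphs.append(current)
    return paragraphs
```

A heuristic this simple also joins genuine hyphenated compounds that happen to break at a line end (for example "full-" followed by "text"), so a real converter needs more evidence than the hyphen alone.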

Extracting data and metadata

  • Sequitur is a method for inferring compositional hierarchies from strings by detecting repetition and factoring it out of the string by forming rules in a grammar. Sequitur is useful for recognizing lexical structure in strings, and excels at very long sequences.

  • Kea is a program for automatically extracting keywords and keyphrases from the full text of documents. Candidate keyphrases are identified using rudimentary lexical processing, features are computed for each candidate, and machine learning is used to determine which candidates should be assigned as keyphrases.
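
The idea behind Sequitur, factoring repeated material out of a string into grammar rules, can be illustrated with a much simpler procedure. The sketch below is not the Sequitur algorithm itself: it is an offline, greedy pair-replacement scheme that repeatedly turns the most frequent repeated pair of adjacent symbols into a new rule. The names `count_pairs`, `infer_grammar`, and `expand` are hypothetical, and rules are numbered with integers so they cannot be confused with the characters of the input.

```python
from collections import Counter


def count_pairs(seq):
    """Count non-overlapping occurrences of each adjacent symbol pair."""
    counts = Counter()
    last_counted = {}
    for i in range(len(seq) - 1):
        pair = (seq[i], seq[i + 1])
        if last_counted.get(pair) == i - 1:
            continue  # overlaps the occurrence just counted, e.g. in "aaa"
        counts[pair] += 1
        last_counted[pair] = i
    return counts


def infer_grammar(text):
    """Return (start_sequence, rules); rules maps a rule number to a symbol pair."""
    seq = list(text)
    rules = {}
    while True:
        counts = count_pairs(seq)
        if not counts:
            break
        pair, n = counts.most_common(1)[0]
        if n < 2:
            break  # no pair repeats, so nothing is left to factor out
        rule_id = len(rules) + 1
        rules[rule_id] = pair
        out, i = [], 0
        while i < len(seq):
            if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
                out.append(rule_id)
                i += 2
            else:
                out.append(seq[i])
                i += 1
        seq = out
    return seq, rules


def expand(symbol, rules):
    """Expand a symbol back into the text it stands for."""
    if isinstance(symbol, int):
        return "".join(expand(s, rules) for s in rules[symbol])
    return symbol
```

For the input "abcabcabc" this yields the start sequence `[2, 2, 2]` with rule 1 standing for "ab" and rule 2 standing for rule 1 followed by "c", a small hierarchy in which one rule is built from another.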

Text Mining

Browsing interfaces

  • The 3D Book Visualizer is a suite of programs for creating and interacting with a three-dimensional simulation of a paper book. It supports these interactive features:

    • Spinning the book around
    • Zooming in and out
    • Turning a single page or a wodge of pages
    • Flipping through key pages
    • Switching between handling mode and reading mode

    It supports the PDF and DjVu document formats.

  • Phind is an interface for browsing the phrases that occur in a collection. The phrases form an approximation of the topics covered. They are extracted from the noun phrases occurring in the text, so nonsense phrases and phrases with very little information content are excluded. Each phrase is part of a hierarchy, and at any point the user can browse more specialised topics or retrieve the documents that contain the phrase.

  • The collage applet dynamically displays a given set of images. When an image is clicked, a new browser window opens and the associated URL is displayed.

    The applet can be used in two different contexts: either within the Greenstone Digital Library Software or externally using a directory of images and associated links.
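
The phrase hierarchy described for Phind above can be pictured as two lookups: from a phrase to the longer phrases that contain it, and from a phrase to the documents in which it occurs. The following is a minimal sketch under simplifying assumptions, not Phind's implementation: it treats every run of up to three words as a phrase instead of extracting noun phrases, and the name `build_phrase_index` is hypothetical.

```python
from collections import defaultdict


def build_phrase_index(docs, max_words=3):
    """docs maps a document id to its text; returns (expansions, postings)."""
    postings = defaultdict(set)  # phrase -> ids of documents containing it
    for doc_id, text in docs.items():
        words = text.lower().split()
        for n in range(1, max_words + 1):
            for i in range(len(words) - n + 1):
                postings[tuple(words[i:i + n])].add(doc_id)

    expansions = defaultdict(set)  # phrase -> phrases one word longer
    for phrase in postings:
        if len(phrase) > 1:
            expansions[phrase[:-1]].add(phrase)  # extended on the right
            expansions[phrase[1:]].add(phrase)   # extended on the left
    return expansions, postings
```

Browsing then amounts to following `expansions` from a short phrase to more specialised ones, and reading `postings` whenever the user asks for the matching documents.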

Word segmentation

  • Word segmentation finds word boundaries in languages such as Chinese and Japanese, which, unlike English, are written without spaces or other word delimiters apart from punctuation marks. It matters for any application that uses the word as its basic unit, because machine-readable Chinese text is invariably stored in unsegmented form.
    • We have implemented a WWW interface for segmenting Chinese text.
    • If your web browser does not support Chinese text, illustrations of the transformation are available.
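
To make the task concrete, here is a simple dictionary-based baseline: greedy longest match from left to right. It is offered only as an illustration of what segmentation produces and is not claimed to be the method behind the interface mentioned above. The function name `segment`, the tiny lexicon, and the `max_len` limit are all assumptions made for the example.

```python
def segment(text, lexicon, max_len=4):
    """Split text into words by always taking the longest lexicon match."""
    words = []
    i = 0
    while i < len(text):
        # Try the longest candidate first; a single character always matches.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in lexicon:
                words.append(candidate)
                i += size
                break
    return words


# Hypothetical lexicon, for illustration only.
lexicon = {"新西兰", "数字", "图书", "图书馆"}
print(" ".join(segment("新西兰数字图书馆", lexicon)))
```

Greedy matching commits to the longest word at each position, so it can choose badly when a long lexicon entry overlaps the start of the next word. That weakness is one reason more sophisticated segmenters exist.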


New Zealand Digital Library Project
Department of Computer Science, University of Waikato, New Zealand
July 2000