About this collection

This collection, which contains 10 BibTeX entries, is a small adjunct to the sample bibliography collection. It contains a few Computational Learning Theory (COLT) papers from the year 2003.

How the collection works

The purpose of the collection, whose name is "cltext-e", is to illustrate Greenstone's "supercollection" facility.

It is a very small collection with only 10 documents, as you can see by clicking the titles a-z button. It has the same structure as the much larger bibliography collection, except that there is no phrases button (because it's far too small for a phrase index to make sense).

The collection configuration file is just the same as that for the bibliography collection (which is called "cltbib-e"), except for one small but crucial difference: a line that states

     supercollection cltext-e cltbib-e 

This affects only searching, not browsing. It means that when you search this collection, the cltbib-e collection is automatically searched as well. For example, if you look in the titles a-z list you will see that a couple of the items here are about "genetics". But if you search for genetic you will find that 38 documents match the query: the couple of documents in this collection plus the 36 that the same search finds in the bibliography collection.
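The search behaviour described above can be pictured with a small toy model. This is a hypothetical sketch, not Greenstone's own code: each collection is assumed to be a plain dictionary from document ID to title, naive substring matching stands in for a real full-text index, and the function names `parse_supercollection` and `supercollection_search` are invented for illustration. It shows only the idea that a query is run against every collection named on the `supercollection` line and the hits are merged, each tagged with the collection it came from.

```python
# Toy model of supercollection searching (hypothetical; not Greenstone's implementation).
# A "collection" here is just a dict mapping document ID -> title text.

def parse_supercollection(cfg_line):
    """Return the collection names listed on a 'supercollection' config line."""
    keyword, *names = cfg_line.split()
    if keyword != "supercollection":
        raise ValueError("not a supercollection line")
    return names

def supercollection_search(collections, names, term):
    """Search every named collection and merge the hits.

    Naive case-insensitive substring matching stands in for a real index.
    Each hit is a (collection_name, doc_id) pair, so the caller knows
    which collection a document belongs to.
    """
    hits = []
    for name in names:
        for doc_id, text in collections[name].items():
            if term.lower() in text.lower():
                hits.append((name, doc_id))
    return hits

# Hypothetical sample data, loosely modelled on the two collections.
collections = {
    "cltext-e": {"d1": "Learning genetic programs", "d2": "Boosting bounds"},
    "cltbib-e": {"b1": "Genetic algorithms", "b2": "PAC learning"},
}
names = parse_supercollection("supercollection cltext-e cltbib-e")
print(supercollection_search(collections, names, "genetic"))
```

Searching from cltbib-e alone would correspond to passing just `["cltbib-e"]`, which is why the statement in one collection does not affect searches made from the other.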

Supercollection has two principal uses. One is when a collection is continually updated with new material but is impractical to rebuild after each addition. New material can be accumulated in a small supplementary collection that rebuilds very quickly, and periodically the two are amalgamated and rebuilt. The main collection would contain the supercollection statement. (The present collection has a main "bibliography" and a smaller "supplement", but here, because of the way we have chosen to explain it, the supercollection statement is in the wrong collection!)

Supercollection is also useful for gigantic collections. Ordinary Greenstone collections can be very large (we have built one with 11 million short documents, and another with 7 GB of text), but gigantic collections may have to be split up. One smaller-scale example is a collection of over a million small documents that is delivered on 5 CD-ROMs, each of which works individually as an independent Greenstone collection. When several are installed on the hard drive, they can be searched together as a seamless unit.

The supercollection facility only works for searching. The present collection, when browsed, is very small; but when searched it is much larger. We have found that browsing is of limited use for very large collections anyway. For example, in the above-mentioned CD-ROM collection, there are no browsing buttons, only searching.

In the present example the two collections have precisely the same structure, but supercollection also works when they differ. Each collection appears as it normally would, and cross-collection searching works as expected. When documents are displayed, they are shown according to the format defined in the collection to which they belong.
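The display rule above (each document is shown in the format of its own collection) can be sketched by carrying the collection name along with every hit and looking up that collection's format when rendering. This is a hypothetical illustration: the format strings use Python's `str.format` placeholders rather than Greenstone's own format-statement syntax, and the names `FORMATS`, `TITLES`, and `render_hits` are invented for the example.

```python
# Toy model: render each search hit with the format of the collection it came from.
# Format strings are hypothetical and use Python placeholders, not Greenstone syntax.

FORMATS = {
    "cltext-e": "[supplement] {title}",
    "cltbib-e": "{title} (bibliography)",
}

TITLES = {
    "cltext-e": {"d1": "Learning genetic programs"},
    "cltbib-e": {"b1": "Genetic algorithms"},
}

def render_hits(hits, formats, titles):
    """Format each (collection, doc_id) hit using its owning collection's format."""
    return [formats[coll].format(title=titles[coll][doc]) for coll, doc in hits]

for line in render_hits([("cltext-e", "d1"), ("cltbib-e", "b1")], FORMATS, TITLES):
    print(line)
```

Keying the format on the hit's collection, rather than on the collection where the search started, is what lets two differently structured collections share one result list.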

Each collection can be used individually by following the instructions below (which Greenstone generates automatically).

