About this collection

This collection, which contains 3817 BibTeX entries, is made from the Computational Learning Theory (COLT) Bibliography. COLT's home page is http://www.learningtheory.org/, while the bibliography's home page is http://www.i.kyushu-u.ac.jp/~thomas/COLTBIB/coltbib.jhtml.

How the collection works

The collection incorporates a form-based search interface that allows fielded searching. In Greenstone, this means that an enhanced search engine (called mgpp) must be used, rather than Greenstone's default search engine (called mg). There is an online help document for mgpp. The collection has an "advanced" form search interface, and also a plain single-field search page. These two variants can be selected from the collection's Preferences page.

The collection configuration file begins with the specification groupsize 200. This groups documents together into groups of 200. Bibliography collections typically have many small documents, and grouping them together prevents Greenstone's internal file structures from becoming bloated and occupying more disk space than necessary.

Apart from the standard ones, the plugins specified for this collection are ZIPPlug, which unzips compressed documents and archives, and BibTexPlug, which processes references in the BibTeX format (well known to computer scientists).

Fielded searching, with a form-based interface, is selected by searchtype form in the configuration file. In fact, this collection uses searchtype form plain, which includes a plain textual full-text search index as well (since form comes first, it is the default interface; you reach the plain search through the Preferences page).
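As a rough sketch, the settings described so far would sit near the top of the collection configuration file something like this. The directives and their values are taken from the description above; the order of the lines after groupsize and the omission of the standard plugins are assumptions, not a copy of the actual file.

```
# group documents 200 at a time to keep internal file structures small
groupsize   200

# form-based (fielded) search is the default; plain search is also built
searchtype  form plain

# collection-specific plugins (the standard ones are not shown here)
plugin      ZIPPlug
plugin      BibTexPlug
```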

Including searchtype means that the mgpp search engine is used, and with mgpp indexes are specified in a slightly different way. Whereas Greenstone normally allows each index to be built at a different "level" (document, section, paragraph), with mgpp all indexes are at the same level -- document by default (as in this case). The level can be changed using a levels statement. Also, whereas in other collections indexes can be specified on text or on any metadata, here there are additional possibilities: you can specify indexes on every metadata field by using the single word metadata, and an index for all the metadata fields together by using the word allfields.

In this case the indexes line specifies searchable indexes on the full text and on every metadata field. Thus when the "field" menus in the search page are pulled down, they show full record followed by an entry for each metadata element. Collection-level metadata collectionmeta can be specified for any index to determine what it is called in the menu (except for metadata, which produces many menu items). In this case, the configuration file specifies that the text index should be named "full record" because it contains the original bibliographic record.
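A hedged sketch of what these two settings might look like follows. It assumes the full-text index is named text, and that the display name is attached with a collectionmeta entry whose name is the index name preceded by a dot; both should be checked against the actual configuration file.

```
# one index on the full text, plus one for every metadata field
indexes         text metadata

# show the text index as "full record" in the search form's field menu
collectionmeta  .text "full record"
```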

This collection contains Title, Author, and Date browsers, and a special kind of phrase index called "Phind." The AZCompactList classifier used for the Author browser is like AZList but generates a bookshelf for duplicate items. The BibTeX plugin records each author as Author metadata; it also puts a list containing all authors into the Creator metadata element. Consequently the AZCompactList classifier is based on Author, so that each individual author appears in the browser. However, Greenstone has a standard button reading authors a-z whose name is (confusingly) "Creator", so this button name is specified for the classifier.
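The Author browser described above could be declared roughly as follows. This is a minimal sketch: it shows only the two options the description implies (the metadata element and the button name), and any further options in the real classify line are not reproduced here.

```
# Author browser: built on Author metadata, but displayed under the
# standard "Creator" button (the one that reads "authors a-z")
classify  AZCompactList -metadata Author -buttonname Creator
```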

The "Phind" classifier creates a phrase index. It contains a browsable list of phrases extracted from the material specified in the text argument of the classify Phind line in the configuration file. Here the specification names four fields: the title, list of authors, title of the collected work (if any) in which this item appears, and publisher. Note that this specification follows the mg convention with level:field. Phind indexes are more usually based on the entire full text of a collection, using the specification document:text.
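The description suggests a text argument along the following lines. The level:field form and the document level come from the text above; the four metadata names (Title, Creator, Booktitle, Publisher) are guesses matched to the four items listed, not the confirmed contents of the configuration file.

```
# phrase index built from four metadata fields rather than the full text
# (field names are assumptions; check them against the real collect.cfg)
classify  Phind -text document:Title,document:Creator,document:Booktitle,document:Publisher

# the more usual form, based on the full text of each document:
# classify  Phind -text document:text
```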

The best way to see what Phind does is to play with this index. You type a word in the search box, click Search, and a list of phrases containing that term appears in the top panel. Click on one of these phrases and a list of phrases containing that phrase appears in the bottom panel. You can continue doing this, expanding the phrase more and more. The lists can be lengthened using the get more phrases button. At the end of the list of phrases appears a list of documents containing that phrase, in blue text; you can lengthen this list by clicking get more documents.

The search results list and the title browser are both formatted by the VList specification. It gives a document icon that links to the document itself (which in this collection is the full reference); the title in bold; Creator metadata if there is any, otherwise Editor metadata; and Date metadata if there is any.
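A format statement with this behaviour might be written along these lines. It is a sketch of the logic just described, using Greenstone's {If}{test,then,else} construct; the HTML layout around the metadata is an assumption and the actual statement may differ.

```
# icon linking to the document; bold title; Creator, falling back to
# Editor; Date only when it is defined
format VList "<td valign=top>[link][icon][/link]</td>
<td><b>[Title]</b><br>
{If}{[Creator],[Creator],[Editor]}
{If}{[Date],<br>[Date]}</td>"
```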

The format statement for the author browser (CL2VList) is more complex. The AZCompactList classifier generates a tree whose nodes are either leaf nodes, representing documents, or internal nodes. A metadata item called numleafdocs gives the total number of documents below an internal node. This format statement checks whether numleafdocs exists. If so, the node must be an internal node, in which case the node is labeled by its Title. But beware: this classifier is generated on Author metadata, so its title -- the title of the classifier node -- is actually the author's name! This means that the bookshelf nodes here are labeled by the author's name. The leaf nodes, however, are labeled the same way as documents (i.e. references) are in the search results list.
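The test on numleafdocs can be sketched as follows. Only the branching is meant to be illustrative: the leaf branch is abbreviated to the bold title and Creator rather than repeating the full search-results layout, and the surrounding HTML is an assumption.

```
# internal (bookshelf) node: numleafdocs is defined, so show [Title],
#   which for this classifier holds the author's name
# leaf node: numleafdocs is undefined, so show the reference itself
format CL2VList "<td valign=top>[link][icon][/link]</td>
<td>{If}{[numleafdocs],[Title],<b>[Title]</b><br>[Creator]}</td>"
```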

The documents themselves are generated by two format statements, one (a long one) called DocumentHeading, and another called DocumentText. The DocumentHeading, which is the top two-thirds of the page, contains the document's Title followed by a table that gives all the metadata elements that the BibTeX plugin can generate. The role of all the If statements in the configuration file is to determine which elements are defined, so that only those are shown.

The DocumentText shows the BibTeX version of the reference. However, when the document is displayed initially, only a hyperlink reading Show BibTex Record appears -- this corresponds to the last part (that is, the "else" part) of the If statement in DocumentText. When this hyperlink is clicked, the href goes to the same URL but with showrecord=1. The If test then succeeds, and the Text of the document is shown. With the BibTeX plugin, the text of a document is its unadulterated BibTeX record.

How to find information in the Bibliography collection

There are 5 ways to find information in this collection:

  • search for particular words that appear in the text by clicking the Search button
  • browse documents by Title by clicking the Titles button
  • browse documents by Creator by clicking the Creators button
  • browse documents by Date by clicking the Dates button
  • browse phrases occurring in documents by clicking the Phrases button. This uses the Phind phrase browser.