About this collection

This collection is based on the MARC records in the Library of Congress Catalog that include Beowulf in their title. Here is a sample document in the collection.

How the collection works

The configuration file uses ZIPPlug and MARCPlug in addition to the three standard plugins. An input_encoding argument is used because data in this archive contains extended characters.

There are three classifiers, based on Title, Creator, and Subject metadata. All are AZCompactList classifiers, and all specify a mingroup of 1. This forces them to create a bookshelf icon even if there is only one item on the shelf. The reason is aesthetic: the list has a uniform appearance because all items look the same, including one-book entries. (Of course, if you don't like this style, just leave out the mingroup argument.)

A second argument for the Title and Creator classifiers removes suffixes from the metadata string (Title and Creator respectively). This is specified as a Perl regular expression, and trims characters (such as trailing punctuation) from the strings for display. The three format statements are similar: in particular, they each put out the number of leaf documents on the right-hand side of the display.
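The effect of suffix removal and of the mingroup setting can be sketched outside Greenstone. The Python below is only an illustration, not the classifier's actual code: the pattern SUFFIX and the functions strip_suffix and group_by_metadata are hypothetical names, and the pattern is an assumed example rather than the one used in this collection's configuration file.

```python
import re
from collections import defaultdict

# Assumed example pattern: trailing whitespace and punctuation.
# The collection's real pattern may differ.
SUFFIX = re.compile(r"[\s/.,:;]+$")


def strip_suffix(value):
    """Remove a trailing run of punctuation and whitespace for display."""
    return SUFFIX.sub("", value)


def group_by_metadata(values, mingroup=1):
    """Group values by their cleaned form.

    A group with at least `mingroup` members becomes a "bookshelf";
    members of smaller groups are returned as individual entries.
    """
    groups = defaultdict(list)
    for raw in values:
        groups[strip_suffix(raw)].append(raw)

    shelves = {key: items for key, items in groups.items()
               if len(items) >= mingroup}
    singles = [item for items in groups.values()
               if len(items) < mingroup for item in items]
    return shelves, singles


titles = ["Beowulf /", "Beowulf :", "Grendel."]
shelves, singles = group_by_metadata(titles, mingroup=1)
```

With mingroup set to 1, every cleaned title gets its own shelf, including the one-item "Grendel" entry; with mingroup set to 2, that entry would be listed on its own instead.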

The MARC plugin uses a special file to map MARC field numbers to Greenstone-style metadata. This file resides in the gsdl/etc directory, and is called marctodc.txt. It lists the correspondences between MARC field numbers and Greenstone metadata. Any MARC fields that are not listed simply do not appear as metadata, though they are still present in the Greenstone document. Each line in the file has the format

<MARC field number> --> GreenstoneMetadataName

Lines in the file that begin with "#" are comments (however, comments have been stripped out of the listing below).

The standard version of this file is loosely based on the MARC to Dublin Core mapping found at (which assumes USMARC/MARC21), and contains these lines:

720 --> Creator
100 --> Creator
110 --> Creator
111 --> Creator
520 --> Description
856 --> URL
260 --> Publisher
787 --> Relation
540 --> Rights
024 --> MarcIdentifier
786 --> MarcSource
546 --> MarcLanguage
650 --> Subject
653 --> Subject
245 --> Title
655 --> Type
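To make the file format concrete, here is a minimal sketch of how such a mapping file could be read. It is written in Python for illustration only and is not the MARC plugin's own code; the function name parse_marc_mapping and the SAMPLE text are hypothetical. Field numbers are kept as strings so that a leading zero (as in 024) survives.

```python
def parse_marc_mapping(text):
    """Parse lines of the form '<MARC field number> --> <MetadataName>'.

    Blank lines and lines beginning with '#' are skipped.
    """
    mapping = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        field, arrow, name = line.partition("-->")
        if not arrow:
            continue  # no '-->' on this line, so it is not a mapping
        mapping[field.strip()] = name.strip()
    return mapping


# A short excerpt in the same format as the listing above.
SAMPLE = """\
# comments are ignored
720 --> Creator
100 --> Creator
024 --> MarcIdentifier
245 --> Title
"""

mapping = parse_marc_mapping(SAMPLE)
```

Because several field numbers may share one metadata name, the dictionary is keyed by field number rather than by metadata name.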

Several different MARC fields are mapped on to Dublin Core Creator. Field 720 is "Uncontrolled name," 100 is "Personal name," 110 is "Corporate name," 111 is "Meeting name." Actual MARC records normally define only one of these fields, and anyway Greenstone allows multi-valued metadata.

MARC field 520 ("Summary, note") is mapped to Dublin Core Description; field 856 ("Electronic location") is mapped to URL; field 787 ("Nonspecific relationship note") to Relation; field 540 ("Reproduction note") to Rights; field 245 ("Title statement") to Title; field 655 ("Index term - genre/form") to Type. Both fields 650 ("Subject: topical term") and 653 ("Index term: uncontrolled") are mapped to Subject.

MARC field 024 ("Identifier") is not mapped to Identifier, because Greenstone uses its own Identifier metadata; instead it is mapped to a different Greenstone metadata element called MarcIdentifier. Likewise, field 786 ("Data source entry") is not mapped to Source, because Greenstone already has Source metadata, but to a new metadata element called MarcSource; and field 546 ("Language") is mapped to MarcLanguage.

Some MARC fields with Dublin Core counterparts are simply ignored, e.g. 620 (Contributor) and 500 (Coverage). MARC field 260 ("Publication, etc.") is mapped in its entirety to Publisher. In fact, subfield 260c is supposed to hold the publication date, but it is not mapped as such.

Of course, different mappings can be defined by altering the above file, which allows the MARC plugin to support other variants of the MARC format. The plugin does not recognize individual MARC subfields: it simply concatenates them together. However, enhancing it to deal appropriately with subfields would not be a difficult job: it would involve altering a couple of pages of Perl code in the MARC plugin.
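The behaviour described above (unmapped fields produce no metadata, several fields may feed one multi-valued element, and subfields are concatenated) can be sketched as follows. This is an illustrative Python sketch, not the plugin's Perl code: the record layout, the function name record_to_metadata, the sample values, and the choice of a single space as the joiner are all assumptions.

```python
from collections import defaultdict

# A small excerpt of the mapping, keyed by MARC field number.
MAPPING = {
    "100": "Creator",
    "720": "Creator",
    "245": "Title",
    "260": "Publisher",
}


def record_to_metadata(record, mapping=MAPPING, joiner=" "):
    """Convert a record into multi-valued metadata.

    `record` is a list of (field_number, [subfield values]) pairs.
    Subfields are joined without interpreting them; the space joiner
    is an assumption made for this sketch.
    """
    metadata = defaultdict(list)
    for field, subfields in record:
        name = mapping.get(field)
        if name is None:
            continue  # unmapped fields yield no metadata
        metadata[name].append(joiner.join(subfields))
    return dict(metadata)


# Made-up sample record, for illustration only.
record = [
    ("100", ["Doe, Jane,", "1950-"]),
    ("720", ["Example Society"]),
    ("245", ["Beowulf :", "an example edition /"]),
    ("260", ["Exampleville :", "Sample Press,", "1999."]),
    ("500", ["A note field that has no mapping."]),
]

metadata = record_to_metadata(record)
```

In this sketch the whole of field 260, including the date in its last subfield, ends up in Publisher, which mirrors the limitation noted above. Handling subfields separately would mean mapping (field, subfield) pairs instead of whole fields.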

How to find information in the MARC example collection

There are 4 ways to find information in this collection:

  • search for particular words that appear in the text by clicking the Search button
  • browse documents by Title by clicking the Titles button
  • browse documents by Creator by clicking the Creators button
  • browse documents by Subject by clicking the Subjects button