About this collection

The Humanitarian Development Libraries represent a large collection of practical information aimed at helping reduce poverty, increasing human potential, and providing a practical and useful education for all. This subset contains about 25 publications--documents, reports, and periodical articles--in various areas of human development, from agricultural practice to economic policies, from water and sanitation to society and culture, from education to manufacturing, from disaster mitigation to micro-enterprises.

The editors of this collection are Human Info NGO, HumanityCD Ltd, and participating organizations. Contact us at Humanitarian and Development Libraries Project, Oosterveldlaan 196, B-2610 Antwerp, Belgium, Tel 32-3-448.05.54, Fax 32-3-449.75.74, email humanity@humaninfo.org.

How the collection works

The DLS collection has exactly the same structure as the Greenstone demo collection that is supplied with the software and used extensively as an example in the documentation. It's a fairly complex collection, and if you're just starting out you might prefer to look at some other collections first (e.g. msword and pdf demonstration, or the greenstone archives, or the simple image collection).

The collection configuration file, like all collection configuration files, begins with a line ("creator") that gives the email address of the collection's creator, and another ("public") that determines whether the collection will appear on the home page of the Greenstone installation.

Collection-level metadata. The collectionmeta lines in the configuration file are also standard in all Greenstone collections. They give general information about the collection, defining its name, a brief description that appears on its home page, and two versions of the collection's icon. The brief description (in collectionextra) can be seen on the DLS collection's home page (i.e. at the top of this page). The iconcollection item gives the image proclaiming "development library subset" that appears at the upper left of this page. If it were absent, the collection's name would appear instead. This image is placed in the images subdirectory of the collection's directory (typically, on Windows configurations, in C:\Program files\gsdl\collect\dls-e\images, "dls-e" being the internal name of the collection). The iconcollectionsmall item gives a smaller version of the icon that is used on the Greenstone home page.

Plugins. The third block of lines in the configuration file gives the plugins used by the collection. The documents in the DLS collection are in HTML, so HTMLPlug must be included. The description_tags option processes tags in the text that define sections and section titles as described below. WordPlug and PDFPlug also appear in the configuration file, but are not used for the documents in the DLS collection. Extra plugins do no harm. In general the ordering of plugins is not significant, unless there are two different plugins that can process the same type of document.
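The remark about plugin ordering can be illustrated with a small sketch. This is a toy model, not Greenstone's code: it assumes that each file is offered to the plugins in the order listed and that the first plugin able to process it wins. The function name choose_plugin and the file extensions attached to each plugin are assumptions made for the illustration.

```python
from pathlib import PurePosixPath

# Toy model only: plugin names come from the text above, but the
# extension sets are illustrative assumptions, not Greenstone's rules.
PLUGINS = [
    ("HTMLPlug", {".htm", ".html"}),
    ("WordPlug", {".doc"}),
    ("PDFPlug", {".pdf"}),
]


def choose_plugin(filename, plugins=PLUGINS):
    """Return the name of the first plugin that accepts the file, or None."""
    suffix = PurePosixPath(filename).suffix.lower()
    for name, extensions in plugins:
        if suffix in extensions:
            return name
    return None


# Order only matters when two plugins accept the same kind of file.
overlapping = [("FirstPlug", {".txt"}), ("SecondPlug", {".txt"})]
print(choose_plugin("report.html"))
print(choose_plugin("notes.txt", overlapping))
print(choose_plugin("notes.txt", list(reversed(overlapping))))
```

In this model, listing WordPlug and PDFPlug costs nothing for an all-HTML collection, because no file is ever handed to them.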

The other plugins, GAPlug, ArcPlug, and RecPlug, are used by Greenstone for internal purposes and are standard in almost all collections. The use_metadata_files flag on RecPlug directs Greenstone to look for metadata.xml files that specify metadata for the documents in XML format (see below).

Searchable indexes. The block of lines starting with indexes specifies what searchable indexes will be available. In this collection there are three: you can see them when you pull down the "Search for" menu on the search page. The first index is called "chapters", the second "section titles", and the third "entire documents". The names of these three indexes are given by three collectionmeta statements.

The contents of the indexes -- that is, the specification of what it is that will be searched -- are defined by the indexes line at the beginning of this block. This specifies three indexes, two at the section level (beginning with section:) and one at the document level (beginning with document:). The difference is that a multi-word query will only match a section-level index if all query terms appear in the same section, whereas it will match a document-level index if the terms appear anywhere within the document (which typically comprises several sections). The first and third indexes are section:text and document:text, and the :text means that the full text of sections and documents respectively will be searched. The second is section:Title, which means that Title metadata will be searched -- in this case, section titles (rather than document titles). The three indexes appear in the order in which they are specified on the indexes line.
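The difference between section-level and document-level matching can be sketched as follows. This is a simplified illustration, not Greenstone's search engine: it assumes a document is a list of section texts, that a query is split on whitespace, and that all terms must be present. The function names and the sample text are invented for the example.

```python
def terms(text):
    """Split text into a set of lower-case words (naive tokenizer)."""
    return set(text.lower().split())


def matches_section_level(query, sections):
    """True if some single section contains every query term."""
    wanted = terms(query)
    return any(wanted <= terms(section) for section in sections)


def matches_document_level(query, sections):
    """True if every query term appears somewhere in the document."""
    wanted = terms(query)
    return wanted <= terms(" ".join(sections))


# Invented two-section document.
document = ["clean water supply", "rural sanitation projects"]

# The two terms sit in different sections, so only the
# document-level test succeeds.
print(matches_section_level("water sanitation", document))
print(matches_document_level("water sanitation", document))
```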

Classifiers. The block of lines labeled classify defines the browsing indexes, called "classifiers" in Greenstone. There are four of them, corresponding to four buttons on the navigation bar at the top of each page in the collection (e.g. the search page): subjects, titles a-z, organisations, and keywords. The search button comes first, then come the four classifiers, in order.

The first classifier provides access by subject. It is a Hierarchy classifier whose hierarchy is defined in the file dls.Subject.txt (the hfile argument); this file is discussed below. This classifier is based on Subject metadata, and when several books appear at a leaf of the hierarchy they are sorted by Title metadata. The second provides access by title: it is an AZList classifier based on Title metadata. The third provides access by organization: it is a Hierarchy classifier based on Organization metadata whose hierarchy is defined in dls.Organization.txt; this file is discussed below. Again, the leaves of the hierarchy are sorted by Title metadata. The fourth provides access by Keyword metadata: it also is a Hierarchy classifier (see below).

Cover images. Greenstone looks for a cover image for each document, whose name is the same as the document's but with a .jpg extension. This image is associated with the document, and may be displayed on the document page (see below). Cover images can be switched off by setting the -no_cover_image flag for each plugin.
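The naming rule for cover images can be expressed in a couple of lines. The sketch below only derives the expected image name from a document name; the function name cover_image_name and the sample file name are hypothetical, and nothing here reflects how Greenstone itself locates the file.

```python
from pathlib import PurePosixPath


def cover_image_name(document_filename):
    """Swap the document's extension for .jpg, keeping the rest of the name."""
    return str(PurePosixPath(document_filename).with_suffix(".jpg"))


# Hypothetical document name used only for illustration.
print(cover_image_name("example_book/example_book.htm"))
```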

Format statements. The next block contains five format statements. The first applies to Vlists. These are lists of items displayed vertically down the page, like the lists displayed by the titles a-z browser, those at the leaves of the subject and organisation hierarchies, and the tables of contents of the target documents themselves. However, for the search results page it is overridden by the second format statement (SearchVList). The third governs how the document text is formatted, with Title metadata ([Title]) in HTML <h3> format followed by the text of the document [Text]. The fourth ensures that cover images are shown with each document. The fifth calls for the Expand Text, Expand Contents, Detach and Highlight buttons to be shown with each document.

Most format statements contain a string specified in an augmented form of HTML. Metadata names in square brackets (e.g. [Title], [Creator]) give the value of that metadata; [Text] gives the document text. A hyperlink to the document can be made using [link] ...[/link]; an appropriate icon is produced by [icon]. Format strings can include {If}{... , ...} and {Or}{... , ...}; the first two format statements in this collection give examples. These two are fairly complex format statements; we will not explain them here. In Greenstone, changes in format strings take effect immediately unless you are using the local library server, in which case the server needs to be restarted. This makes it easy to experiment with different versions of a format statement and see what happens.
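The idea of square-bracket substitution can be sketched in a few lines. This is a toy expander, not Greenstone's format-string processor: it replaces only [Text] and metadata names it has values for, and it leaves everything else (including [link], [icon], {If} and {Or}) untouched. The function name expand_format and the sample format string are assumptions modelled on the description of the third format statement.

```python
import re

PLACEHOLDER = re.compile(r"\[([A-Za-z]+)\]")


def expand_format(format_string, metadata, text=""):
    """Replace [Text] and known [Name] placeholders; leave the rest as-is."""
    def replace(match):
        name = match.group(1)
        if name == "Text":
            return text
        # Unknown names are returned unchanged in this sketch.
        return metadata.get(name, match.group(0))

    return PLACEHOLDER.sub(replace, format_string)


# Hypothetical format string: title as a heading, then the document text.
result = expand_format(
    "<h3>[Title]</h3>\n\n[Text]",
    {"Title": "Animation skills"},
    text="(text of section goes here)",
)
print(result)
```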

Language translations. The last part of the collection configuration file gives the collection-level metadata in French and Spanish respectively. The languages are indicated by square brackets: [fr] and [es]. If there is no language specification, English is assumed by default. The configuration file shows accented characters (e.g. French é). This file is in UTF-8, and these characters are represented by multi-byte sequences (<C3><A9> in this case). Alternatively they could be represented by their HTML entity names (like &eacute;). It makes no difference: they look the same on the screen. However, if the text were searchable it would make a difference; Greenstone uses Unicode internally to ensure that searching works as expected for non-English languages.
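The two representations of é mentioned above can be checked directly. The snippet uses only the Python standard library and says nothing about how Greenstone itself decodes the file; it simply confirms that the UTF-8 bytes C3 A9 and the HTML entity both denote the same character.

```python
import html

e_acute = "\u00e9"  # the character é

# UTF-8 encodes é as the two bytes C3 A9.
utf8_bytes = e_acute.encode("utf-8")
print(utf8_bytes.hex())

# The named HTML entity decodes to the same character.
from_entity = html.unescape("&eacute;")
print(from_entity == e_acute)
```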

Description tags. The description tags recognized by HTMLPlug are inserted into the HTML source text of the documents to define where sections begin and end, and to specify section titles. They look like this:

  <!--
  <Section>
  <Description>
  <Metadata name="Title">
  Realizing human rights for poor people: Strategies
  for achieving the international development targets
  </Metadata>
  </Description>
  -->
  (text of section goes here)
  <!--
  </Section>
  -->
  
The <!-- ... --> markers are used to ensure that these tags are marked as comments in HTML and therefore do not affect document formatting. In the Description part other kinds of metadata can be specified, but this is not done for the style of collection we are describing here. Exactly the same specification (including the <!-- ... --> markers) can be used in Word documents too.
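To make the tag layout concrete, here is a minimal sketch that pulls section titles out of HTML source marked up in this way. It uses a regular expression rather than a real parser and is not how HTMLPlug is implemented; the function name section_titles is hypothetical, and the sample text is the example shown above.

```python
import re

SOURCE = """<!--
<Section>
<Description>
<Metadata name="Title">
Realizing human rights for poor people: Strategies
for achieving the international development targets
</Metadata>
</Description>
-->
(text of section goes here)
<!--
</Section>
-->
"""

# Capture whatever lies between the Title metadata tags, across line breaks.
TITLE_RE = re.compile(r'<Metadata name="Title">(.*?)</Metadata>', re.DOTALL)


def section_titles(source):
    """Return each section title with runs of whitespace collapsed."""
    return [" ".join(raw.split()) for raw in TITLE_RE.findall(source)]


print(section_titles(SOURCE))
```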

Metadata files. Metadata for all documents in the DLS collection is provided in metadata.xml files, one per document folder. The metadata.xml file for one book -- Go Between 64, June-July 1997 -- is a block of about ten lines encased in <FileSet> ... </FileSet> tags. It defines Title, Language, Subject and AZList metadata. More than one value can be specified for any metadata item. For example, a little further down the file (in the fourth FileSet block) is a book called Animation skills which has two values for Subject. Both of these are stored as metadata values for this particular document (because mode=accumulate is specified; the alternative, and the default, is mode=override).
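The effect of mode=accumulate versus mode=override can be sketched with a small parser. Everything in the sample is an assumption made for illustration: the element layout (a DirectoryMetadata root with FileName and Description children) is a guess at the file's shape, the folder name and metadata values are placeholders, and the function collect_metadata is hypothetical. Only the FileSet tags, the metadata names, and the two modes come from the description above.

```python
import xml.etree.ElementTree as ET

# Assumed layout with placeholder values; not copied from the collection.
SAMPLE = """<DirectoryMetadata>
  <FileSet>
    <FileName>example_book</FileName>
    <Description>
      <Metadata name="Title">Animation skills</Metadata>
      <Metadata mode="accumulate" name="Subject">Placeholder subject A</Metadata>
      <Metadata mode="accumulate" name="Subject">Placeholder subject B</Metadata>
      <Metadata name="Language">English</Metadata>
      <Metadata name="Language">French</Metadata>
    </Description>
  </FileSet>
</DirectoryMetadata>"""


def collect_metadata(xml_text):
    """Map each FileName to {metadata name: list of values}."""
    result = {}
    for fileset in ET.fromstring(xml_text).iter("FileSet"):
        values = result.setdefault(fileset.findtext("FileName"), {})
        for item in fileset.iter("Metadata"):
            key = item.get("name")
            value = (item.text or "").strip()
            if item.get("mode", "override") == "accumulate":
                values.setdefault(key, []).append(value)  # keep earlier values
            else:
                values[key] = [value]  # default: replace earlier values
    return result


print(collect_metadata(SAMPLE))
```

In this sketch the two Subject entries are both kept, while the second Language entry replaces the first because no mode is given.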

Title metadata is specified in the metadata file. However, in this collection it is also given in the text of each document, using description tags. Since it appears in both places, Greenstone uses the version defined in the documents in preference to that in the metadata.xml file.

Hierarchy files. The subject hierarchy file dls.Subject.txt contains a succession of lines each of which has three items. The first and last items are text strings, and they are the same. The middle item is a number that defines the position in the hierarchy. The first string is matched against the metadata that occurs in the metadata.xml file described above; the last one is the string that describes that node of the hierarchy on the web pages that Greenstone generates.

For example, the first line contains the three items "Industry, Manufacture and Services", 1, and "Industry, Manufacture and Services". The middle one, the numeral 1, indicates that this subject appears at the first position of the subject hierarchy. The first item, the string "Industry, Manufacture and Services", is what appears in the metadata.xml file as Subject metadata. The third item, which happens to be the same string, is what appears as the text in the first position of the subject hierarchy.

The organization hierarchy file dls.Organization.txt has exactly the same structure. Again, the first and last text strings on each line are the same because the metadata values in metadata.xml are exactly what should be shown on the Greenstone web pages. The number between defines the position in the hierarchy: in this case the hierarchy is flat and the position is simply an integer that determines the order of the list.
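The three-item line format lends itself to a short parsing sketch. The code below is illustrative only and is not Greenstone's reader: it assumes the items are separated by whitespace with the strings in double quotes, and it assumes that nested positions are written as dotted numbers such as 1.1. The first sample line is the one quoted above; the other two lines, and the function name parse_hierarchy, are invented.

```python
import shlex

SAMPLE = '''"Industry, Manufacture and Services" 1 "Industry, Manufacture and Services"
"Example child subject" 1.1 "Example child subject"
"Another example subject" 2 "Another example subject"
'''


def parse_hierarchy(text):
    """Return (position, metadata value, display label) tuples in hierarchy order."""
    entries = []
    for line in text.splitlines():
        if not line.strip():
            continue
        # shlex keeps each double-quoted string together as one item.
        match_value, position, label = shlex.split(line)
        key = tuple(int(part) for part in position.split("."))
        entries.append((key, match_value, label))
    return sorted(entries)


for key, match_value, label in parse_hierarchy(SAMPLE):
    indent = "  " * (len(key) - 1)  # one indent step per hierarchy level
    print(f"{indent}{'.'.join(map(str, key))} {label}")
```

For a flat hierarchy such as the organization list, every position is a single integer and the sort simply fixes the display order.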

The Keyword classifier is also a hierarchy classifier, in this case based on Keyword metadata. This is to allow for the possibility that two different documents have the same keyword.

How to find information in the Development Library Subset collection

There are 5 ways to find information in this collection:

  • search for particular words that appear in the text by clicking the Search button
  • browse documents by Subject by clicking the Subjects button
  • browse documents by Title by clicking the Titles button
  • browse documents by Organization by clicking the Organizations button
  • browse documents by How to by clicking the How to button