This collection demonstrates Greenstone's ability to extract acronyms from documents. These documents are drawn from the web site of the FAO. If you click on the 'Acronyms' button at the top of this page you can browse the acronyms in the collection.

Greenstone's acronym extraction facilities were contributed by Stuart Yeates. Further information is available from the Text Mining Research Group of the New Zealand Digital Library Project or Stuart's homepage.

Any document plugin can be asked to extract acronyms with the -extract_acronyms parameter. For example, this collection is primarily based on HTML documents, so the plugins used are

plugin         GMLPlug
plugin         HTMLPlug -w3mir rename_assoc_files -extract_acronyms
plugin         ArcPlug
plugin         RecPlug

The acronym classifier is defined just like any other metadata classifier:

classify       AZList metadata=Title
classify       AZCompactList metadata=Acronym

but requires specialised formatting instructions:

format CL2VList '<td valign=top>[link][icon][/link]</td> 
  <td><b>[Acronym]</b><br>In: [Title]<br>[link][URL][/link]</td>}'

(The full format string must appear on one line in your collect.cfg file.)
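
Since the format statement shown above is wrapped for display, it has to be joined back into a single line before it goes into collect.cfg. The following is a minimal Python sketch of that joining step. The helper name join_wrapped is hypothetical and not part of Greenstone, and the sample statement is shortened for illustration; it assumes a single space between the joined pieces is acceptable, which holds for whitespace between HTML table cells.

```python
def join_wrapped(lines):
    """Join display-wrapped lines into one line.

    Hypothetical helper, not part of Greenstone. Each piece is stripped of
    leading and trailing whitespace, empty pieces are dropped, and the rest
    are joined with a single space.
    """
    parts = [line.strip() for line in lines]
    return " ".join(part for part in parts if part)


# A shortened, illustrative format statement wrapped over two lines.
wrapped = [
    "format CL2VList '<td>[link][icon][/link]</td> ",
    "  <td>[Acronym]</td>'",
]

single_line = join_wrapped(wrapped)
print(single_line)
```

The printed line can then be pasted into collect.cfg as one statement.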

How to find information in the acrodemo collection

There are three ways to find information in this collection:

  • search for particular words that appear in the text by clicking the Search button
  • browse documents by Title by clicking the Titles button
  • browse documents by acronym occurrence by clicking the Acronyms button