Re: [greenstone-users] Index.txt and metadata from Excel files

From: Michael Dewsnip
Date: Thu, 15 Jun 2006 15:36:00 +1200
Subject: Re: [greenstone-users] Index.txt and metadata from Excel files
In-Reply-To: (6-2-1-2-0-20060614162115-01e64cb0-pop3-paradise-net-nz)

Hi,

This e-mail is an attempt to bring together all the recent discussion on
the mailing lists about index.txt files and IndexPlug, and metadata from
Excel files.

Index.txt files are used to assign metadata to documents in a collection
without using the GLI's Enrich pane or generating metadata.xml files
directly. Index.txt files are plain text and have a simple format, so it
is often easy to convert metadata in other formats into index.txt files.
A description of the index.txt file format can be found at the top of
the Greenstone file "perllib/plugins/IndexPlug.pm".

Metadata.xml files have three advantages over index.txt files: they are
editable within the GLI, they are easier to edit once the amount of
metadata gets large, and they support regular expressions in the
filenames (so the same metadata can be assigned to multiple files using
wildcards etc.). The disadvantage is that they are more difficult to
understand and generate automatically.
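To give a feel for what generating metadata.xml automatically involves, here is a minimal Python sketch (not part of Greenstone). It assumes the Greenstone 2 metadata.xml layout, where a DirectoryMetadata element holds one FileSet per file, each with a FileName and a Description containing Metadata elements with a name attribute. It omits the XML declaration and DOCTYPE line, and the "dc.Title"/"dc.Creator" names and values are hypothetical. Because FileName is treated as a regular expression, each literal filename is escaped so that a "." only matches a dot.

```python
import re
import xml.etree.ElementTree as ET


def build_metadata_xml(records):
    """Build a metadata.xml-style document from {filename: {name: value}}.

    Assumed layout (Greenstone 2 style, check against a real metadata.xml):
    DirectoryMetadata > FileSet > FileName + Description > Metadata[name].
    """
    root = ET.Element("DirectoryMetadata")
    for filename, fields in records.items():
        fileset = ET.SubElement(root, "FileSet")
        # FileName is interpreted as a regular expression, so escape
        # characters such as "." that would otherwise act as wildcards.
        ET.SubElement(fileset, "FileName").text = re.escape(filename)
        description = ET.SubElement(fileset, "Description")
        for name, value in fields.items():
            meta = ET.SubElement(description, "Metadata", name=name)
            meta.text = value
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical example data, not taken from any real collection.
    example = {"report.pdf": {"dc.Title": "Annual Report", "dc.Creator": "J. Smith"}}
    print(build_metadata_xml(example))
```

Writing a wildcard pattern (for example ".*\.pdf") directly into FileName instead of an escaped name is what lets one FileSet cover many files.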

By popular request, I've just written a plugin similar to IndexPlug that
reads metadata from CSV (comma-separated value) files. If you have your
metadata in a Microsoft Excel spreadsheet you can save the spreadsheet
as a CSV file (File -> Save As...) and use CSVPlug to import the
metadata. The spreadsheet needs to be laid out in a particular way: the
first row must contain the metadata names (one of which must be
"Filename"), and each remaining row contains the metadata for one file. I've put a small
example Excel file at
http://www.cs.waikato.ac.nz/~mdewsnip/greenstone/temp-2.70w/demo.xls for
reference.
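If you want to check a saved CSV file against this layout before adding it to a collection, the following Python sketch reads it with the standard csv module and returns the metadata per file. It is not CSVPlug (which is a Perl plugin), and two choices are assumptions of this sketch only, not documented CSVPlug behaviour: rows with an empty Filename cell are skipped, and empty cells are dropped. The "dc.Title"/"dc.Creator" column names in the test data are hypothetical.

```python
import csv
import io


def read_csv_metadata(stream):
    """Return {filename: {metadata name: value}} from a CSVPlug-style CSV.

    Layout: the first row holds the metadata names, one of which must be
    "Filename"; every later row holds the metadata for one file.
    """
    reader = csv.DictReader(stream)
    if reader.fieldnames is None or "Filename" not in reader.fieldnames:
        raise ValueError('first row must include a "Filename" column')
    records = {}
    for row in reader:
        filename = (row.pop("Filename") or "").strip()
        if not filename:
            continue  # assumption: ignore rows that name no file
        # Drop empty cells and any overflow cells beyond the header
        # (DictReader stores those under the key None).
        records[filename] = {
            k: v.strip() for k, v in row.items()
            if k is not None and v and v.strip()
        }
    return records


if __name__ == "__main__":
    sample = io.StringIO(
        "Filename,dc.Title,dc.Creator\n"
        "report.pdf,Annual Report,J. Smith\n"
        "notes.txt,Meeting Notes,\n"
    )
    print(read_csv_metadata(sample))
```

For a real file, open it with open(path, newline="", encoding="utf-8-sig") and pass the file object in; the encoding is an assumption and depends on how the spreadsheet was saved.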

CSVPlug is for use with Greenstone v2.70w only, and is available from
http://www.cs.waikato.ac.nz/~mdewsnip/greenstone/temp-2.70w/CSVPlug.pm
(it goes in your Greenstone "perllib/plugins" directory). You also need
an updated version of RecPlug.pm from
http://www.cs.waikato.ac.nz/~mdewsnip/greenstone/temp-2.70w/RecPlug.pm
(goes in the same directory). After this, you need to delete the
plugins.dat file in "C:\Documents and Settings\<Username>\Application
Data\Greenstone\GLI" (Windows) or "~/.gli/" (Linux) before running the
GLI. Add the .csv file to your collection in the Gather pane, add
CSVPlug in the Design -> Document Plugins view, then build the collection.
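The plugins.dat step can be scripted. This Python sketch builds the paths given above from the user's home directory: on Windows XP the home directory is "C:\Documents and Settings\<Username>", so appending "Application Data\Greenstone\GLI" reproduces the stated location. Treating every non-Windows system like Linux ("~/.gli/") is an assumption, and the helper names are hypothetical.

```python
import sys
from pathlib import Path


def gli_plugins_dat_path(home=None, platform=None):
    """Return the assumed location of the GLI's cached plugins.dat file."""
    home = Path(home) if home is not None else Path.home()
    platform = platform or sys.platform
    if platform.startswith("win"):
        # e.g. C:\Documents and Settings\<Username>\Application Data\Greenstone\GLI
        return home / "Application Data" / "Greenstone" / "GLI" / "plugins.dat"
    # Linux location from the message; assumed for other Unix-like systems.
    return home / ".gli" / "plugins.dat"


def remove_cached_plugins_dat(path):
    """Delete plugins.dat if present; return True if a file was removed."""
    try:
        path.unlink()
    except FileNotFoundError:
        return False
    return True
```

Run remove_cached_plugins_dat(gli_plugins_dat_path()) while the GLI is closed, then start the GLI.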

CSVPlug was written in a hurry this afternoon and hasn't had much
testing, so let me know if you have any problems with it.

Regards,

Michael