Date: Thu, 24 May 2007 05:31:39 -0700
Subject: RE: [greenstone-users] Importing symbolic links (replacing duplicates)
Thank you. Will study your #2 and consider.
And I do have an associate who has written AWK programs for detection of duplicates and is working on a tentative solution involving a replacement import directory. Have passed your note on to him (in France!).
H.M. Gladney, Ph.D. http://home.pacbell.net/hgladney
From: Richard Managh [mailto:email@example.com]
Sent: Wednesday, May 23, 2007 3:21 PM
To: H.M. Gladney; greenstone user list
Subject: Re: [greenstone-users] Importing symbolic links (replacing duplicates)
If I understand what you want to do correctly, I imagine you want a document to be a unique filename, and the directories it appears in to be part of the metadata of this file.
I'd say you probably need to know some Perl, or need a Perl programmer, for this problem.
So let's say you had a file a.html, and the directories it appears in are import/a, import/b, import/c/d/e/f, import/g.
You might have as part of your built collection a document display that looks like this:
Solution 1 (see below for solution 2, which I recommend over this one):
I would suggest you:
o try importing your files using UnknownPlug,
o alter the identifier (OID) that Greenstone uses to identify documents in a collection to be your filenames, i.e. a.html in the example,
o alter UnknownPlug to add the directory that each file appears in as metadata to the file in the collection.
If you look at your currently imported data in the archives directory of your collection, you will notice doc.xml files for each of your imported files.
Inside these doc.xml files you will find something like
Any metadata that begins with "gsdl" is used internally by Greenstone and isn't available for use in the built collection, unlike dc.Title, for example.
If this was available, you might be able to use it to accomplish the above, but as it isn't, you need to add your own SourceFileDirectory metadata or something similar.
When a plugin, e.g. UnknownPlug, deals with a file to be imported, it has the file's path available; you could add that path as metadata to each unique filename.
Actually, after discussion with colleagues the above solution might be tricky, but I'll leave it there as it might be useful anyway.
Here's another solution (solution 2):
Write a Perl program which builds a new import directory from your existing import data.
This import directory will contain all of your unique files, and a metadata.xml file.
The metadata.xml file will contain an entry for each of the unique files, listing that file's original directory paths in a metadata item called something like SourceFilePath.
In this way, you will import all your unique files and all their original paths as metadata.
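As a rough illustration of the first half of that program, here is a minimal sketch in Python rather than Perl. It assumes that files sharing a filename really are duplicates (comparing checksums would be safer), and the function names and the idea of a separate target directory are hypothetical, not part of Greenstone.

```python
import os
import shutil
from collections import defaultdict


def collect_source_dirs(import_root):
    """Map each filename to the sorted list of directories it appears in.

    Directories are given relative to import_root, with "/" separators.
    A file sitting directly in import_root is recorded under ".".
    """
    dirs_by_name = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(import_root):
        rel_dir = os.path.relpath(dirpath, import_root).replace(os.sep, "/")
        for name in filenames:
            dirs_by_name[name].append(rel_dir)
    return {name: sorted(dirs) for name, dirs in dirs_by_name.items()}


def build_unique_import(import_root, new_import_root):
    """Copy one instance of each filename into new_import_root.

    Assumption: same filename means same document. Returns the
    filename -> directories mapping so it can be written out as metadata.
    new_import_root should live outside import_root.
    """
    dirs_by_name = collect_source_dirs(import_root)
    os.makedirs(new_import_root, exist_ok=True)
    for name, dirs in dirs_by_name.items():
        first_copy = os.path.join(import_root, *dirs[0].split("/"), name)
        shutil.copy2(first_copy, os.path.join(new_import_root, name))
    return dirs_by_name
```

For the a.html example, the returned mapping would hold the four directories a, b, c/d/e/f and g under the single key a.html.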
So the entry for a.html will look something like this:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE GreenstoneDirectoryMetadata SYSTEM "http://greenstone.org/dtd/GreenstoneDirectoryMetadata/1.0/GreenstoneDirectoryMetadata.dtd">
<DirectoryMetadata>
<FileSet>
<FileName>a.html</FileName>
<Description>
<Metadata mode="accumulate" name="SourceFilePath">a</Metadata>
<Metadata mode="accumulate" name="SourceFilePath">b</Metadata>
<Metadata mode="accumulate" name="SourceFilePath">c/d/e/f</Metadata>
<Metadata mode="accumulate" name="SourceFilePath">g</Metadata>
</Description>
</FileSet>
</DirectoryMetadata>
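Generating such a file from a filename-to-directories mapping could look roughly like the following Python sketch. The function name is hypothetical, and the DirectoryMetadata / FileSet / FileName / Description wrapper elements are an assumption about Greenstone's metadata.xml layout; compare the output against a metadata.xml produced by your own installation before relying on it.

```python
from xml.sax.saxutils import escape

DOCTYPE = (
    '<!DOCTYPE GreenstoneDirectoryMetadata SYSTEM '
    '"http://greenstone.org/dtd/GreenstoneDirectoryMetadata/1.0/'
    'GreenstoneDirectoryMetadata.dtd">'
)


def metadata_xml(dirs_by_name, item_name="SourceFilePath"):
    """Render a metadata.xml string with one accumulating entry per directory.

    dirs_by_name maps a filename to the list of directories it came from.
    The wrapper element names are assumed, not taken from the email.
    """
    lines = ['<?xml version="1.0" encoding="UTF-8"?>', DOCTYPE, "<DirectoryMetadata>"]
    for name in sorted(dirs_by_name):
        lines.append("  <FileSet>")
        # If your Greenstone version treats FileName as a regular
        # expression, characters such as "." may need escaping here.
        lines.append("    <FileName>%s</FileName>" % escape(name))
        lines.append("    <Description>")
        for source_dir in dirs_by_name[name]:
            lines.append(
                '      <Metadata mode="accumulate" name="%s">%s</Metadata>'
                % (escape(item_name, {'"': "&quot;"}), escape(source_dir))
            )
        lines.append("    </Description>")
        lines.append("  </FileSet>")
    lines.append("</DirectoryMetadata>")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(metadata_xml({"a.html": ["a", "b", "c/d/e/f", "g"]}))
```

The string would then be written as metadata.xml into the new import directory, next to the unique files.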
-- DL Consulting Greenstone Digital Library and Digitisation Specialists firstname.lastname@example.org www.dlconsulting.com
H.M. Gladney wrote: