Re: metadata only collections and formatting strings

From: Stephen.DeGabrielle@ntu.edu.au
Date: Thu, 19 Dec 2002 16:47:53 +0930
Subject: Re: metadata only collections and formatting strings
Thank you (_both_) for your help. I think I am making progress in skipping
the import stage. (I am looking at SplitPlug and its 'children', ReferPlug
and BibTexPlug. I am still learning Perl, but it was easier for me to write
the converter in AWK.)

This is what I have got so far:
I have created a single doc.xml file (with a subset of 5 records).

It is only getting the last record; and complains
"WARNING: AZList::classify called multiple times for NULL" at the end of
the build, in addition to

I think I am getting there. How do you induce the file to split?

See
http://aradadesktop/gsdl?site=localhost&a=p&p=about&c=intanmas&ct=0&l=en&w=iso-8859-1


--
Stephen De Gabrielle
Digitisation Officer
AraDA Project
NTU Library
+61 8 8946 7009
http://www.ntu.edu.au/library

hi,

what about if you bypassed the import stage altogether - instead of
translating your metadata into metadata.xml files, translate it into
archive files - the doc.xml files. You can group several 'documents' into
one file - we do this with bibliographic collections. Then you need to
create the archives.inf file as well.
Currently all documents need some text in the Content - you can just put a
dummy string in there

eg each entry would look like the following

<Section>
<Description>
<Metadata name="xxx">yyy</Metadata>
....
</Description>
<Content>
dummy text
</Content>
</Section>
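
A minimal sketch of generating such an entry from a record, in Python rather than AWK or Perl. The function name make_section and the sample record are hypothetical; only the element layout comes from the example above.

```python
from xml.sax.saxutils import escape, quoteattr

def make_section(metadata, dummy_text="dummy text"):
    """Build one <Section> entry from a list of (name, value) pairs."""
    lines = ["<Section>", "<Description>"]
    for name, value in metadata:
        # Escape &, < and > so the metadata value stays well-formed XML.
        lines.append("<Metadata name=%s>%s</Metadata>"
                     % (quoteattr(name), escape(value)))
    lines += ["</Description>", "<Content>", escape(dummy_text),
              "</Content>", "</Section>"]
    return "\n".join(lines)

# Hypothetical record used only for illustration.
record = [("Title", "Studieschrift Moluks Maleis"),
          ("Creator", "Tahitu, Bert")]
section = make_section(record)
print(section)
```

The sections still have to be wrapped in whatever outer element the archive format expects, and archives.inf has to be written separately; neither is covered by this sketch.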

cheers,
Katherine


Stefan Boddie wrote:

> Hi,
>
> >
> > Does anyone know if it is possible to use Greenstone for metadata only?
> > I have no source documents - I will translate my metadata into the
> > metadata.xml format for RecPlug to process.
> >
> > If I do metadata.xml without a filename it fails to add any records - I
> > suppose I shouldn't be surprised.
> >
> > I am trying to work out the correct solution, I can solve this with a
> > dummy document with non-word data - but I feel that is a waste of
> > space and the incorrect solution. (I have tried searching the list and
> > the documentation/source but to no avail)
> >
>
> There's no easy way to do this I don't think. Collections containing
> nothing but metadata.xml files are something that should be possible
> though (I'll add it to the list :-)
>
> You might be able to do something tricky with the SplitPlug so that you
> only need one dummy file instead of one for each document. It'd probably
> involve a bit of perl programming though.
>
> >

--
C:\Program Files\gsdl\collect\intanmas\backup>perl -S buildcol.pl intanmas

*** creating the compressed text

collecting text statistics
ArcPlug: processing C:\Program Files\gsdl\collect\intanmas\archives\archives.inf

GAPLug: processing HASH6e75.dir\doc.xml
Stats (Compressing text from section:text)
Total bytes in collection: 32
Total bytes in section:text: 32
***************
WARNING: There is very little or no text to compress
Was this your intention?
***************

creating the compression dictionary

compressing the text
ArcPlug: processing C:\Program Files\gsdl\collect\intanmas\archives\archives.inf

GAPLug: processing HASH6e75.dir\doc.xml
Stats (Compressing text from section:text)
Total bytes in collection: 32
Total bytes in section:text: 32
***************
WARNING: There is very little or no text to compress
Was this your intention?
***************

*** building index document:Title,Creator in subdirectory dtc

creating index dictionary
ArcPlug: processing C:\Program Files\gsdl\collect\intanmas\archives\archives.inf

GAPLug: processing HASH6e75.dir\doc.xml
Stats (Creating index document:Title,Creator)
Total bytes in collection: 32
Total bytes in document:Title,Creator: 297

inverting the text
ArcPlug: processing C:\Program Files\gsdl\collect\intanmas\archives\archives.inf

GAPLug: processing HASH6e75.dir\doc.xml
Stats (Creating index document:Title,Creator)
Total bytes in collection: 32
Total bytes in document:Title,Creator: 297
ivf.pass2 : M

create the weights file
.
L = 2.593519
U = 5.799285
B = 1.003148

creating 'on-disk' stemmed dictionary

creating stem indexes

*** building index document:Title in subdirectory dtt

creating index dictionary
ArcPlug: processing C:\Program Files\gsdl\collect\intanmas\archives\archives.inf

GAPLug: processing HASH6e75.dir\doc.xml
Stats (Creating index document:Title)
Total bytes in collection: 32
Total bytes in document:Title: 231

inverting the text
ArcPlug: processing C:\Program Files\gsdl\collect\intanmas\archives\archives.inf

GAPLug: processing HASH6e75.dir\doc.xml
Stats (Creating index document:Title)
Total bytes in collection: 32
Total bytes in document:Title: 231
ivf.pass2 : M

create the weights file
.
L = 1.697857
U = 5.046188
B = 1.004264

creating 'on-disk' stemmed dictionary

creating stem indexes

*** building index document:Creator in subdirectory dcr

creating index dictionary
ArcPlug: processing C:\Program Files\gsdl\collect\intanmas\archives\archives.inf

GAPLug: processing HASH6e75.dir\doc.xml
Stats (Creating index document:Creator)
Total bytes in collection: 32
Total bytes in document:Creator: 66

inverting the text
ArcPlug: processing C:\Program Files\gsdl\collect\intanmas\archives\archives.inf

GAPLug: processing HASH6e75.dir\doc.xml
Stats (Creating index document:Creator)
Total bytes in collection: 32
Total bytes in document:Creator: 66
ivf.pass2 : M

create the weights file
.
L = 1.960516
U = 3.099849
B = 1.001791

creating 'on-disk' stemmed dictionary

creating stem indexes

*** creating the info database and processing associated files
ArcPlug: processing C:\Program Files\gsdl\collect\intanmas\archives\archives.inf

GAPLug: processing HASH6e75.dir\doc.xml
WARNING: AZList::classify called multiple times for NULL
WARNING: AZList::classify called multiple times for NULL
WARNING: AZList::classify called multiple times for NULL
WARNING: AZList::classify called multiple times for NULL
WARNING: AZList::classify called multiple times for NULL
WARNING: AZList::classify called multiple times for NULL

*** creating auxiliary files

C:\Program Files\gsdl\collect\intanmas\backup>
---
creator stephen.degabrielle@ntu.edu.au
maintainer stephen.degabrielle@ntu.edu.au
public true

indexes document:Title,Creator document:Title document:Creator
defaultindex document:Title,Creator

plugin ZIPPlug
plugin GAPlug
plugin TEXTPlug
plugin HTMLPlug
plugin EMAILPlug
plugin PDFPlug
plugin RTFPlug
plugin WordPlug
plugin PSPlug
plugin ArcPlug
plugin RecPlug

# RecordNumber
# Title
# Creator
# Publisher
# PublishedDate
# SourceCreator
# SourceTitle
# SourcePubDetails
# SourcePublishedDate
# SourcePublishedPlace
# Location
# Keywords
# Language
# TypeDesc

classify AZList -metadata Title
classify AZList -metadata Creator

format SearchVList "<td>[link][Title][/link]</td> <td>by [Creator]</td>"

format CL1VList "<td>[link][Title][/link]</td> <td>by [Creator]</td>"
format CL2VList "<td>[link][Title][/link]</td> <td>by [Creator]</td>"

format DocumentText '<div align="left"><h2>Title: [Title]</h2><p>[Creator]<br>[Publisher]<br>Record number: [RecordNumber]</p></div>'

collectionmeta collectionname "Intanmas"
collectionmeta iconcollection ""
collectionmeta collectionextra "Demonstration for Intanmas."

collectionmeta .document:Title,Creator "all"
collectionmeta .document:Title "titles"
collectionmeta .document:Creator "Author"
--
--
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE Archive SYSTEM
"http://greenstone.org/dtd/Archive/1.0/Archive.dtd">
<Archive>
<Section>
<Description>
<Metadata name="RecordNumber">1</Metadata>
<Metadata name="Title">Moluks Maleis voor Nederlanders: kursussen
Moluks Maleis voor onderwysgevenden en werkers in de zorgsektor
</Metadata>
<Metadata name="Creator" mode="accumulate">Xaf Lasomer and Rouke
Broersma</Metadata>
<Metadata name="Publisher">Utrecht: Pusat Edukasi
Molukkers</Metadata>
<Metadata name="PublishedDate">1994</Metadata>
<Metadata name="Keywords" mode="accumulate">Maluku ; Moluccans ;
Netherlands ; Ethnic groups ; Language ; Education ; Moluccan Malay
</Metadata>
<Metadata name="Language">English</Metadata>
<Metadata name="TypeDesc" mode="accumulate">Book</Metadata>
</Description>
<Content>--- ---
</Content>
</Section>
<Section>
<Description>
<Metadata name="RecordNumber">2</Metadata>
<Metadata name="Title">Studieschrift Moluks Maleis </Metadata>
<Metadata name="Creator" mode="accumulate">Tahitu, Bert</Metadata>
<Metadata name="Publisher">Utrecht: Landelyk Steunpunt Edukatie
Molukkers</Metadata>
<Metadata name="PublishedDate">1993</Metadata>
<Metadata name="Keywords" mode="accumulate">Maluku ; Moluccans ;
Netherlands ; Ethnic groups ; Language ; Moluccan Malay </Metadata>
<Metadata name="Language">English</Metadata>
<Metadata name="TypeDesc" mode="accumulate">Book</Metadata>
</Description>
<Content>--- ---
</Content>
</Section>
<Section>
<Description>
<Metadata name="RecordNumber">3</Metadata>
<Metadata name="Title">Stories in animism and Christian pneumatology
</Metadata>
<Metadata name="Creator" mode="accumulate">Haire, James</Metadata>
<Metadata name="SourceTitle">Asia Journal of Theology</Metadata>
<Metadata name="SourcePubDetails">5(2) p.397-409</Metadata>
<Metadata name="SourcePublishedPlace">Oct 1991</Metadata>
<Metadata name="Keywords" mode="accumulate">Missions ; Religion ;
Theology ; Christianity ; Mythology ; Maluku ; Animism ; Gikiri Moi ;
Culture </Metadata>
<Metadata name="Language">English</Metadata>
<Metadata name="TypeDesc" mode="accumulate">Journal
article</Metadata>
</Description>
<Content>--- ---
</Content>
</Section>
<Section>
<Description>
<Metadata name="RecordNumber">4</Metadata>
<Metadata name="Title">West Papua: Forgotten war, unwanted people
</Metadata>
<Metadata name="Creator" mode="accumulate">Sands, Susan</Metadata>
<Metadata name="SourceTitle">Cultural Survival Quarterly</Metadata>
<Metadata name="SourcePubDetails">15(2) p.40-44</Metadata>
<Metadata name="SourcePublishedPlace">1991</Metadata>
<Metadata name="Keywords" mode="accumulate">Irian Jaya ; Social
conditions ; Refugees ; Indonesia ; Politics and government </Metadata>
<Metadata name="Language">English</Metadata>
<Metadata name="TypeDesc" mode="accumulate">Journal
article</Metadata>
</Description>
<Content>--- ---
</Content>
</Section>
</Archive>
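
One possible way to experiment with the "only the last record" problem is to split the combined file so that each record gets its own Archive document. The sketch below does only the XML splitting, in Python. The function name split_archive is hypothetical, and whether Greenstone then indexes each piece as a separate document (and what archives.inf must contain for that) is an assumption that has not been verified here.

```python
import xml.etree.ElementTree as ET

def split_archive(xml_text):
    """Return one standalone <Archive> string per top-level <Section>."""
    root = ET.fromstring(xml_text)
    parts = []
    for section in root.findall("Section"):
        # Wrap each top-level Section in its own Archive element.
        archive = ET.Element("Archive")
        archive.append(section)
        parts.append(ET.tostring(archive, encoding="unicode"))
    return parts

# Cut-down sample in the same shape as the doc.xml above.
sample = """<Archive>
<Section><Description><Metadata name="RecordNumber">1</Metadata></Description><Content>--- ---</Content></Section>
<Section><Description><Metadata name="RecordNumber">2</Metadata></Description><Content>--- ---</Content></Section>
</Archive>"""

parts = split_archive(sample)
print(len(parts))
```

Each returned string would still need the XML declaration and DOCTYPE line added back before being written out as its own doc.xml.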