Re: SEMI-STRUCTURED (FIELD) INDEXING

From: Sukhdev Singh
Date: Fri, 10 Jan 2003 15:29:19 +0500
Subject: Re: SEMI-STRUCTURED (FIELD) INDEXING
In-Reply-To: (3E1E8B51-4E37CA17-up-ac-za)

It seems you are trying to publish a bibliographic database in Greenstone.

You will need to write a plugin that understands the structure of each
"record" in your file. You may base your plugin on ReferPlug.pm or
BibTexPlug.pm, which are distributed with Greenstone.
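As a rough illustration of the record-splitting logic such a plugin would need, here is a minimal sketch. It is in Python only for readability; the real Greenstone plugins such as ReferPlug.pm are Perl modules, and this is not Greenstone code. Based on the format described in the quoted message below, it assumes that each field starts at the beginning of a line with an uppercase tag (e.g. AU, TI) and one space, that a line starting with a blank continues the previous field, that a line starting with "; " begins a new paragraph in the previous field, and that "$" stands alone on its own line. The function name parse_records and the sample data are hypothetical.

```python
import re
from typing import Dict, List, Optional

# Assumption: a field tag is an uppercase token (letter first, e.g. "AU",
# "TI") at the very start of a line, followed by a single space.
TAG_LINE = re.compile(r"^([A-Z][A-Z0-9]+) (.*)$")

Record = Dict[str, List[str]]


def parse_records(text: str) -> List[Record]:
    """Split a tagged DB/TextWorks-style export into records (hypothetical helper).

    Each record maps a tag to a list of values, since a tag such as AU
    may occur more than once in one record.
    """
    records: List[Record] = []
    current: Record = {}
    last_tag: Optional[str] = None

    for line in text.splitlines():
        line = line.rstrip()
        stripped = line.lstrip()

        if stripped == "$":
            # Assumption: "$" sits alone on its own line and ends a record.
            if current:
                records.append(current)
            current, last_tag = {}, None
        elif last_tag and stripped.startswith("; "):
            # ";" plus a blank: new paragraph within the previous field.
            current[last_tag][-1] += "\n" + stripped[2:]
        elif last_tag and line.startswith(" "):
            # Leading blank: wraparound of the previous field's text.
            current[last_tag][-1] += " " + stripped
        else:
            match = TAG_LINE.match(line)
            if match:
                last_tag = match.group(1)
                current.setdefault(last_tag, []).append(match.group(2).strip())
            # Anything else (e.g. a bare record number) is ignored here.

    if current:  # tolerate a missing final "$"
        records.append(current)
    return records


sample = (
    "AU Fourie, D.\n"
    "TI A long title that\n"
    " wraps onto a second line\n"
    "AB First paragraph.\n"
    "; Second paragraph.\n"
    "$\n"
    "TI Second record\n"
    "$\n"
)
records = parse_records(sample)
print(len(records))       # prints 2
print(records[0]["TI"])   # prints ['A long title that wraps onto a second line']
```

Run over the whole export (e.g. a placeholder file export.txt), len(parse_records(...)) would give the record count. Each resulting dictionary could then become one document whose fields are stored as metadata, which is roughly the job a Greenstone plugin does for a multi-record file.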

A few days ago I experimented with publishing a bibliographic database.
The experiment was to publish an ISIS database (WINISIS, CDS/ISIS) with
the help of Greenstone.

I have put up a brief write-up about it at the following link:
http://www.geocities.com/esukhdev/PublishISIS.htm

The resulting collection is available at:

http://164.100.9.16/gsdl/cgi/library.exe


Regards,


Sukhdev Singh

Principal Systems Analyst
Indian Medlars Centre
(Bibliographic Informatics Division)
National Informatics Centre,
A-Block, CGO Complex, Lodhi Road,
New Delhi - 110 003,
Telephone: 11-4362359
(http://indmed.nic.in)


At 10:58 AM 1/10/03 +0200, you wrote:

>We exported some 20.000 documents (records) from DB/TextWorks (Inmagic,
>Inc - USA) in tagged format, i.e. with delimiters to indicate each field,
>paragraphs and wraparounds. (This is in ASCII and all in one big file.)
>
>For instance, the Author field has a tag AU followed by a blank space and
>then the names of the authors; the Title field has a tag TI, also followed
>by a space and then the document's title. A wraparound is indicated by a
>blank space at the start of the next line followed by the rest of the
>text; a new paragraph or forced new line is indicated by a ";" (semicolon)
>followed by a blank and then the text, etc. Each document is ended by a
>"$" (dollar) sign and the next document begins immediately. All 20.000
>documents are thus in one big file, directly following one another. Not
>all of the fields are present in every document: some documents may have
>descriptors, others not, etc. The same field varies in length from
>document to document, and the different fields also differ in length.
>
> We wish to import this into Greenstone with the documents separated, so
>that Greenstone tells me I have 20.000 docs (records, titles). I thought
>of doing this using the Organizer, but was told that the Organizer would
>not work for this.
>
> (For this exercise, I use Greenstone on my PC, a HP KAYAK XA.)
>
> 1. Is this feasible at all?
>
> 2. How do I tell Greenstone that each document ends with the "$" sign
>and that the next line in the file is the number (or whatever the case
>may be) of the next document?
>
> 3. If this is feasible, is there a way to later export the new
>collection in ASCII format (in the same way as with DB/TextWorks)?
>
> I know the Collector, but have no experience with the Organizer.
>
> I would appreciate any help or tips. I've read the FAQ, but I can't find
>my specific problem in there. I also have the User's and Developer's
>Guides, but haven't found anything in them to help me; maybe I just
>don't see it...
>
> Thanx
> David Fourie
>
>