Re: Re: start simple, use documents you already have, then figure out a way to get what you need Re: [greenstone-users] How to write a source document?

From Jong Hann
Date Wed, 29 Oct 2003 04:42:47 +0800
Subject Re: Re: start simple, use documents you already have, then figure out a way to get what you need Re: [greenstone-users] How to write a source document?
Hi Stefan,

Thanks for taking the time to help out. You have shed some
interesting light on my original problems. I'm still working on them,
with some background info on digital libraries in general (from
Witten).

> You must have some content in mind for your library. If not, find
> some old word, PDF, or HTML files to use to get you started on
> building a demo library, just to get a feel for how things work.

I do have some content in mind. Like I said, I wrote the source
documents in text (and lately in HTML, and then in Texinfo), but the
Collector interprets them verbatim (no formatting, no processing, just
regurgitation).

> > Specifically, I need to be able to file documents under a
> > hierarchy (categories). Much like how the Greenstone demo does
> > it. Question is, how do I tell the Collector which category I want
> > which document to fall under? Also, I need documents to belong to
> > more than 1 category. A many (documents) to many (categories)
> > relationship. In short, I'm trying to find documentation on "how
> > to write source documents _in_a_format_ that would allow full
> > control over how Greenstone displays and organizes them."

> You don't need to do anything special to your source documents to
> get them to display in a hierarchical browse structure. The only
> time you may need to edit the source is if you wish to display your
> documents with their own table of contents. I'd suggest you not do
> that until you get everything else sorted.

Well, Greenstone's demo config does have a hierarchy (books are filed
into different "Organization" categories), so there's no question
about Greenstone's capability to solve my particular problems.

I was guessing that Greenstone's designers forgot to include source
documents in that demo (or example). But now I learn (from you) that I
don't have to edit the source to get that hierarchical structure.

I even tried to import source documents into Greenstone's demo config,
to see if I can get the Collector to build with the "Organization"
hierarchy (with my source, instead). The Collector simply says
something like "document(s) not found". I did a control experiment
with the default config. Documents _are_ found.

Now, the million dollar question (or confusion, rather). Yes, I meant
that literally. :-)

Fact is, you have stated that I don't need to do anything special to
my source documents to get Greenstone to display them in a
hierarchical browse structure. This fact is... rather puzzling (or
scary, if seen from another perspective, explained below).

Suppose I want a sub-set `A' of documents to fall under category
`B_Cat', and a sub-set `B' of documents under `A_Cat'. (The A-to-B
counter-intuitive flip is deliberate to illustrate a point). How would
the Collector know to sort my documents thus, without my instruction?
:-)

Note, of course, there are many permutations of what I want (or
rather, what is possible) when it comes to filing the documents under
a variable set of categories.
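
To make the point concrete, here is a minimal sketch (in Python, purely
for illustration) of the kind of explicit mapping I imagine I would
have to supply somewhere. The file names and the ASSIGNMENTS list are
made up, and nothing here is Greenstone's actual format or API. It only
shows that a many-to-many filing is a set of (document, category) pairs
that somebody has to write down.

```python
from collections import defaultdict

# Hypothetical (document, category) pairs; all names are invented
# for illustration and are not a Greenstone format.
ASSIGNMENTS = [
    ("doc1.html", "B_Cat"),
    ("doc2.html", "B_Cat"),
    ("doc2.html", "A_Cat"),  # doc2 is filed under two categories
    ("doc3.html", "A_Cat"),
]


def docs_by_category(pairs):
    """Group documents under each category they were assigned to."""
    index = defaultdict(set)
    for doc, category in pairs:
        index[category].add(doc)
    return {category: sorted(docs) for category, docs in index.items()}


def categories_by_doc(pairs):
    """List every category a given document belongs to."""
    index = defaultdict(set)
    for doc, category in pairs:
        index[doc].add(category)
    return {doc: sorted(cats) for doc, cats in index.items()}


if __name__ == "__main__":
    for category, docs in sorted(docs_by_category(ASSIGNMENTS).items()):
        print(category, "->", ", ".join(docs))
```

Without some input equivalent to those pairs, I don't see how any tool
could arrive at this particular filing.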

Without going into dry mathematical proofs, I'm sure popular intuitive
logic will somehow detect ... "something scary" in your fact above.

Now, why is it scary? About the only way Greenstone can file my
documents into a hierarchy in precisely the way I want is... if
Greenstone can read my mind! :-) I thought Greenstone was advanced, but
I didn't figure it would be that far ahead of my time.

Sorry for going into the above elaboration. I thought my question was
simple enough, but somehow I get the feeling I'm not reading your
answers correctly.

Granted, the answers to my simple question may not be trivial. I need
help, and you're halfway through helping me already. I'm hoping you
could help me just a little more. :-P

Please?

> I'd also suggest that you don't use the collector to build your
> collection. There is a new tool called the Greenstone Librarian
> Interface (GLI) that is much better for designing and building a
> collection. If you have gsdl-2.40a or newer then the GLI is
> included. For more details on how to use it go to
> http://www.nzdl.org/cgi-bin/library?a=p&p=about&c=gsarch and search
> for "GLI".

Now that's a new lead. But I can't find GLI on the forum messages
(from the URL above). Also, I can't run GLI. It gives me the following
error:

Exception in thread "main" java.lang.NoSuchMethodError
        at org.greenstone.gatherer.gui.Splash.<init>(Splash.java:58)
        at org.greenstone.gatherer.Gatherer.main(Gatherer.java:575)

> Unfortunately greenstone is not quite to the stage where you can
> download it and have it doing exactly what you want within minutes,
> then again, neither are a great many commercial software
> packages. There's a learning curve involved as with most software
> but most people find they can build a simple greenstone collection
> within an hour or two. I'll also grant you that the documentation
> isn't perfect, that's something we're working on. It _is_ free
> software however and we don't force anyone to use it.

Greenstone is great, and I know it (as is the case with much GPLed
software). It's like a read-only database-to-go (esp. on CD-ROM),
without the SQL work and a full-blown SQL engine. I was going to pay
$20,000 for something like this, no kidding.

But as it turns out now, it seems I still have to pay that amount (or
more) to get some people to sort through the code.

Before I plonk down some money to dig into the code, I'm hoping I can
get some help here.

I understand there's a learning curve with any software. Question
is... can I fully learn Greenstone from documentation and manuals?

If not, I may have to learn without the docs and manuals. A form of
learning we usually call "reverse engineering" or "academic research"
or "scientific studies", etc. An expensive and risky endeavor that may
be less cost-effective than building a new Greenstone (a new tool that
my sponsors probably wouldn't GPL).

On the other hand, if you help me explore Greenstone (learning with
_your_ help, rather than with imaginary docs and manuals), I may be
able to contribute docs and manuals at the end of the day. I do have
extensive experience in writing docs and manuals (well-honed tech
writing skills).

> There are a lot of useful resources available on the web, including
> the FAQ's and documentation on greenstone.org and the demonstration
> collections on nzdl.org. You may just have to spend some time.

I already got a team to run through all the FAQ's and docs there are on
Greenstone. It took them merely 3 days (a divide and conquer approach,
first glance). Yes, admittedly the documentation really needs
work. :-)

In case you're wondering about my experience with digging into
open-source tools... I spent 2 months learning Emacs. I was even
programming Emacs modes in Elisp after that. I did put in time for
Greenstone. Maybe I'm just not good enough to learn Greenstone within
2 months.

So, Please Help!

Thanks.

JH