Re: changing greenstone

From Stefan Boddie
Date Tue, 08 Apr 2003 20:31:54 +1200
Subject Re: changing greenstone
In-Reply-To (OFDF73CE75-FC2B53CE-ON69256D02-000E7C97-69256D02-000E7CE3-ntu-edu-au)
> That is an excellent plan
>
> How about using UnknownPlug for dealing with unknown
> documents types and new plugins?
>
> plugin UnknownPlug -process_exp ".pdf$" -converter
> "pdftohtml" -pass_arguments_to_convertor "-noframes -p
> ..."
>
> I feel a separate '-pass_arguments_to_convertor' flag
> would be useful for all plugins [including UnknownPlug],
> for those cases when you want additional arguments passed
> to your convertor.
>

It wouldn't make sense for all plugins, only those derived
from ConvertToPlug (that is, those that use conversion
utilities to convert a given format to html or text for
processing). This currently includes PDFPlug, WordPlug, and
PSPlug. This makes me think perhaps it should be
ConvertToPlug rather than UnknownPlug that has the new
functionality added.

That is, at present ConvertToPlug can't be used directly.
It's like BasPlug in that it's only ever used as a base
class from which other "usable" plugins are derived. Perhaps
we could change this to allow ConvertToPlug to be used
directly, with the new -converter and
-pass_arguments_to_converter options to specify how the
conversion is done.
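
To make the idea concrete, here is a minimal sketch of how a
directly usable ConvertToPlug might turn those two options into
a command line. It is written in Python rather than Greenstone's
Perl and is not existing Greenstone code: the function names
build_converter_command and matches are hypothetical, and the
-converter and -pass_arguments_to_converter options are only
proposals at this point. Note that the pattern below escapes the
dot, unlike the ".pdf$" in the quoted plugin line.

```python
import re
import shlex


def matches(process_exp, filename):
    """Return True if filename matches the plugin's -process_exp pattern."""
    return re.search(process_exp, filename) is not None


def build_converter_command(converter, extra_args, input_path):
    """Build the argument list for running a converter on one file.

    converter  -- value of the proposed -converter option (hypothetical)
    extra_args -- value of the proposed -pass_arguments_to_converter
                  option (hypothetical), given as one quoted string
    input_path -- the file the plugin has been asked to process
    """
    # shlex.split honours shell-style quoting, so an argument that
    # contains spaces survives as a single token.
    return [converter] + shlex.split(extra_args) + [input_path]


if __name__ == "__main__":
    filename = "report.pdf"
    if matches(r"\.pdf$", filename):
        print(build_converter_command("pdftohtml", "-noframes", filename))
```

Keeping the converter name and its arguments as separate options
means the plugin can insert the input file (and, if needed, an
output location) itself, rather than parsing one long string.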

> Other 'convertors' that I would like to use from within
> GSDL could include;
>
> * an XSLT processor - so I could customise how I
> harvested TEI and DOCBOOK XML formats, harvest [and
> search] RSS feeds from news services, blogs and
> yahoo-groups, or indeed any XML ML.
>
> * jocr/gocr OSS OCR software?
>
>
>
> Please let me know what you think.

All sounds cool.

Unfortunately I don't have time to implement this right now
though. Anyone else feel like having a go at it?

>
>
>
> BTW Can you direct me to a good source of information
> about the gdbm? I am looking at the included docs but
> anything else you can suggest would be appreciated.
>

It's old, ugly, not very portable, and I don't much like it.
How's that for documentation? Seriously though, the unix man
page is all I've ever used. Try searching the web if you
need more.

> Also - what is your opinion of the 'Managing Gigabytes'
> book?
>

It's WONDERFUL (I have to say that because my boss wrote it :-)


cheers,
Stefan.

> Regards,
>
> s.
>
>
>
> _________________________________________________
> Stephen De Gabrielle
> Digitisation Officer
> AraDA Project
>
> Northern Territory University Library
> http://www.ntu.edu.au/library
> Tel: (08) 8946 7009  from overseas: 61 8 8946 7009
> Postal address: P.O.Box 41246, Casuarina, NT, 0811, Australia
> CRICOS Provider No: 00300K
>
>
>
> Stefan Boddie <sjboddie@cs.waikato.ac.nz>
> Sent by: owner-greenstone-devl@colosys.net
> 08/04/2003 01:18 PM ZE12
>
> To: Stephen.DeGabrielle@ntu.edu.au
> cc: greenstone_devl@colosys.net
> bcc:
> Subject: Re: changing greenstone
>
>
> Stephen.DeGabrielle@ntu.edu.au wrote:
>> I am thinking of an '-arguments' switch that passes a
>> quoted list of arguments to the converter
>>
>> I feel this would be good because
>>
>> - this would deal with all situations in this case
>>
>> - not hide what is going on to the collection builder
>>
>> - make for more informative 'plugin' lines in
>> collect.cfg files.
>>
>> s.
>>
>>
>
> I like this idea. Perhaps a new plugin or maybe an
> extension to ConvertToPlug (or maybe to HTMLPlug?)
> allowing you to use a plugin line something like the
> following:
>
> plugin ConvertPlug -process_exp ".pdf$" -converter
> "pdftohtml -noframes -p ..."
>
> It'd need options to deal with the different ways
> converters receive input and produce output. Shouldn't be
> difficult though I don't think.
>
> Stefan.
>