[greenstone-users] Using customized DTD for import, user access levels (was: PDF collection with bibliographical metadata)

From Birgit Kellner
Date Fri, 14 Jan 2005 21:17:41 +0100
Subject [greenstone-users] Using customized DTD for import, user access levels (was: PDF collection with bibliographical metadata)
In-Reply-To (41E1B770-6040604-cs-waikato-ac-nz)
Hi Katherine,

thank you for the helpful advice. I solved the issues in question, and
can now move on to new ones :-)

Katherine Don wrote:

> Hi Birgit
> Classifier problems: If Greenstone thinks a document is in English (and
> it defaults to English when it can't determine the language), then
> when formatting the metadata for sorting it removes any characters
> that are not a-z0-9. Japanese metadata therefore becomes empty,
> and the document will not be part of the classification.
> Try adding the "-default_language ja" option to UnknownPlug. All metadata
> will then be assumed to be in Japanese, and no formatting will be done.
> This will probably stuff up the Author classification, though.
> Anyway, try it and see what happens.

Thank you. I added the option; sorting on "Author" works fine with
transliterations of Japanese author names. Since these always exist in
the database, and non-Japanese names are in there as well, this is
all I currently need. The article title AZList (article titles may be in
Japanese, German, Italian, English, French ...) puts the Japanese
entries in unexpected places, but my users and I can live with this,
and I can't imagine how this could be changed anyway (unless we also
provided transliterated titles for all the Japanese entries, which is
too much work). Anyway, I consider these classifier problems solved for
the current collection - thanks again.
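
For the archives, here is how I understand the behaviour Katherine
describes, as a minimal Python sketch. It is not Greenstone's actual
code (which is Perl): the function names, the lowercasing step, and the
sample titles are my own assumptions, made only to show why a purely
Japanese value ends up with an empty sort key and drops out of the
classification.

```python
import re

def sort_key(value, language="en"):
    """Hypothetical stand-in for the formatting done before sorting."""
    if language != "en":
        # Assumption: non-English metadata is left unformatted.
        return value
    # Assumption: lowercase first, then drop everything that is not a-z0-9.
    return re.sub(r"[^a-z0-9]", "", value.lower())

def classify(titles, language="en"):
    """Keep only titles whose sort key is non-empty, ordered by that key."""
    keyed = [(sort_key(t, language), t) for t in titles]
    return [t for key, t in sorted(keyed) if key]

titles = ["Zen and Logic", "吾輩は猫である", "Abhidharma Studies"]
print(classify(titles))         # the Japanese title is missing
print(classify(titles, "ja"))   # all three titles are present
```

With "ja" the Japanese title is kept but sorted by its raw characters,
which may be why such entries turn up in unexpected places in my title
list.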

> Another alternative is to use a new classifier developed by Michael
> for non-English metadata. It's at
> http://www.cs.waikato.ac.nz/~mdewsnip/greenstone/GenericList.pm.zip
> Download and unzip into your gsdl/perllib/classify directory.
> Then use GenericList in place of AZList for Japanese metadata. This
> should hopefully use Japanese sort order. If you do use this and you
> have success/problems, please let us know as this is still under
> development.

I haven't tried it yet, but will let you know if I eventually do.

> SearchForm problems:
> I suspect that you haven't changed your collect.cfg file
> properly for mgpp. Search forms are only available with mgpp,
> not with mg. The document at
> http://www.greenstone.org/docs/mgpp_user.pdf gives details about using
> mgpp. Alternatively, you can use the Librarian Interface and turn
> advanced searching on.
I had activated mgpp before, and consulted the manual; I'll try this
once again and let you know if I have problems.

The Librarian Interface is - it seems - not an option for this
collection because it uses a customized field format in metadata.xml,
and if I understand things correctly, the Librarian Interface requires
consistency with the Greenstone DTD.

This is actually another problem: I understand that if I don't use the
Librarian Interface, I can't let people add data to collections
remotely. Or can I?
I would need some way for people to do this: access a website,
log in, enter metadata through a form, upload a file, and then
import the file into a specific collection.
It seems possible to write a CGI script that (a) handles login, (b)
offers a form for entry of metadata, (c) handles file upload into the
import directory, and (d) calls import.pl and buildcol.pl to carry out
the import. Security-wise, however, this means that the import
directory would have to be world-writable, and it presumably also has
adverse effects on the permissions of import.pl and buildcol.pl.

Is there really no way to use the Librarian Interface with a customized
metadata.xml format? Can't I write my own DTD and have Greenstone use it?
(Oh, and ideally certain users should be allowed only to add data to an
existing collection, and NOT to import with --removeold.)
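
To make steps (c) and (d) concrete, here is a rough Python sketch of
what such a script might do. The paths, function names, and the
may_remove_old flag are hypothetical; I am also assuming that import.pl
accepts -keepold and -removeold and that both scripts take the
collection name as their last argument. The sketch only builds the
command lines, it does not run them, and it does not solve the
permission problem.

```python
import os

def safe_upload_path(import_dir, filename):
    """Return a path inside import_dir, discarding any directory parts
    a user may have put into the uploaded file's name."""
    name = os.path.basename(filename.replace("\\", "/"))
    if not name or name in (".", ".."):
        raise ValueError("unusable file name: %r" % filename)
    return os.path.join(import_dir, name)

def build_commands(collection, may_remove_old=False):
    """Build the import.pl / buildcol.pl command lines for one collection.

    Users without the (hypothetical) may_remove_old right can only add
    to the existing archives; they never get -removeold.
    """
    import_flag = "-removeold" if may_remove_old else "-keepold"
    return [
        ["perl", "-S", "import.pl", import_flag, collection],
        ["perl", "-S", "buildcol.pl", collection],
    ]

print(safe_upload_path("/tmp/import", "../../etc/passwd"))
for command in build_commands("papers"):
    print(" ".join(command))
```

Each list could be handed to subprocess.run as it stands, so no shell is
involved and the collection name cannot be used to smuggle in extra
commands.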

The second big issue now is user access to the collection itself. The
user manual mentions that it is in principle possible to restrict user
access to a collection on a file-by-file basis, but I couldn't find any
further specifics on how to do this.

At any rate, here is what we plan to do: some files in the collection
should be public, whereas others should be available only to
a certain group of users that ideally should be definable (not
"administrators" or "colbuilders", but something like "institute
members"). Of course, non-public files should also be excluded from
the results of searches by users who have not logged into the system
or have no "institute" privileges. Is this doable?
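
In case it helps to pin down what I am asking for, the intended
behaviour is roughly the following Python sketch. Nothing in it is
Greenstone functionality: the "access" field, the group name
"institute", and the function name are all made up for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Document:
    title: str
    access: str = "public"  # hypothetical per-document metadata value

def visible_results(results, user_groups=frozenset()):
    """Drop every search hit the current user is not allowed to see.

    A hit is visible if it is public or if its access value names one
    of the groups the logged-in user belongs to.
    """
    return [
        doc for doc in results
        if doc.access == "public" or doc.access in user_groups
    ]

hits = [
    Document("Annual report"),
    Document("Internal working paper", access="institute"),
]
print([d.title for d in visible_results(hits)])
print([d.title for d in visible_results(hits, frozenset({"institute"}))])
```

An anonymous visitor would get only the first hit, an institute member
both.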

Best regards, and thanks again, and again, and again ...

Birgit Kellner