[Fwd: Re: [greenstone-users] RE: Problems rebuilding a collection (schild)]

From schild
Date Wed, 23 Mar 2005 09:29:21 +0100
Subject [Fwd: Re: [greenstone-users] RE: Problems rebuilding a collection (schild)]
Hi Michael,

thanks for the reply, although the problem that I experienced has not
yet been directly addressed in this discussion! Unlike Tran, I have
already set up the collection (meaning I did the user interface layout,
defined the classifiers that I want, the metadata for my documents,
etc.). I did this, as you also suggested, with just a small number of
documents. What I want now is to add more documents to this collection
using the design I set up AND using the GLI. Unlike Tran, I do not care
whether a rebuild of the collection means that all documents will be
processed for indexing again (time is not a big constraint for me at
the moment). Besides that, I understood from other postings on this
list that mg is not capable of incremental indexing, so there is no way
around processing all documents of a collection when doing a build. Is
this assumption right?
But anyway, I know that I can use the command line scripts for importing
and building, and I will probably try that, since I think the GLI is
not working properly, or at least not as I assumed it would! I just
wanted to know (or let you know) whether the behaviour that I
experienced is a bug in the GLI or not. Let me clarify my intentions:

1. I have a small number of documents already properly imported, i.e. in
the archives folder but no longer in the import folder.
2. I want to add more documents to the collection, using the GLI to
assign metadata to the docs, and then rebuild.
3. I *unchecked* the "removeold" option in the import options panel
(help text for the removeold option: "Will remove the old contents of
the archives directory -- use with care."!). From what I read, I assume
that the old contents of the archives directory will now be reindexed in
the building process and will therefore still be in the collection after
the rebuild!
4. I start the rebuild, look at the log file and see: "import.pl>
Removing current contents of the archives directory..."! How can that
be when I explicitly unchecked this option?
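
For anyone who wants to check their own build log for the same symptom,
here is a minimal Python sketch. The message text is the one quoted in
step 4; the function name and the first sample log line are made up for
illustration, and the real log layout may differ.

```python
# Marker text taken from the log line quoted in step 4 of the report.
REMOVAL_MARKER = "Removing current contents of the archives directory"


def find_archive_removal(log_lines):
    """Return the 1-based numbers of log lines reporting that the
    archives directory is being cleared."""
    return [
        number
        for number, line in enumerate(log_lines, start=1)
        if REMOVAL_MARKER in line
    ]


# Hypothetical log excerpt: only the second line comes from the report.
sample_log = [
    "import.pl> Starting import (illustrative line)",
    "import.pl> Removing current contents of the archives directory...",
]
print(find_archive_removal(sample_log))  # prints [2]
```

An empty result would mean the import step did not report clearing the
archives directory.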

If the removeold option has another function, then the help text should
be adjusted to clarify the meaning of this option. Otherwise, if I am
not completely wrong, I would say that the GLI is not showing the
correct behaviour and this bug should be corrected. That was the
intention I had in mind when writing my initial mail.
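
The command-line route mentioned above could be sketched roughly as
follows in Python. This is only an illustration under assumptions: that
the Greenstone environment is set up so that "perl -S" can find
import.pl and buildcol.pl on the PATH, that import.pl accepts -keepold
as the counterpart of -removeold, and that it accepts -maxdocs. The
function names and the collection name "demo" are hypothetical.

```python
import subprocess


def import_command(collection, keep_old=True, maxdocs=None):
    """Build the argument list for an import.pl run."""
    # Assumption: -keepold is the counterpart of -removeold and keeps
    # the existing contents of the archives directory.
    args = ["perl", "-S", "import.pl"]
    args.append("-keepold" if keep_old else "-removeold")
    if maxdocs is not None:
        # Assumption: -maxdocs limits how many documents are processed.
        args += ["-maxdocs", str(maxdocs)]
    return args + [collection]


def buildcol_command(collection):
    """Build the argument list for a buildcol.pl run."""
    return ["perl", "-S", "buildcol.pl", collection]


def rebuild(collection, keep_old=True, maxdocs=None):
    """Run the import step, then the build step, stopping on the first failure."""
    commands = (
        import_command(collection, keep_old, maxdocs),
        buildcol_command(collection),
    )
    for command in commands:
        subprocess.run(command, check=True)


# Only print the commands here; call rebuild("demo") to actually run them.
print(" ".join(import_command("demo")))
print(" ".join(buildcol_command("demo")))
```

Keeping the command construction separate from the execution makes it
easy to inspect exactly which options are passed before anything in the
archives directory is touched.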

Thanks,

Axel

Michael Dewsnip schrieb:

> Hi,
>
> There are a lot of issues involved here. We find that the way the GLI
> rebuilds collections is rarely a problem -- you just have to
> understand what it is doing and remember a few simple tricks:
>
> - Get the design of your collection right *before* you add all your
> source documents. It's tempting to just add all the source documents
> and then play around with the collection until it is right, but it's
> much more efficient to sit down and think about the collection, get it
> working with just a few documents, then add the bulk of the documents
> afterwards.
> - Failing that, use the "maxdocs" option when testing changes to the
> collection design to rebuild the collection with just a subset of the
> documents.
> - As John pointed out, changes to format statements do not require
> re-importing or re-building. In the GLI, after changing the format
> statements go to the Create pane and click Preview Collection to
> immediately see the changes.
> - If you need finer control over the build process, run the import.pl
> and buildcol.pl scripts manually. As well as giving you much more
> control over what is done (eg. with buildcol.pl's -mode option), the
> scripts will run faster without GLI's overhead. The GLI is just a
> layer over the Greenstone scripts -- not a replacement for them. There
> will always be cases where it is more efficient to do things outside
> the GLI (or the only way to do it).
>
>> First of all, as Axel Schild described: adding new files/removing
>> files makes the GLI import all old files again. According to the
>> user and developer manuals, this should not happen - only new files
>> should be processed, to save time and to allow collection users to
>> access the old collection during the building time.
>>
>>
> We have started to make the GLI smarter in terms of only importing and
> building when necessary (this is done for the GLI applet, where this
> is critical to reduce bandwidth usage), but this has to be perfect
> otherwise it is worse than useless.
>
> The old collection is always available to users while the collection
> is rebuilding, except for a very short period at the end when the old
> index directory is deleted and the new building directory is renamed
> to index.
>
>> Secondly, in my case: I don't add or remove any files from my
>> collection. I would like to change different options in the Design
>> and Enrich tabs of the GLI for different kinds of output. For
>> example, I would like to hide the icon linking to the extracted
>> texts; as a result I would have to change the Vlist.
>>
> Changing metadata in the Enrich pane or changing items in the Design
> pane requires re-importing and re-building. Changing format statements
> requires neither.
>
> Regards,
>
> Michael
>
>
>
>> -----Original Message-----
>> From: John R. McPherson [mailto:jrm21@cs.waikato.ac.nz]
>> Sent: Tuesday, March 22, 2005 12:00 PM
>> To: Tran
>> Cc: greenstone-users@list.scms.waikato.ac.nz
>> Subject: Re: [greenstone-users] RE: Problems rebuilding a collection
>> (schild)
>>
>> On Tue, Mar 22, 2005 at 10:23:15AM +0700, Tran wrote:
>>
>>
>>> Hi,
>>> I have a similar problem as Axel Schild has described. I want to
>>> rebuild my collection after I've made some minor changes related to
>>> how GS output would look (not add or change any document files in
>>> the collection). It seems that GS re-imports all my document files
>>> again and again and it takes a lot of time. In Expert mode I've
>>> been trying 3 different modes (build index, compress text and info)
>>> without any success.
>>>
>>
>>
>> If you just want to change the appearance (either by changing some of
>> the format statements in the collection's config file, or by
>> modifying some of greenstone's macro files), then you do not have to
>> rebuild or reindex the collection - these changes take effect
>> immediately.
>>
>> I don't know if you can change these from within the GLI without
>> rebuilding though - I'm not very familiar with it.
>>
>> John
>>

--
----------------------------------------------

Dipl.-Ing. Axel Schild
Automatisierungstechnik und Prozessinformatik
Ruhr-Universitaet Bochum IC-3/151
D-44780 Bochum, Germany

Tel: +49 234 32 25203
Fax: +49 234 32 14101
E-mail: schild@atp.rub.de

