Re: [Fwd: Re: [greenstone-users] RE: Problems rebuilding a collection (schild)]

From: Katherine Don
Date: Wed, 30 Mar 2005 08:38:59 +1200
Subject: Re: [Fwd: Re: [greenstone-users] RE: Problems rebuilding a collection (schild)]
In-Reply-To: (424128E1-20606-atp-rub-de)
Hi Axel

The removeold/keepold options do not work in the GLI: currently the
archives are always deleted and recreated each time a build is done.
The GLI assumes that only the documents currently in your collection
(i.e. the documents in the import directory that you can see in the
Gather/Enrich panes) are meant to be included. If you have deleted them
from there, the GLI assumes that they are not meant to be in the
collection.
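To make the behaviour described above concrete, here is a toy model of what happens to the archives directory. It is not Greenstone code. It assumes that removeold empties archives before importing, that keepold keeps whatever is already there, and that the GLI of this era always acts as if removeold were set. Documents are represented as plain file names (hypothetical).

```python
from typing import Iterable, Set


def archives_after_import(old_archives: Iterable[str],
                          import_dir: Iterable[str],
                          removeold: bool) -> Set[str]:
    """Toy model of the archives directory contents after one import run."""
    if removeold:
        # archives is emptied first, so only what is in import survives
        return set(import_dir)
    # keepold: earlier imports stay alongside the new ones
    # (simplification: a re-imported document just replaces its old copy)
    return set(old_archives) | set(import_dir)


def gli_archives_after_import(old_archives: Iterable[str],
                              import_dir: Iterable[str],
                              removeold_checkbox: bool) -> Set[str]:
    """Per this thread, the GLI ignores the checkbox and always clears archives."""
    # removeold_checkbox is deliberately unused: that is the behaviour Axel hit
    return archives_after_import(old_archives, import_dir, removeold=True)


# Axel's scenario: two documents imported earlier, then removed from import
old = {"doc1.html", "doc2.html"}
new = {"doc3.html"}
print(sorted(gli_archives_after_import(old, new, removeold_checkbox=False)))
# ['doc3.html']
print(sorted(archives_after_import(old, new, removeold=False)))
# ['doc1.html', 'doc2.html', 'doc3.html']
```

The only way to keep the earlier documents under this model is to leave them in the import directory, which is exactly the advice that follows.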

So the solution for you is to keep all your documents in the import
directory and rebuild with all of them each time.
This does mean that they will be reimported at each build. If you are
not happy with that, you could use the GLI to assign metadata to the new
documents, save the collection, and then run import.pl and buildcol.pl manually.
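Here is a minimal sketch of that manual step, written in Python for illustration. It makes several assumptions. The Greenstone environment must already be set up so that `perl -S` can find import.pl and buildcol.pl on the PATH. The collection's short name is passed as the last argument. The -removeold and -keepold flags are taken from the option names in this thread, so check them against the usage message of your version's import.pl. The collection name "mycoll" is a placeholder.

```python
import subprocess
from typing import List


def manual_rebuild_commands(collection: str, keepold: bool = False) -> List[List[str]]:
    """Build the import.pl and buildcol.pl command lines for one collection."""
    # -removeold clears archives first; -keepold (assumed flag) keeps earlier imports
    archive_flag = "-keepold" if keepold else "-removeold"
    return [
        ["perl", "-S", "import.pl", archive_flag, collection],
        ["perl", "-S", "buildcol.pl", collection],
    ]


def run_all(commands: List[List[str]]) -> None:
    """Run each command in turn, stopping at the first one that fails."""
    for cmd in commands:
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Only print the commands here; call run_all() once the GLI has saved the metadata
    for cmd in manual_rebuild_commands("mycoll"):
        print(" ".join(cmd))
```

When run from the command line, buildcol.pl writes the new indexes into the collection's building directory. Its contents then have to replace the index directory before the rebuilt collection is served.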

Regards,
Katherine Don

schild wrote:
> Hi Michael,
>
> thanks for the reply, although the problem I experienced has
> not been directly addressed in this discussion yet! Unlike Tran, I have
> already set up the collection (meaning I did the user interface layout,
> defined the classifiers I want, the metadata for my documents,
> etc.). I did this, as you also suggested, for just a small number of
> documents. Now I want to add more documents to this collection
> using the design I set up AND using the GLI. Unlike Tran, I do not
> care whether a rebuild of the collection means that all documents
> will be processed for indexing again (time is not a big constraint for
> me at the moment). Besides, I understood (from other postings on this
> list) that mg is not capable of incremental indexing, so there is no
> way around processing all the documents of a collection when
> doing a build. Is this assumption right?
> But anyway, I know that I can use the command-line scripts for importing
> and building, and I will probably try that, since I think the GLI is either
> not working properly or not working as I assumed it would. I just wanted to
> know (or let you know) whether the behaviour I experienced is a
> bug in the GLI or not. Let me clarify my intentions:
>
> 1. I have a small number of documents already properly imported, i.e. in
> the archives folder but no longer in the import folder.
> 2. I want to add more documents to the collection, using the GLI to
> assign metadata to them, and then rebuild.
> 3. I *unchecked* the "removeold" option in the import options panel
> (help text for the removeold option: "Will remove the old contents of the
> archives directory -- use with care."!). From what I read, I assumed
> that the old contents of the archives directory would now be reindexed in
> the building process and would therefore still be in the collection after
> the rebuild!
> 4. I start the rebuild, look at the log file and see: "import.pl>
> Removing current contents of the archives directory..."! How can that
> be, when I explicitly unchecked this option?
>
> If the removeold option has another function, then the help text should
> be adjusted to clarify the meaning of this option. Otherwise, if I am
> not completely wrong, I would say that the GLI is obviously not showing
> the correct behaviour and this bug should be corrected. That was what
> I had in mind when writing my initial mail.
>
> Thanks,
>
> Axel
>
> Michael Dewsnip wrote:
>
>> Hi,
>>
>> There are a lot of issues involved here. We find that the way the GLI
>> rebuilds collections is rarely a problem -- you just have to
>> understand what it is doing and remember a few simple tricks:
>>
>> - Get the design of your collection right *before* you add all your
>> source documents. It's tempting to just add all the source documents
>> and then play around with the collection until it is right, but it's
>> much more efficient to sit down and think about the collection, get it
>> working with just a few documents, then add the bulk of the documents
>> afterwards.
>> - Failing that, use the "maxdocs" option when testing changes to the
>> collection design to rebuild the collection with just a subset of the
>> documents.
>> - As John pointed out, changes to format statements do not require
>> re-importing or re-building. In the GLI, after changing the format
>> statements go to the Create pane and click Preview Collection to
>> immediately see the changes.
>> - If you need finer control over the build process, run the import.pl
>> and buildcol.pl scripts manually. As well as giving you much more
>> control over what is done (e.g. with buildcol.pl's -mode option), the
>> scripts will run faster without the GLI's overhead. The GLI is just a
>> layer over the Greenstone scripts -- not a replacement for them. There
>> will always be cases where it is more efficient (or the only way) to do
>> things outside the GLI.
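The maxdocs and -mode tips can be illustrated with a short Python helper that only assembles command lines and does not run anything. It makes two assumptions. First, `-maxdocs N` is accepted by both import.pl and buildcol.pl. Second, -mode takes the values all, compress_text, build_index and infodb, which correspond roughly to the "build index, compress text and info" modes Tran mentions further down. Verify both against the scripts' usage messages. The helper names are hypothetical.

```python
from typing import List

# Assumed values for buildcol.pl's -mode option
BUILD_MODES = {"all", "compress_text", "build_index", "infodb"}


def sample_build_commands(collection: str, maxdocs: int) -> List[List[str]]:
    """Import and build only the first `maxdocs` documents, for testing a design."""
    if maxdocs < 1:
        raise ValueError("maxdocs must be a positive integer")
    limit = ["-maxdocs", str(maxdocs)]
    return [
        ["perl", "-S", "import.pl", "-removeold", *limit, collection],
        ["perl", "-S", "buildcol.pl", *limit, collection],
    ]


def partial_build(collection: str, mode: str) -> List[str]:
    """Rerun a single buildcol.pl stage without re-importing."""
    if mode not in BUILD_MODES:
        raise ValueError(f"unknown build mode: {mode!r}")
    return ["perl", "-S", "buildcol.pl", "-mode", mode, collection]
```

The mode is validated locally, so a typo fails before any long-running Perl process is started.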
>>
>>> First of all, as Axel Schild described, adding or removing files
>>> makes the GLI import all the old files again. Contrary to what is
>>> described in the user and developer manuals, this should not
>>> happen: only new files should be processed, to save time and to
>>> allow collection users to access the old collection while it is
>>> being built.
>>>
>> We have started to make the GLI smarter about only importing and
>> building when necessary (this is done for the GLI applet, where it
>> is critical for reducing bandwidth usage), but this has to be perfect;
>> otherwise it is worse than useless.
>>
>> The old collection is always available to users while the collection
>> is rebuilding, except for a very short period at the end when the old
>> index directory is deleted and the new building directory is renamed
>> to index.
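The switch-over Michael describes can be sketched as follows. This is a simplified Python illustration, not Greenstone's own code. It assumes the collection directory contains `building` (the finished new build) and `index` (the live one). The brief unavailability corresponds to the gap between deleting index and renaming building.

```python
import shutil
from pathlib import Path


def install_new_index(collection_dir: Path) -> None:
    """Replace the live index directory with a finished build."""
    building = collection_dir / "building"
    index = collection_dir / "index"
    if not building.is_dir():
        raise FileNotFoundError(f"no finished build in {building}")
    # Up to this point users are still searching the old index.
    if index.exists():
        shutil.rmtree(index)  # the collection is briefly unavailable from here...
    building.rename(index)    # ...until the new build is renamed into place
```

Deleting and then renaming keeps the outage to two fast filesystem operations. All the slow work happens earlier, while the old index is still being served.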
>>
>>> Secondly, in my case: I don't add or remove any files from my
>>> collection. I would like to change different options in the Design
>>> and Enrich tabs of the GLI to produce different kinds of output. For
>>> example, I would like to hide the icon linking to the extracted texts;
>>> to do that I would have to change the VList.
>>
>> Changing metadata in the Enrich pane or changing items in the Design
>> pane requires re-importing and re-building. Changing format statements
>> requires neither.
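The rule of thumb in this paragraph, together with John's note further down about format statements and macro files, can be written as a small lookup table. The change categories are paraphrased from this thread and the key names are hypothetical. The table is an illustration, not an exhaustive list.

```python
from typing import Dict, Tuple

# (needs re-import, needs re-build) for each kind of change, per this thread
REBUILD_RULES: Dict[str, Tuple[bool, bool]] = {
    "enrich_metadata": (True, True),
    "design_item": (True, True),
    "format_statement": (False, False),
    "macro_file": (False, False),
}


def required_steps(change: str) -> Tuple[str, ...]:
    """Return the Greenstone scripts a given kind of change needs to rerun."""
    reimport, rebuild = REBUILD_RULES[change]
    steps = []
    if reimport:
        steps.append("import.pl")
    if rebuild:
        steps.append("buildcol.pl")
    return tuple(steps)
```

An empty result means the change can be checked straight away, for example with Preview Collection in the GLI.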
>>
>> Regards,
>>
>> Michael
>>
>>
>>
>>> -----Original Message-----
>>> From: John R. McPherson [mailto:jrm21@cs.waikato.ac.nz]
>>> Sent: Tuesday, March 22, 2005 12:00 PM
>>> To: Tran
>>> Cc: greenstone-users@list.scms.waikato.ac.nz
>>> Subject: Re: [greenstone-users] RE: Problems rebuilding a collection (schild)
>>>
>>> On Tue, Mar 22, 2005 at 10:23:15AM +0700, Tran wrote:
>>>
>>>> Hi,
>>>> I have a similar problem to the one Axel Schild described. I want to
>>>> rebuild my collection after making some minor changes to how the GS
>>>> output looks (without adding or changing any document files in the
>>>> collection). It seems that GS re-imports all my document files again
>>>> and again, and that takes a lot of time. In Expert mode I've tried 3
>>>> different modes (build index, compress text and info) without any
>>>> success.
>>>
>>> If you just want to change the appearance (either by changing some of
>>> the format statements in the collection's config file, or by
>>> modifying some of greenstone's macro files), then you do not have to
>>> rebuild or reindex the collection - these changes take effect
>>> immediately.
>>>
>>> I don't know whether you can change these from within the GLI without
>>> rebuilding, though; I'm not very familiar with it.
>>>
>>> John
>>>
>>
>>
>> _______________________________________________
>> greenstone-users mailing list
>> greenstone-users@list.scms.waikato.ac.nz
>> https://list.scms.waikato.ac.nz/mailman/listinfo/greenstone-users
>