Re: [greenstone-users] interrupting a warning

From John R. McPherson
Date Sat, 13 Dec 2003 13:14:25 +1300
Subject Re: [greenstone-users] interrupting a warning
In-Reply-To (364688-1071257169318-JavaMail-cpadmin-pipeline)
On Fri, Dec 12, 2003 at 01:26:09PM -0600, Meg Miner wrote:
> Hi List,
> At the end of the step I get a list of 8 messages starting
> with WARNING: List::classify. The first build I did, I got 4 so I
> figure this problem is bound to continue growing.

You didn't say what the warning message was, so we can't really tell
what your problem is.

If it said "called multiple times for <some document ID>" then it
probably means that you have the same document imported more than
once in your archives. (This can happen, for example, if you use the
"groupsize" option to put more than one document in each archive file,
and then import the same document again into the archives.)
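
If you want to check for repeated IDs yourself, here is a rough Python
sketch of the idea. It is not part of Greenstone: the function name and
the sample IDs and paths are made up for illustration, and it assumes you
have already pulled (document ID, archive file) pairs out of your archives
by some means of your own.

```python
from collections import defaultdict


def find_duplicate_ids(entries):
    """Return {doc_id: [archive files]} for IDs that appear more than once.

    `entries` is an iterable of (doc_id, archive_file) pairs. How those
    pairs are extracted from the archives is left open (an assumption).
    """
    seen = defaultdict(list)
    for doc_id, archive_file in entries:
        seen[doc_id].append(archive_file)
    # Keep only the IDs that were recorded in more than one place.
    return {doc_id: files for doc_id, files in seen.items() if len(files) > 1}


# Hypothetical sample data, purely for illustration.
sample = [
    ("HASH01", "archives/a/doc.xml"),
    ("HASH02", "archives/b/doc.xml"),
    ("HASH01", "archives/c/doc.xml"),
]

duplicates = find_duplicate_ids(sample)
for doc_id, files in duplicates.items():
    print(f"{doc_id} appears {len(files)} times: {', '.join(files)}")
```

Any ID reported here would be a candidate for the "called multiple times"
warning.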

If you still have all the source documents for your collection, you can
remove the archives directory and then re-import your collection to
remove any duplicates.

> My real problem with these messages is that they take a _lot_ of time
> to generate. After all files had been run through on buildcol (which
> happens pretty fast), it took over an hour from the time the first
> warning message popped up until I had a command prompt again.

I think you are misunderstanding what is happening. When you build the
collection, first every archived document needs to be loaded, then the
text is indexed and compressed, and then all the classify structures
need to be created, which involves grouping and sorting the
documents. If you have a large collection, these steps can take a long
time. The warning messages are printed while each classifier decides
where in the structure each document goes. A warning tells you that the
same document ID was handled more than once, which might indicate a
problem. The build then still goes on to finish sorting and grouping,
which as I said can take a while, especially if you have lots of
documents and/or multiple classifiers.
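
To make the order of events concrete, here is a toy Python sketch of a
list-style classifier. It is only an illustration under my own
assumptions, not Greenstone's actual code: the class name, method names
and sample documents are all hypothetical. The point is that the warning
is emitted during the per-document pass, while the sorting (the slow part
on a big collection) only happens afterwards.

```python
import sys


class ToyListClassifier:
    """Toy stand-in for a list classifier; not Greenstone's real code."""

    def __init__(self):
        self.entries = {}   # doc_id -> sort key (e.g. a title)
        self.warnings = []  # warnings collected during classification

    def classify(self, doc_id, sort_key):
        # Warn if this document ID has been handled before, then carry on.
        if doc_id in self.entries:
            msg = f"WARNING: classify called multiple times for {doc_id}"
            self.warnings.append(msg)
            print(msg, file=sys.stderr)
        self.entries[doc_id] = sort_key

    def build(self):
        # Sorting happens only after every document has been classified.
        pairs = sorted(self.entries.items(), key=lambda kv: kv[1])
        return [doc_id for doc_id, _ in pairs]


classifier = ToyListClassifier()
# Hypothetical documents; "D1" is deliberately classified twice.
for doc_id, title in [("D1", "Zebra"), ("D2", "Apple"), ("D1", "Zebra")]:
    classifier.classify(doc_id, title)

ordered = classifier.build()
print(ordered)
```

In this sketch the duplicate produces one warning but does not stop the
build, which matches the behaviour described above.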

> Question: Can I interrupt the warning message sequence without creating
> any additional problems?

You can interrupt the build process at any time without causing any
problems to the stored archives, but the build will not be complete and
you will need to rebuild.

John McPherson