Re: building problems-multiple document entries per list

From: Gordon Paynter
Date: Wed, 14 Aug 2002 08:54:03 -0700
Subject: Re: building problems-multiple document entries per list
In-Reply-To: (001201c242cd$644b98c0$0801febf-tlclinux-org)
On Tuesday 13 August 2002 06:29 am, desiree' simon wrote:

> Is there a way to lower the A-Z minimum-number-of-documents threshold?
> You see, my plan is to build a collection, gradually adding documents
> as they become available. Does adding 2 documents to a collection of
> 39 documents trigger the A-Z format, or will I have to rebuild the
> entire collection?

You will have to rebuild the collection each time you add new
documents. When you rebuild, the AZList will be recreated from
scratch and the divisions will be created.
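A minimal sketch of a full command-line rebuild, assuming the Greenstone 2 workflow in which import.pl writes into the collection's archives/ directory, buildcol.pl writes into building/, and the finished building/ contents then replace index/. It also assumes Greenstone's setup script has put the Perl scripts on your PATH. The collection name "mycoll" and its path are hypothetical placeholders, and by default the script only prints the commands (dry run).

```python
import shutil
import subprocess
from pathlib import Path
from typing import List


def rebuild_commands(collection: str) -> List[List[str]]:
    """Command-line steps for a full rebuild (assumed Greenstone 2 workflow)."""
    return [
        # Re-import all source documents; -removeold clears old files out of archives/ first.
        ["perl", "-S", "import.pl", "-removeold", collection],
        # Rebuild indexes and classifiers (including the AZList) into building/.
        ["perl", "-S", "buildcol.pl", collection],
    ]


def install_building(collect_dir: Path) -> None:
    """Replace the live index/ with the freshly built building/, leaving an empty building/."""
    index_dir = collect_dir / "index"
    building_dir = collect_dir / "building"
    if index_dir.exists():
        shutil.rmtree(index_dir)
    building_dir.rename(index_dir)
    building_dir.mkdir()


def rebuild(collection: str, collect_dir: Path, dry_run: bool = True) -> None:
    """Print (and, unless dry_run, execute) every rebuild step in order."""
    for cmd in rebuild_commands(collection):
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)
    if not dry_run:
        install_building(collect_dir)


if __name__ == "__main__":
    # Hypothetical collection name and path; the dry run only prints the commands.
    rebuild("mycoll", Path("/path/to/gsdl/collect/mycoll"))
```

Because every rebuild starts from an emptied archives directory, the AZList is regenerated from the complete document set each time, which is why adding two documents still means running the whole sequence.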

There is no way to change the minimum-number-of-documents threshold
for an AZList classifier. The AZCompactList classifier has a few more
options, but they still may not do what you want. (At the command
line, you can see a classifier's options by running "classinfo.pl
AZList" or "classinfo.pl AZCompactList".)

Regarding your problem with multiple versions of the same
documents:

1. Are you sure the "archives" directory was emptied before you ran
the import program? If you import a collection, change some
documents, and then import again, both versions of the same documents
may still be sitting in the collection's archives directory, and both
will then be included in the collection. (At the command line you can
prevent this by giving the import command the "-removeold" argument.
I don't know if there's an equivalent in the collector, but you can
instead just delete the contents of the archives directory manually.)
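For the manual route in step 1, a minimal sketch that deletes everything inside a collection's archives directory while keeping the directory itself. The path in the usage line is a hypothetical placeholder, and the name check is an added safety guard (not part of Greenstone) so the function refuses to empty any directory not called "archives".

```python
import shutil
from pathlib import Path


def empty_archives(archives_dir: Path) -> int:
    """Delete everything inside archives_dir but keep the directory itself.

    Returns the number of top-level entries removed.
    """
    # Safety guard (an assumption of this sketch): only touch a directory named "archives".
    if archives_dir.name != "archives":
        raise ValueError(f"refusing to empty {archives_dir}: not an archives directory")
    removed = 0
    for entry in archives_dir.iterdir():
        if entry.is_dir() and not entry.is_symlink():
            shutil.rmtree(entry)  # per-document subdirectories
        else:
            entry.unlink()  # plain files and symlinks
        removed += 1
    return removed


if __name__ == "__main__":
    # Hypothetical path: point this at your own collection before running.
    target = Path("/path/to/gsdl/collect/mycoll/archives")
    if target.is_dir():
        print(empty_archives(target), "entries removed")
```

Run it before importing again; the next import then sees only the current versions of your source documents.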

2. If this doesn't help, you'll need to do a little detective work.
First, find the document identifiers of the two duplicated documents
(i.e., the [Identifier] metadata; you can find it by viewing a
document in the web browser and looking at the URL, which should
contain a=d&d=[Identifier]. The [Identifier] will probably be in
the form HASHxxxxxxxxxx or Dxxx.) Second, find the XML file
corresponding to each of the two documents in the archives
directory. It will be a file called doc.xml, in a subdirectory whose
name is based on the [Identifier]. Third, look in the doc.xml file
for each version of the document and see which "source" file the
XML is based on. This will tell you where the two versions of the
document were imported from.
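The three detective steps above can be sketched in Python, assuming each doc.xml stores its metadata as <Metadata name="..."> elements, including one named "Identifier". The names used for the source file ("gsdlsourcefilename", "Source") are assumptions; open one of your own doc.xml files and adjust SOURCE_NAMES if yours differ.

```python
import xml.etree.ElementTree as ET
from pathlib import Path
from typing import Dict, Iterable, Optional
from urllib.parse import parse_qs, urlparse

# Assumption: metadata names that record the original file. Check one of
# your own doc.xml files and adjust this tuple if your version differs.
SOURCE_NAMES = ("gsdlsourcefilename", "Source")


def identifier_from_url(url: str) -> Optional[str]:
    """Step 1: return the d= parameter from a document URL containing a=d&d=..."""
    return parse_qs(urlparse(url).query).get("d", [None])[0]


def read_doc_xml(path: Path) -> Dict[str, str]:
    """Map each Metadata name to its first value in document order (normally the top-level section)."""
    meta: Dict[str, str] = {}
    for element in ET.parse(path).getroot().iter("Metadata"):
        name = element.get("name")
        if name and name not in meta:
            meta[name] = (element.text or "").strip()
    return meta


def sources_for(archives_dir: Path, identifiers: Iterable[str]) -> Dict[str, str]:
    """Steps 2 and 3: find each wanted identifier's doc.xml and report its source file."""
    wanted = set(identifiers)
    found: Dict[str, str] = {}
    for doc_xml in archives_dir.rglob("doc.xml"):
        meta = read_doc_xml(doc_xml)
        ident = meta.get("Identifier")
        if ident in wanted:
            # Fall back to the doc.xml path itself if no source metadata is present.
            found[ident] = next((meta[n] for n in SOURCE_NAMES if n in meta), str(doc_xml))
    return found
```

Pass in the two identifiers taken from the browser URLs along with the collection's archives directory. If both map to the same source file, a stale copy was left in the archives directory (step 1); if they map to different files, the duplicate exists among the source documents themselves.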

Hope that helps, and is comprehensible.
Gordon