Re: [greenstone-users]

From Michael Dewsnip
Date Fri, 22 Aug 2003 09:40:20 +1200
Subject Re: [greenstone-users]
In-Reply-To (Pine-LNX-4-33-0308191238330-8912-100000-mmsl-serc-iisc-ernet-in)
Hello Anandh,

When Greenstone indexes the documents in a collection, it hashes them
based on their content to calculate an ID. Therefore, two files that are
exactly the same will only appear once in the final collection, because
they will have the same ID. If the two files are even slightly different
(maybe because they contain absolute links) then they will both appear in
the collection (and therefore, in the search results).

So, for mirror sites: if the files in the mirror are exactly the same as in
the original, the copies will be ignored; otherwise they will be indexed too.
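
To make the idea concrete, here is a rough sketch in Python. It is not
Greenstone's actual code: the hash function (SHA-256 here), the helper names
content_id and build_index, and the example URLs are all assumptions made
for illustration. It only shows how hashing on content collapses exact
copies while keeping files that differ even slightly.

```python
import hashlib


def content_id(data: bytes) -> str:
    """Derive a document ID from the file's bytes (SHA-256 is an assumption)."""
    return hashlib.sha256(data).hexdigest()


def build_index(files: dict[str, bytes]) -> dict[str, str]:
    """Map each content ID to the first URL seen with that content.

    Files with identical bytes collapse into one entry; any difference,
    however small, gives a different ID and therefore a separate entry.
    """
    index: dict[str, str] = {}
    for url, data in files.items():
        # setdefault keeps the first URL and ignores later exact copies
        index.setdefault(content_id(data), url)
    return index


# Hypothetical example: one original, one exact mirror copy, and one mirror
# page that differs only by an absolute link.
files = {
    "http://original.example/a.html": b"<p>Same page</p>",
    "http://mirror.example/a.html": b"<p>Same page</p>",
    "http://mirror.example/b.html":
        b'<p>Same page</p><a href="http://mirror.example/">home</a>',
}
index = build_index(files)
print(len(index))
```

The exact mirror copy shares an ID with the original and is dropped, while
the page with the extra absolute link gets its own ID and stays in the index.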



Anandh Jayaraman wrote:

> Does Greenstone remove duplicates (slightly different URL, but the
> same content) from the search results?
> How does it treat MIRROR SITES?
> --
> Regards
> Anandh