If you're talking about just the inverted index file, then this is not
enough to recreate the source documents. You can determine which words
appear in which document, but you cannot determine the correct order of
the words within each document.
If you're talking about a complete Greenstone collection index then you
can recreate the source documents from the compressed text (this is what
Greenstone does when showing a document).
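To illustrate the inverted-index case, here is a rough Python sketch. It
is a toy model and not MG's actual (compressed) file format: it assumes
the index records only which documents contain each word, and the names
build_inverted_index and words_in_document are made up for this example.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each word to the sorted list of document ids that contain it.

    Toy model only: no word positions are stored, which is the
    assumption this example is meant to illustrate.
    """
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return {word: sorted(ids) for word, ids in index.items()}


def words_in_document(index, doc_id):
    """Recover the set of words in a document; order and repeats are lost."""
    return {word for word, ids in index.items() if doc_id in ids}


# Two different source documents that use the same words.
docs_a = {1: "the dog bit the man"}
docs_b = {1: "the man bit the dog"}

index_a = build_inverted_index(docs_a)
index_b = build_inverted_index(docs_b)

print(index_a == index_b)                     # True
print(sorted(words_in_document(index_a, 1)))  # ['bit', 'dog', 'man', 'the']
```

Both documents give the same index, so the index alone cannot tell you
which one was the source. It does still reveal the vocabulary of each
document.
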
The best reference about MG is of course the Managing Gigabytes book, by
Ian H. Witten, Alistair Moffat and Tim Bell.
> One of our publishers will potentially be concerned with security
> issues of MG indexing. Specifically, we want to reassure them of the
> difficulty a hacker would have recreating the source docs from the
> inverted index db file. Do you know if a paper exists on this topic.
> If so, could you point me to it?
> Lisa Weil
> Global Development and Environment Institute
> Tufts University
> greenstone-users mailing list