RV: MGPP

From Juhé, Albert
Date Fri, 24 Jan 2003 09:24:11 +0100
Subject RV: MGPP
Thanks for your reply, Katherine Don,

We have finished the first release of mgpp. This version does not have NEAR,
and it supports multicollection, wildcards, and the operators >, >=, <, <=.
Everything is developed in C++ using the STL. The querier that we have built
accepts a small subset of SQL and can run queries against different indexes
at the same time.

In the first release we have these results:
SELECT text = "le*" takes 9.01 seconds and retrieves 363244 documents out of
384512, on a PII-200 with 48 MB.

If you are interested, I can send you a copy of the project.

The second release will include NEAR, implemented more or less as Managing
Gigabytes describes.

We will build another index file, where we store the offset of every word in
the document (a third level of index file).

The scheme that we have thought of is:

In the inverted file we add a pointer, so we store: document number +
ptr_offset_file, document number + ptr_offset_file, ...

In the offset file we store:

length_to_read (buffer size) + offset1, offset2, ... - applying the number
compression used in mgpp (Elias gamma, delta, ...).
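
As a rough illustration of the offset record above, here is a minimal C++
sketch using Elias gamma codes. It is not the mgpp code: BitBuffer,
write_offsets and read_offsets are hypothetical names, and it assumes that
offsets are 1-based and strictly increasing (so gaps can be stored), and that
length_to_read counts bits. None of those details are fixed by the scheme
itself.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical bit buffer for illustration; not the mgpp bit I/O classes.
struct BitBuffer {
    std::vector<bool> bits;
    std::size_t pos = 0;  // read cursor

    void put(bool b) { bits.push_back(b); }
    bool get() { assert(pos < bits.size()); return bits[pos++]; }
};

// Elias gamma code for n >= 1: with N = floor(log2 n), write N zero bits,
// then the N+1 bits of n, most significant bit first.
void gamma_encode(BitBuffer& out, std::uint32_t n) {
    assert(n >= 1);
    std::uint64_t v = n;  // widened so the shift below is always defined
    int top = 0;
    while ((v >> (top + 1)) != 0) ++top;  // top = floor(log2 n)
    for (int i = 0; i < top; ++i) out.put(false);
    for (int i = top; i >= 0; --i) out.put(((v >> i) & 1u) != 0);
}

std::uint32_t gamma_decode(BitBuffer& in) {
    int zeros = 0;
    while (!in.get()) ++zeros;  // this loop also consumes the leading 1 bit
    std::uint32_t n = 1;
    for (int i = 0; i < zeros; ++i) n = (n << 1) | (in.get() ? 1u : 0u);
    return n;
}

// One record: length_to_read, then the gamma-coded gaps between offsets.
// Assumption: length_to_read is the payload size in bits, stored as size + 1
// because gamma codes cannot represent zero.
void write_offsets(BitBuffer& out, const std::vector<std::uint32_t>& offsets) {
    BitBuffer payload;
    std::uint32_t prev = 0;
    for (std::uint32_t off : offsets) {
        assert(off > prev);  // 1-based, strictly increasing (assumption)
        gamma_encode(payload, off - prev);
        prev = off;
    }
    gamma_encode(out, static_cast<std::uint32_t>(payload.bits.size()) + 1);
    out.bits.insert(out.bits.end(), payload.bits.begin(), payload.bits.end());
}

std::vector<std::uint32_t> read_offsets(BitBuffer& in) {
    std::size_t length_to_read = gamma_decode(in) - 1;
    std::size_t end = in.pos + length_to_read;
    std::vector<std::uint32_t> offsets;
    std::uint32_t prev = 0;
    while (in.pos < end) {
        prev += gamma_decode(in);  // undo the gap encoding
        offsets.push_back(prev);
    }
    return offsets;
}
```

Because the length comes first, a reader that does not need the offsets could
skip the whole record by advancing the cursor by length_to_read bits.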

With this implementation, exact queries are as fast as in mg, because if you
don't need NEAR you simply ignore the ptr_offset_file pointer. Obviously, the
index files are larger.
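
To make the trade-off concrete, here is a small C++ sketch of how the two
kinds of query might use the inverted-file entries. It is only an
illustration under assumptions: the Posting struct, intersect_docs and
near_match are hypothetical names, posting lists are assumed sorted by
document number, and offset lists are assumed sorted ascending. An exact
query never touches ptr_offset_file; a NEAR query would follow it to load the
offset lists and then compare positions.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical in-memory view of one inverted-file entry:
// document number plus a pointer into the offset file.
struct Posting {
    std::uint32_t doc;
    std::uint64_t ptr_offset_file;
};

// Exact (non-NEAR) AND query: intersect two posting lists sorted by doc.
// ptr_offset_file is never read here, so the offset file is not accessed.
std::vector<std::uint32_t> intersect_docs(const std::vector<Posting>& a,
                                          const std::vector<Posting>& b) {
    std::vector<std::uint32_t> docs;
    std::size_t i = 0, j = 0;
    while (i < a.size() && j < b.size()) {
        if (a[i].doc < b[j].doc) {
            ++i;
        } else if (b[j].doc < a[i].doc) {
            ++j;
        } else {
            docs.push_back(a[i].doc);
            ++i;
            ++j;
        }
    }
    return docs;
}

// NEAR check inside one document: true if some offset in xs lies within
// `distance` words of some offset in ys. Both lists are sorted ascending,
// so advancing the smaller of the two current offsets finds the closest pair.
bool near_match(const std::vector<std::uint32_t>& xs,
                const std::vector<std::uint32_t>& ys,
                std::uint32_t distance) {
    std::size_t i = 0, j = 0;
    while (i < xs.size() && j < ys.size()) {
        std::uint32_t lo = xs[i] < ys[j] ? xs[i] : ys[j];
        std::uint32_t hi = xs[i] < ys[j] ? ys[j] : xs[i];
        if (hi - lo <= distance) return true;
        if (xs[i] < ys[j]) ++i; else ++j;
    }
    return false;
}
```

In this sketch a NEAR query would first call intersect_docs and then, only
for the surviving documents, decode the two offset lists and call near_match.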

We have created a config file for building collections, where you can
configure whether you want NEAR, stem indexes, weight indexes, ...

We think that this is a good implementation, but we're not sure. We will
report the results.
Thanks for everything; I have learned a great deal from your projects mg and
mgpp.