Re: [greenstone-devel] Some concerns about MGPP

From Katherine Don
Date Mon, 25 Aug 2003 11:00:23 +1200
Subject Re: [greenstone-devel] Some concerns about MGPP
In-Reply-To (1061580750-3f466fce94f5e-webmail-utoronto-ca)
Hi Kim,

> I have a few questions about MGPP searching engine used in Greenstone:
> 1. I notice that MGPP handle the punctuation marks for the query quite
> unexpectedly. For instance, if I would like to search for the term "IBM.COM"
> The search engine would only take the first word IBM and truncate the rest of
> it. If I typed "IBM,COM", then MGPP would strip off that comma, and search
> for 2 independent terms: IBM, COM. So what's wrong with the period? Why
> can't MGPP handle it? and which file should I try to modify to solve this
> dilemma?

There is currently a bug in MGPP: if it comes across an unknown character in
the query string (both . and , are unknown), it stops parsing the query
and works with what it has so far.
So both IBM.COM and IBM,COM result in an MGPP query of IBM being performed.
This is what you get with the plain text search.
In the form search, IBM,COM works only because we use JavaScript to format the
query, replacing each space and comma with a + before adding the query string to the
URL and submitting the form, so MGPP actually receives IBM COM as its query.
I have been meaning to fix MGPP for quite a while so that it ignores unknown
tokens in the query rather than abandoning the parse. I will move that up my
to-do list, and it should be fixed in the next release.
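To illustrate the difference between the current behaviour and the planned fix, here is a toy sketch in JavaScript. It is not MGPP's actual parser: the function names are made up, only letters, digits and whitespace count as "known" characters here, and treating an unknown character as a term separator is an assumption about what "ignoring" it would mean.

```javascript
// Toy model only; this is NOT MGPP's real query parser.
// Assumption: letters, digits and whitespace are the only "known" characters.
function isKnown(ch) {
  return /[A-Za-z0-9\s]/.test(ch);
}

// Split on whitespace and drop empty pieces.
function toTerms(text) {
  return text.split(/\s+/).filter(function (t) { return t.length > 0; });
}

// Current behaviour as described: stop at the first unknown character
// and work with whatever has been read so far.
function parseStopping(query) {
  var kept = "";
  for (var i = 0; i < query.length; i++) {
    var ch = query.charAt(i);
    if (!isKnown(ch)) {
      break;
    }
    kept += ch;
  }
  return toTerms(kept);
}

// Planned behaviour (assumed): skip unknown characters and keep parsing.
// Here an unknown character is treated as a separator between terms.
function parseSkipping(query) {
  var kept = "";
  for (var i = 0; i < query.length; i++) {
    var ch = query.charAt(i);
    kept += isKnown(ch) ? ch : " ";
  }
  return toTerms(kept);
}
```

With this sketch, parseStopping("IBM.COM") yields the single term IBM, while parseSkipping("IBM.COM") yields IBM and COM.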

In the meantime, you could edit the JavaScript code that formats the form search query, where there's a
function format(string) {

Just change the line

if (ch == " "|| ch == ",") {

to

if (ch == " "|| ch == "," || ch == ".") {

and that will get rid of any periods in the query strings, treating them the same
way as spaces and commas. You can add any other characters here if you like.
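For reference, a function of this kind might look roughly like the sketch below. Only the function name and the if line come from the message above; the loop and the rest of the body are assumptions, so the real Greenstone code may well differ.

```javascript
// Hypothetical reconstruction: only the name format(string) and the
// "if (ch == ...)" test are taken from the message; the rest is assumed.
function format(string) {
  var out = "";
  for (var i = 0; i < string.length; i++) {
    var ch = string.charAt(i);
    // Separator characters become "+", which reaches the search
    // engine as a space once the URL is decoded.
    if (ch == " " || ch == "," || ch == ".") {
      out += "+";
    } else {
      out += ch;
    }
  }
  return out;
}
```

With the period added to the test, format("IBM.COM") returns "IBM+COM", the same result as for "IBM,COM" or "IBM COM".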

> 2. I learnt from the MGPP manual provided on the greenstone's website that
> we could specify the weight for the query terms to affect the ranking of the
> query. So I just would like to confirm that things work correctly if we just
> specify the weight with this convention: "query"/"the weight" without
> modifying any of those source codes of MGPP.

This is correct. A query like

snail/10 farming

would rate snail as ten times as important as farming in the query.
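As a rough illustration of how the term/weight convention reads, the JavaScript sketch below splits a query into terms and weights. It is not part of MGPP or Greenstone: the function name is made up, and the default weight of 1 for an unweighted term is an assumption inferred from the "10 fold" comparison above. How MGPP uses the weights when ranking is not shown here.

```javascript
// Illustrative only; MGPP does its own parsing of weighted terms.
// Assumption: a term without "/weight" has weight 1.
function parseWeightedQuery(query) {
  var tokens = query.split(/\s+/).filter(function (t) { return t.length > 0; });
  return tokens.map(function (token) {
    var slash = token.lastIndexOf("/");
    if (slash === -1) {
      return { term: token, weight: 1 };
    }
    var weight = parseFloat(token.substring(slash + 1));
    return {
      term: token.substring(0, slash),
      // Fall back to the assumed default if the weight is not a number.
      weight: isNaN(weight) ? 1 : weight
    };
  });
}
```

For the query snail/10 farming, this gives snail a weight of 10 and farming a weight of 1.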

Katherine Don