Re: [greenstone-users] Phrase search oddities?

From Katherine Don
Date Tue, 13 Dec 2005 17:29:53 +1300
Subject Re: [greenstone-users] Phrase search oddities?
In-Reply-To (Pine-WNT-4-61-0512090957260-4012-LIBSTFSYS11-LIBRARY-UIUC-EDU)
Hi Jon

Yes, our phrase searching is not the best, is it?
Due to the way the default indexer (MG) works, to do a phrase search we
have to do a boolean (AND) query, then scan through the text of the
matches to look for the phrase.
I don't know why "foo" "foo" doesn't work.
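
To make the two-step approach above concrete, here is a rough sketch in Python. It is not MG or Greenstone code; the function names, the tokenizer and the sample documents are all made up for illustration.

```python
import re

def tokenize(text):
    # Lowercased word tokens. A real indexer has its own rules
    # (case folding, stemming), so this is only an assumption.
    return re.findall(r"\w+", text.lower())

def phrase_search(docs, phrase):
    words = tokenize(phrase)
    if not words:
        return []
    # Step 1: boolean AND. Keep documents that contain every word.
    candidates = [doc_id for doc_id, text in docs.items()
                  if set(words) <= set(tokenize(text))]
    # Step 2: scan each candidate's text for the words in sequence.
    hits = []
    n = len(words)
    for doc_id in candidates:
        tokens = tokenize(docs[doc_id])
        if any(tokens[i:i + n] == words
               for i in range(len(tokens) - n + 1)):
            hits.append(doc_id)
    return hits

# Hypothetical sample collection.
docs = {
    "d1": "Cream of mushroom soup",
    "d2": "Mushroom with a splash of cream",
    "d3": "Cream of corn",
}
```

Here d2 passes the AND step for "cream of mushroom" (it has all three words) but is dropped by the scan, because the words are not adjacent and in order.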

Anyway, I suggest you try using MGPP as the indexer instead. This does
phrase searching properly.
You can do all and some queries with phrases.

Are you using GLI to build your collection? If so, set "enable advanced
searches" to true.

Otherwise, in your collect.cfg file, add the following lines:
buildtype mgpp
searchtype plain form

The indexes line needs to change to, e.g.,
indexes allfields text Title
(i.e. no level info).
Level info goes into a separate line like
levels document section

Then rebuild.

You'll have to play around with it a bit and see which phrase queries work.
Just now I found that searching allfields works fine, but doing a "some"
query with a phrase inside a field (e.g. text or Title) doesn't work (an
"all" query is fine though). This is due to the way Greenstone generates
the MGPP query from the search form: it's getting it a bit wrong.
This should not be too difficult to fix.
Have a play with the form and plain interfaces (change using
preferences), decide which interface you like and which bits don't
work, and I can help you fix it.

If you have customised your own interface, and can modify the query,
then you can use the correct syntax, and it will work fine. I can let
you know what args to use if you are doing it this way.

There is also an extended search syntax; see the MGPP user guide.
Also recently added is
comput* - matches everything starting with comput
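
As a rough illustration of what that trailing * does, here is a small Python sketch. It is not how MGPP implements it; the function name and word list are made up, and ignoring case is an assumption on my part.

```python
def matches_truncated(term, word):
    # A trailing * means "any word starting with this prefix".
    # Case-insensitive comparison is an assumption, not confirmed
    # MGPP behaviour.
    if term.endswith("*"):
        return word.lower().startswith(term[:-1].lower())
    return word.lower() == term.lower()

# Hypothetical word list for illustration.
words = ["computer", "computing", "compute", "compost", "Computation"]
matched = [w for w in words if matches_truncated("comput*", w)]
```

With this list, everything except "compost" ends up in matched.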

I hope this wasn't too confusing. Rebuild the collection using mgpp, try
it out, see if you like it, then I can help you iron out any bugs.


Jonathan Gorman wrote:
> Hi all,
> Sorry to pester you all again, but I've noticed some strange behavior
> when trying to do "phrase" searching.
> First, say I search for "foo" "bar". If I select some words, it'll
> still AND these together. That is, I'll only get words with foo and bar
> in them. I realize this might be in part because the query is passed
> in, the documents gathered, and then filtered. But it goes against the
> user expectations. If they're searching for some words, and do "cream of
> corn" "cream of mushroom" and choose some words, it might be because
> they have those ingredients and want to know any casseroles that have
> either of them in.
> Also, searching "foo" "foo" causes it to never match. The chances of
> this happening with how our particular library is somewhat likely.
> In the second case I can do some JavaScript magic I believe with a
> little hacking of the macros. My fear is the only way around the first
> case is to dig in deeper to how the filter system works. I'm also
> nervous that it'll take a large amount of tweaking to fix this.
> On a related note, are there any nice docs on what search options are
> available? I'm assuming just whole words, but are there any more tricks?
> I haven't played around with it much, but I know there isn't a huge
> amount of functionality.
> (If I can get this to work somewhat reliably, I can map "some words" to
> a crude "What I have in my pantry", and "all words" to "I want to use
> something that uses all of these".)
> Jon Gorman