[greenstone-users] query parsing

From Tod Olson
Date Sun, 28 Dec 2003 13:09:44 -0600
Subject [greenstone-users] query parsing
Question about the parsing of alphanumeric metadata and queries,
Greenstone version 2.40a, MacOS X, indexing via MG:

I'm fiddling with a collection that has a bunch of unusual
alphanumeric metadata, like "d3d7u10d2d8". A metadata field might
have 500 to 1000 or so space-separated "words" of this nature. When
searching this metadata field for the string "d3d7u10d2d8", the results
say:

Word count: 2d8: 0, d3d7u10d: 0

So it seems this string is chopped up into two words during query
parsing. Why, and how can it be avoided?

This example was pulled randomly from actual data, and normally one
would search on several such "words". The user won't type them in
directly, but they will be generated from user input. Automatically
slapping quotes around the entire query string is probably not
desirable.

Any advice on avoiding this chopping up of the query string would be
welcome. Are there any options to control this? On the data side,
maybe the numeric information could be recast as alphabetic, if that
would make parsing more predictable. Would query parsing behavior be
different if the indexing were done with MG++? Any other ideas?
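
To make the recasting idea concrete, here is a rough sketch of what I
have in mind. The function names and the choice of stand-in letters
(g through p) are purely hypothetical, and it assumes those letters
never occur in the real metadata. I have not tested whether this
actually stops the query parser from splitting the terms.

```python
# Hypothetical digit-to-letter recoding; not part of Greenstone or MG.
DIGITS = "0123456789"
LETTERS = "ghijklmnop"  # assumption: these letters are unused in the data

ENCODE = str.maketrans(DIGITS, LETTERS)
DECODE = str.maketrans(LETTERS, DIGITS)


def recode(text):
    """Replace every digit with its stand-in letter."""
    return text.translate(ENCODE)


def restore(text):
    """Invert recode(); only valid if the stand-in letters are unused."""
    return text.translate(DECODE)


if __name__ == "__main__":
    print(recode("d3d7u10d2d8"))  # djdnuhgdido
```

The same recoding would have to be applied both to the metadata before
indexing and to the query terms generated from user input.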


Tod A. Olson <tod@uchicago.edu>
Sr. Programmer / Analyst
The University of Chicago Library

"How do you know I'm mad?" said Alice.
"If you weren't mad, you wouldn't have come here," said the Cat.