Re: [greenstone-users] Wildcard/truncation in searching?

From: Katherine Don
Date: Fri, 28 Apr 2006 16:30:08 +1200
Subject: Re: [greenstone-users] Wildcard/truncation in searching?
In-Reply-To: (52B914D66FE6974A952ECCACE81B3642A100F1-libmail1-libad-csus-edu)
Hi Bin Zhang

Are your source documents segmented (i.e. are there spaces between the
words)? If not, Greenstone can't tell where words start and end, so whole
sentences will be indexed as single words. (Whitespace and punctuation
are used as word boundaries.) This may be why you need to enter an
entire title to get a match.
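To illustrate why unsegmented text behaves this way, here is a minimal sketch of a tokenizer that treats whitespace and punctuation as word boundaries. It is an assumption for illustration only, not Greenstone's actual indexing code; the function name tokenize is hypothetical.

```python
import re

# Hypothetical tokenizer: any run of whitespace or punctuation is a word boundary.
# This only mimics the rule described above; it is NOT Greenstone's real indexer.
# In Python 3, \w matches CJK ideographs, so an unspaced Chinese title is one "word".
_BOUNDARY = re.compile(r"[\s\W]+")

def tokenize(text: str) -> list[str]:
    """Split text into index terms at whitespace/punctuation, dropping empties."""
    return [tok for tok in _BOUNDARY.split(text) if tok]
```

With this rule, tokenize("Digital libraries, in practice") yields four terms, but tokenize("数字图书馆概论") yields the single term "数字图书馆概论", so a search for "图书馆" finds nothing unless the whole title is typed.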

If you can segment your documents before giving them to Greenstone, then
you can search using individual words.
Alternatively, you can specify an option in your collect.cfg file to split
the input into individual characters.

The option is not yet part of the Greenstone Librarian Interface (GLI),
so it needs to be added to collect.cfg by hand:
    separate_cjk true
You'll need to reimport and build the collection.
Now you should be able to search using individual characters.
Note that the search terms also get split into individual characters
before the search is carried out.
This means that you may get more hits than you expect (but it's better
than no hits).
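The following is a rough sketch of what splitting CJK text into characters does to both documents and queries, and why it can produce extra hits. It assumes that each query character is matched as an independent term anywhere in the document (a plain AND, with no adjacency or order requirement); Greenstone may combine the character terms differently. The helper names and the simplified CJK range check are hypothetical.

```python
def is_cjk(ch: str) -> bool:
    # Simplified assumption: only the basic CJK Unified Ideographs block.
    return "\u4e00" <= ch <= "\u9fff"

def separate_cjk(text: str) -> str:
    """Put spaces around every CJK ideograph so each becomes its own term."""
    spaced = "".join(f" {ch} " if is_cjk(ch) else ch for ch in text)
    return " ".join(spaced.split())  # collapse repeated whitespace

def terms(text: str) -> set[str]:
    return set(separate_cjk(text).split())

def matches(query: str, doc: str) -> bool:
    # Hypothetical matching rule: every query term must appear somewhere in doc.
    # The query is split the same way as the document before searching.
    return terms(query) <= terms(doc)
```

Under this model, a query for "图书" matches "数字图书馆概论" as intended, but also matches "图说书法", which contains 图 and 书 without containing the word 图书: an extra hit of the kind described above.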

If you have built your collection with MGPP or Lucene (i.e. with advanced
searching enabled), then you can use *; e.g. comput* will find any words
that start with comput.
This may or may not help you.
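As a generic illustration of trailing-* truncation, the sketch below expands a pattern such as comput* against a sorted list of index terms using binary search. It is an assumption about how prefix expansion can work in general, not a description of MGPP or Lucene internals; expand_prefix and the sample vocabulary are hypothetical.

```python
import bisect

def expand_prefix(pattern: str, vocabulary: list[str]) -> list[str]:
    """Expand a trailing-* pattern (e.g. 'comput*') against a SORTED term list.

    Patterns without a trailing * are treated as exact terms.
    """
    if not pattern.endswith("*"):
        return [pattern] if pattern in vocabulary else []
    prefix = pattern[:-1]
    # All terms sharing the prefix sit contiguously from this position onward.
    start = bisect.bisect_left(vocabulary, prefix)
    result = []
    for word in vocabulary[start:]:
        if not word.startswith(prefix):
            break
        result.append(word)
    return result
```

For example, with the vocabulary sorted(["compute", "computer", "computing", "company", "complete"]), the pattern comput* expands to compute, computer and computing. Note that in an unsegmented collection the index terms are whole titles, so truncation only helps when the search starts from the beginning of such a term.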


Bin Zhang wrote:
> Does Greenstone support truncation? I have a Chinese book collection and found that I have to enter the entire title or author name to find a book. I tried to use "*" or "?", but neither one worked.
> Thanks
> --------------------------
> Bin Zhang, Digital Information Services Librarian
> Library Information Systems
> California State University, Sacramento
> 2000 State University Drive, East, Sacramento, CA 95819-6039
> +1 (916) 278-5664 (office); +1 (916) 278-3891 (fax)
> ------------------------------------------------------------------------