Re: [greenstone-users] Information, please

From: Katherine Don
Date: Wed, 02 Feb 2005 10:43:49 +1300
Subject: Re: [greenstone-users] Information, please
In-Reply-To: (s1f8bab2-047-mail2-juta-co-za)

Hi Amanda

Here is some information that may help you.

There is some information about collection size limits at
http://www.greenstone.org/cgi-bin/library?e=p-en-faq-utfZz-8&a=p&p=faqbuild#sizelimit

We support cross-collection searching, so you can have, e.g., one collection
per product, and offer the option of searching one, several or all
collections at once.
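
As a rough illustration of the idea (not Greenstone code; the collection
names, document ids and the helper functions below are all made up for
the example), searching one, several or all collections can be sketched
like this:

```python
def search_collection(documents, term):
    """Return ids of documents in one collection that contain the term as a word."""
    term = term.lower()
    return [doc_id for doc_id, text in documents.items()
            if term in text.lower().split()]

def search_across(collections, term, selected=None):
    """Search the selected collections, or all of them if selected is None."""
    names = sorted(collections) if selected is None else selected
    results = []
    for name in names:
        for doc_id in search_collection(collections[name], term):
            # Keep the collection name so the user can see where a hit came from.
            results.append((name, doc_id))
    return results

# Hypothetical data: one collection per product.
collections = {
    "product_a": {"a1": "tenancy law overview", "a2": "company tax rules"},
    "product_b": {"b1": "building law and consents"},
}

one = search_across(collections, "law", ["product_a"])
everything = search_across(collections, "law")
```

Here `one` only contains hits from product_a, while `everything` also
includes the hit from product_b.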

In the standard setup, we can search at document/section/paragraph
level, and retrieve at document/section level. One of our search engines
does word-level indexing. This provides phrase searching, but not
retrieval of word positions.
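
To show why word-level indexing makes phrase searching possible, here is
a minimal sketch. It is only an illustration under my own assumptions
(the function names and sample sections are invented, and this is not
how our search engine is implemented): word positions are used
internally to check that the words are adjacent, but only the matching
section ids are returned.

```python
import re
from collections import defaultdict

def build_positional_index(sections):
    """Map each word to {section id: [positions of the word in that section]}."""
    index = defaultdict(lambda: defaultdict(list))
    for section_id, text in sections.items():
        words = re.findall(r"[a-z0-9]+", text.lower())
        for position, word in enumerate(words):
            index[word][section_id].append(position)
    return index

def phrase_search(index, phrase):
    """Return ids of sections in which the words of the phrase occur consecutively."""
    words = re.findall(r"[a-z0-9]+", phrase.lower())
    if not words or any(word not in index for word in words):
        return []
    hits = []
    for section_id, starts in index[words[0]].items():
        for start in starts:
            # Every following word must sit exactly one position further on.
            if all(start + offset in index[word].get(section_id, [])
                   for offset, word in enumerate(words)):
                hits.append(section_id)
                break
    return sorted(hits)

# Hypothetical sample sections.
sections = {
    "s1": "The building consent was granted",
    "s2": "Consent for the building",
}
index = build_positional_index(sections)
matches = phrase_search(index, "building consent")
```

Both sections contain both words, but only s1 contains them as a phrase.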

However, you can customise the system further. For example, one of our
collections uses a sentence-level index
(http://www.nzdl.org/cgi-bin/howto/library?a=p&p=about&c=howto&nw=utf-8).

Searching is pretty quick, but I am not sure how fast it would be with
30GB of text.

Searching simply looks for the search terms in the documents or sections. So a
query like "what do I need to know to build a house" would search for
each word in that sentence. We don't do any processing of the query. A
more suitable query would be "legal requirements house building" or
something like that.
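
The difference between the two queries can be seen in a small sketch.
This is a simplified stand-in for term matching, not our actual search
code; the tokeniser, the scoring (number of distinct query words found)
and the two sample sections are assumptions made for the example.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(sections):
    """Map each word to the set of section ids that contain it."""
    index = defaultdict(set)
    for section_id, text in sections.items():
        for word in tokenize(text):
            index[word].add(section_id)
    return index

def search(index, query):
    """Rank sections by how many distinct query words they contain."""
    scores = defaultdict(int)
    for word in set(tokenize(query)):
        # The query is not interpreted: every word is just looked up as it is.
        for section_id in index.get(word, ()):
            scores[section_id] += 1
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))

# Hypothetical sample sections.
sections = {
    "s1": "Legal requirements for house building and council consent.",
    "s2": "What you need to know about what to do in the garden.",
}
index = build_index(sections)
natural = search(index, "what do I need to know to build a house")
focused = search(index, "legal requirements house building")
```

With the natural-language query, the gardening section ranks first
because it shares the most everyday words with the query; the focused
query only matches the section about house building.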

What content is indexed and therefore searchable is up to the collection
designer. We provide full text and metadata searching.

I hope this is useful.

Regards,
Katherine Don

Amanda Foster wrote:
> Hi there
>
> I work in an electronic publishing environment that produces large
> amounts of content (approximately 9 GB per product - total of approx 30
> GB across all products) to customers served via corporate intranet,
> internet and CD.
>
> I am currently exploring different ways of searching and indexing the
> content more efficiently to meet our clients' needs.
>
> I found your site on the web and was very interested in what you were
> offering ...
>
> Our content is currently stored in xml and served out in html.
> For the products to work effectively for our clients, the searching has
> to be like greased lightning and down to the most granular level (for
> example, lawyers want to enter keywords and find where they appear
> across one or more products).
> I would imagine that indexing to paragraph level and then individual
> word or character level would be the answer, but how to make this quick
> across huge volumes of content?
>
> Also, if a user entered a string like "what do I need to know to build
> a house" ... how would the search return listings of all the relevant
> legal requirements for building a house? Would metadata be the way to
> solve this or is there something more advanced that can be used?
>
> Has your product answered any of these questions?
>
> Look forward to your response.
>
> Many thanks
> Amanda Foster