Re: [greenstone-users] Avoiding "the" "a" skipping in A-Z title list classifier

From John R. McPherson
Date Fri, 18 Aug 2006 13:28:47 +1200
Subject Re: [greenstone-users] Avoiding "the" "a" skipping in A-Z title list classifier
In-Reply-To (44E43EDD-40004-gmx-net)
On Thu, Aug 17, 2006 at 12:03:09PM +0200, jens wille wrote:
> hi ruben!
>
> ruben pandolfi [17.08.2006 11:56]:
> > can you tell me where to look for this feature in the code and
> > remove it?
> it's in perllib/sorttools.pm, line 77:
>
> $$stringref =~ s/^\s*(the|a|an)\b//;
>
> comment this line out and your articles will be kept.

Also, it should only delete them if the language for the document is
set to "English". You can specify a default language for all of a
plugin's input documents; otherwise greenstone tries to guess the language.
If it is removing a/an/the for non-English documents, then it's a bug in
greenstone.
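
To illustrate the intended behaviour, here is a minimal sketch. It is not
the actual code from sorttools.pm: the helper name sort_key and the 'en'
language code are assumptions made up for this example.

```perl
use strict;
use warnings;

# Hypothetical helper, NOT Greenstone's real code: build a sort key and
# strip a leading English article only when the document language is English.
sub sort_key {
    my ($title, $language) = @_;
    my $key = lc $title;
    # The 'en' code is an assumption for this sketch.
    if (defined $language && $language eq 'en') {
        # \b keeps words such as "anatomy" or "theory" intact.
        $key =~ s/^\s*(the|a|an)\b\s*//;
    }
    return $key;
}

print sort_key('The Hobbit', 'en'), "\n";   # article dropped
print sort_key('A la carte', 'fr'), "\n";   # non-English title left alone
```

With a check like this, a French or Spanish title that happens to start
with "a" keeps its first word, which is the behaviour described above.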

John McPherson