Re: [greenstone-devel] Ignoring "The" and "A" in titles

From: John R. McPherson
Date: Thu, 16 Dec 2004 12:24:15 +1300
Subject: Re: [greenstone-devel] Ignoring "The" and "A" in titles
In-Reply-To: (41C0B2A2-2080506-cs-waikato-ac-nz)
On Thu, Dec 16, 2004 at 10:54:42AM +1300, Katherine Don wrote:

> Hi Doug
>
> You can use the -removeprefix option: the value is a regular expression
> that will remove matching symbols at the start of the sort metadata.
>
> Remember that you can run classinfo.pl <classifiername> to see all the
> available options.
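A minimal Python sketch of the idea behind a "remove prefix" regular expression applied to sort metadata. This is not Greenstone's Perl code: the helper name sort_key_with_prefix_removed and the example pattern are hypothetical, and it assumes the pattern is matched only at the very start of the value.

```python
import re

def sort_key_with_prefix_removed(value: str, remove_prefix: str) -> str:
    """Hypothetical helper: strip text matching remove_prefix from the
    start of value, then lowercase it for case-insensitive sorting."""
    # re.match only tries to match at the beginning of the string.
    match = re.match(remove_prefix, value)
    if match:
        value = value[match.end():]
    return value.lower()

titles = ["The December Field Report", "Annual Summary", "A Brief History"]
# Assumed pattern: a leading "the", "a" or "an" followed by whitespace.
prefix = r"(?i)(the|an?)\s+"
print(sorted(titles, key=lambda t: sort_key_with_prefix_removed(t, prefix)))
```

Requiring whitespace after the article keeps words such as "Annual" from losing their first letters.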


> Doug Carter wrote:
> >Hi all,
> >
> >Is there a way to sort documents by title while ignoring common leading
> >words like "The" or "A"?
> >
> >For example, the document titled "The December Field Report" would be
> >sorted in with documents that start with "D" and not "T".
> >
> >TIA,


The classifiers should already do this if they detect that the language
is English. In that case they call format_string_english() and, if the
metadata is "Creator", format_string_name_english(), both from the
perllib/sorttools.pm package.

format_string_english() removes a leading "the", "a" or "an".

Maybe the language is being guessed incorrectly?
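A rough Python illustration of why a wrong language guess matters: article stripping only happens on the English branch. The function name sort_key, the "en"/"fr" language codes and the article pattern are assumptions for illustration, not the actual sorttools.pm logic.

```python
import re

# Hypothetical rule: one leading English article followed by whitespace.
_ENGLISH_ARTICLE = re.compile(r"^(?:the|an?)\s+", re.IGNORECASE)

def sort_key(title: str, language: str) -> str:
    """Build a sort key for a title. Articles are stripped only when the
    detected language is English ("en"); any other code leaves the
    title as it is, apart from lowercasing."""
    key = title.strip()
    if language == "en":
        key = _ENGLISH_ARTICLE.sub("", key, count=1)
    return key.lower()

title = "The December Field Report"
print(sort_key(title, "en"))  # december field report
print(sort_key(title, "fr"))  # the december field report
```

If a document is misdetected as another language, its title keeps "The" in the sort key and ends up under "T".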

John