[greenstone-users] Sort order problem in french (accentuated words at the bottom of the lists)

From: Argon7 User List
Date: Tue, 09 Jan 2007 00:37:36 +0100
Subject: [greenstone-users] Sort order problem in french (accentuated words at the bottom of the lists)

Hi,

I have a problem with my "Titles A-Z" and "Creators A-Z" lists:
accented words are not sorted properly. For example, you can check
this page:
http://cinematheque.cfwb.be/gsdl/cgi-bin/library?e=d-00000-00---0ccfbcata--00-0--0-10-0---0---0prompt-10---4-------0-1l--11-fr-200---40-about---00-0-1-00-11-1-0utfZz-8-00&a=d&cl=CL2.16
As you will see, "Pé", "Péché Jean-Jacques", etc. are placed at the
bottom of the list, below "Puttemans".
I've attached the main and the collection configs.

Thanks for your help!

-- Jo
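
One plausible explanation for the order described above, not confirmed anywhere in this message, is that the list is compared by raw code point, where "é" (U+00E9) comes after every unaccented ASCII letter. The sketch below is plain Python, not Greenstone code; it reproduces that order with the names from the message and shows an accent-folded sort key as one possible remedy. The helper name `fold_accents` is made up for this illustration.

```python
import unicodedata

# Names quoted in the message; "\u00e9" is a precomposed "é".
names = ["Puttemans", "P\u00e9", "P\u00e9ch\u00e9 Jean-Jacques"]


def fold_accents(text):
    """Return a case-insensitive sort key with combining accents removed."""
    decomposed = unicodedata.normalize("NFD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()


# Plain code-point order: "é" (U+00E9) is greater than "u" (U+0075),
# so the accented names land below "Puttemans".
codepoint_order = sorted(names)

# Accent-folded order: "é" is compared as "e", so "Pé" sorts before "Pu".
folded_order = sorted(names, key=fold_accents)

print(codepoint_order)
print(folded_order)
```

A locale-aware collation (for French, one that also handles ties between accented and unaccented forms) would be more thorough than simple accent stripping; the folding key is only the shortest way to show the difference.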


<<attachment>>
Type: text/plain
Filename: collect.cfg

creator yves@argon7.be
maintainer yves@argon7.be
public true

language_metadata dc.Language

indexes document:text document:dc.Title document:dc.Subject,text
defaultindex document:text

subcollection film "dc.Source/^Film/i"
subcollection video "dc.Source/^Cassette/i"
subcollection disque "dc.Source/^Disque/i"
subcollection autre "!dc.Source/^(Film|Cassette|Disque)/i"

indexsubcollections disque,video,film,autre disque video film autre
defaultsubcollection disque,video,film,autre

plugin GAPlug
plugin TEXTPlug
plugin HTMLPlug -input_encoding utf8 -smart_block -rename_assoc_files -no_metadata
plugin ImagePlug
plugin NULPlug
plugin ArcPlug
plugin RecPlug -use_metadata_files


classify AZList -metadata dc.Title
classify AZCompactList -metadata dc.Creator -allvalues -sort dc.Title
#classify AZCompactList -metadata dc.Creator
classify DateList -metadata dc.Date -sort dc.Title -nogroup
classify Hierarchy -hfile themes.txt -metadata dc.Subject -sort dc.Title -allvalues -buttonname Subject


format HList "[link][ex.Title][/link]&nbsp;"
format CL2HList "[link][ex.Title][/link]&nbsp;"

format DocumentButtons ""
#format DocumentButtons "Highlight"

#format CL1VList "<td valign=top>[link][icon][/link]</td>
#<td valign=top>[ex.srclink]{Or}{[ex.thumbicon],[ex.srcicon]}[ex./srclink]</td>
#<td valign=top>[highlight]
#{Or}{[dls.Title],[dc.Title],[ex.Title],Untitled}
#[/highlight]{If}{[dc.Creator],<i>([dc.Creator])</i>}</td>"
#
#format CL4VList "<td valign=top>[link][icon][/link]</td>
#<td valign=top>[ex.srclink]{Or}{[ex.thumbicon],[ex.srcicon]}[ex./srclink]</td>
#<td valign=top>{If}{[numleafdocs],<b>[Title]</b>}<b>[dc.Subject]</b> [dc.Title]</td>"

format CL2VList "<td valign=top>[link][icon][/link]</td> <td valign=top>[ex.srclink]{Or}{[ex.thumbicon],[ex.srcicon]}[ex./srclink]</td> <td valign=top>{If}{[numleafdocs],[link][Title][/link]}[link][dc.Creator][/link]{If}{[numleafdocs], ([numleafdocs] fiches)}{If}{[dc.Title], - [dc.Title]}{If}{[dc.Date],&sbquo; [dc.Date]}</td>"

format CL4VList "<td valign=top>[link][icon][/link]</td> <td valign=top>[ex.srclink]{Or}{[ex.thumbicon],[ex.srcicon]}[ex./srclink]</td> <td valign=top>[link][Title][/link]{If}{[numleafdocs], ([numleafdocs] fiches)}{If}{[dc.Title],[link][dc.Title][/link]}{If}{[dc.Creator],&sbquo; de [dc.Creator]}{If}{[dc.Date],&sbquo; [dc.Date]}</td>"

format VList "<td valign=top>[link][icon][/link]</td><td valign=top>[ex.srclink]{Or}{[ex.thumbicon],[ex.srcicon]}[ex./srclink]</td><td valign=top>[link]{Or}{[dc.Title],[ex.Title],Sans titre}[/link]{If}{[dc.Creator],&sbquo; de [dc.Creator]}{If}{[dc.Date],&sbquo; [dc.Date]}{If}{[dc.Source],&sbquo; [dc.Source]}{If}{[dc.Format], ([dc.Format])}</td>"

format DateList "<td>[link][icon][/link]</td><td>[link]{Or}{[dc.Title],[ex.Title],Sans titre}[/link]{If}{[dc.Creator],&sbquo; de [dc.Creator]}</td><td>[ex.Date]</td>"

format DocumentHeading "&nbsp;"
#format DocumentHeading "{If}{[dc.Creator],[link][dc.Creator][/link]}"
#format DocumentHeading "<H1>[dc.Title]</H1><hr>
#{If}{[dc.Creator], [dc.Creator], Réalisateur inconnu} {If}{[dc.Date], [dc.Date]}"

format DocumentText "[Text]"

collectionmeta collectionname [l=fr] "ccfbcata"
collectionmeta collectionextra [l=fr] "Le catalogue de la Cinémathèque peut dorénavant être consulté sur ce site. Vous trouverez des films culturels et éducatifs de portée générale qui restent d'actualité mais aussi de très nombreux films didaticques ne présentant plus aucune valeur pédagogique actuelle, mais qui n'en conservent pas moins une valeur incomparable pour l'éducation ou la simple évocation d'une époque ou d'un moment de notre histoire."
collectionmeta .document:text [l=fr] "Fiches"
collectionmeta .document:dc.Subject,text [l=fr] "Sujets"
collectionmeta .document:dc.Title [l=fr] "Titres"
collectionmeta .film [l=fr] "Pellicules"
collectionmeta .video [l=fr] "Vidéos"
collectionmeta .disque [l=fr] "CD - DVD"
collectionmeta .autre [l=fr] "Autres"
collectionmeta .disque,video,film,autre [l=fr] "Tous les supports"

collectionmacro Style:cssheader '
<link rel="stylesheet" href="_httpcollection_/images/style.css" type="text/css" media="screen">
'
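
The four subcollection filters in collect.cfg above follow the pattern `[!]field/regex/flags`: a metadata field, a regular expression, optional flags such as `i` for case-insensitive, and a leading `!` to negate the match. The Python sketch below only illustrates how such filters partition documents. Greenstone evaluates them with Perl at build time, so Python's `re` module is a stand-in here, and the sample `dc.Source` values in the usage lines are hypothetical.

```python
import re

# Filter strings copied from the subcollection lines in collect.cfg.
FILTERS = {
    "film": "dc.Source/^Film/i",
    "video": "dc.Source/^Cassette/i",
    "disque": "dc.Source/^Disque/i",
    "autre": "!dc.Source/^(Film|Cassette|Disque)/i",
}


def parse_filter(spec):
    """Split '[!]field/regex/flags' into (negated, field, compiled regex)."""
    negated = spec.startswith("!")
    if negated:
        spec = spec[1:]
    field, rest = spec.split("/", 1)
    pattern, flags = rest.rsplit("/", 1)
    re_flags = re.IGNORECASE if "i" in flags else 0
    return negated, field, re.compile(pattern, re_flags)


def subcollections_for(metadata):
    """Return the names of the subcollections a document would fall into."""
    matched = []
    for name, spec in FILTERS.items():
        negated, field, regex = parse_filter(spec)
        hit = regex.search(metadata.get(field, "")) is not None
        # A negated filter accepts the document when the regex does NOT match.
        if hit != negated:
            matched.append(name)
    return matched


# Hypothetical dc.Source values, for illustration only.
print(subcollections_for({"dc.Source": "Film 16 mm"}))
print(subcollections_for({"dc.Source": "cassette VHS"}))
print(subcollections_for({"dc.Source": "Diapositive"}))
```

With these filters every document lands in exactly one of the four groups, because "autre" is the negation of the union of the other three patterns.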


<<attachment>>
Type: text/plain
Filename: main.cfg

# This file must be utf-8 encoded
#
# This is the main configuration file for configuring
# your Greenstone receptionist (the bit responsible for the way
# things are displayed) and contains information common
# to the interface of all collections served by the site.

# Email address of the webmaster of this Greenstone installation
# If maintainer is set to "NULL" EmailEvents and EmailUserEvents
# will be disabled.
maintainer NULL

# Outgoing (SMTP) mail server for this Greenstone installation.
# This will default to mail.maintainer-domain if it's not set
# (i.e. if maintainer is greenstone@cs.waikato.ac.nz then MailServer
# will default to mail.cs.waikato.ac.nz). If MailServer doesn't
# resolve to a valid SMTP server then the EmailEvents and
# EmailUserEvents options (see below) won't be functional. Likewise,
# turning off EmailEvents and EmailUserEvents will remove any
# reliance on MailServer.
MailServer NULL

# Set status to "enabled" if you want the Maintenance and
# Administration facility to be available.
status enabled

# Set collector to "disabled" if you don't want the "collector"
# end-user collection building facility to be available.
collector disabled

# Set depositor to "disabled" if you don't want the "depositor"
# (aka institutional repository) facility to be available.
depositor disabled

# Set gliapplet to "disabled" if you don't want the remote users
# to be able to build collections on your server through an applet
# version of GLI
gliapplet disabled

# Set logcgiargs to true to keep a log of usage information in
# $GSDLHOME/etc/usage.txt.
logcgiargs false

# Set usecookies to true to use cookies to identify users (cookie
# information will be written to the usage log if logcgiargs is
# true).
usecookies false

# LogDateFormat sets the format that timestamps will be stored in the usage
# log (i.e. if logcgiargs is enabled). It takes the following values:
#   LocalTime: (the default) The local time and date in the form
#              "Thu Dec 07 23:47:00 NZDT 2000".
#   UTCTime:   Coordinated universal time (GMT) in the same format as LocalTime.
#   Absolute:  Integer value representing the number of seconds since
#              00:00:00 1/1/1970 GMT
LogDateFormat LocalTime

# Log any events that Greenstone deems important in
# $GSDLHOME/etc/events.txt.
# The only events that are currently implemented come from the
# collector (e.g. someone just built/deleted the following collection)
# LogEvents may take values of:
#   AllEvents:       All important events
#   CollectorEvents: Just those events originating from the collector
#                    (e.g. someone just built a collection)
#   disabled:        Don't log events
LogEvents disabled

# Email the maintainer whenever any event occurs. EmailEvents
# takes the same values as LogEvents.
# Note that perl must be installed for EmailEvents or
# EmailUserEvents to work.
EmailEvents disabled

# In some cases it may be appropriate to email the user about a
# certain event (e.g. notification from the collector that a collection
# was built successfully)
EmailUserEvents false


# The list of display macro files used by this receptionist
macrofiles ccfbcata.dm tip.dm style.dm base.dm query.dm help.dm pref.dm about.dm \
document.dm browse.dm status.dm authen.dm users.dm html.dm \
extlink.dm gsdl.dm extra.dm home.dm collect.dm deposit.dm docs.dm \
bsummary.dm gti.dm gli.dm nav_css.dm languages.dm \
french.dm french2.dm english.dm english2.dm spanish.dm \
spanish2.dm russian.dm russian2.dm \
hebrew.dm czech.dm czech2.dm galician.dm galician2.dm \
indo.dm indo2.dm japanese.dm japanese2.dm thai.dm thai2.dm \
kazakh.dm kazakh2.dm port-br.dm port-pt.dm \
chinese.dm german.dm maori.dm arabic.dm arabic2.dm dutch.dm \
italian.dm italian2.dm turkish.dm turkish2.dm \
ukrainian.dm croatian.dm hindi.dm kannada.dm finnish.dm \
greek.dm armenian.dm armenian2.dm farsi.dm serbian.dm georgian.dm \
georgian2.dm catalan.dm catalan2.dm latvian.dm latvian2.dm \
vietnamese.dm vietnamese2.dm chinese-trad.dm chinese-trad2.dm \
mongolian.dm mongolian2.dm kirghiz.dm bengali.dm polish.dm gaelic.dm \
slovak.dm urdu.dm urdu2.dm marathi.dm

# Define the interface languages and encodings supported by this receptionist

# An "Encoding" line defines an encoding to be used by the receptionist.
# Uncomment "Encoding" lines to include an encoding on your "preferences" page.
# Encoding line options are:
#   shortname -- The standard charset label for the given encoding. The
#                shortname option is mandatory.
#   longname  -- The display name of the given encoding. If longname isn't set
#                it will default to using shortname instead.
#   map       -- The name of the map file (i.e. the .ump file) for use when
#                converting between unicode and the given encoding. The map
#                option is mandatory for all encoding lines except the
#                special case for utf8.
#   multibyte -- This optional argument should be set for all encodings that use
#                multibyte characters.

# The utf8 encoding is handled internally and doesn't require a map file.
# As a rule the utf8 encoding should always be enabled, especially if you
# have collections of documents that may not all be in the same
# language/encoding.
Encoding shortname=utf-8 "longname=Unicode (UTF-8)"

# This is very experimental, and you almost certainly don't need it
#Encoding shortname=utf-16be "longname=Unicode (UTF-16BE)"

# The ISO-8859 series
Encoding shortname=iso-8859-1 "longname=Western (ISO-8859-1)" map=8859_1.ump
#Encoding shortname=iso-8859-2 "longname=Central European (ISO-8859-2)" map=8859_2.ump
#Encoding shortname=iso-8859-3 "longname=Latin 3 (ISO-8859-3)" map=8859_3.ump
#Encoding shortname=iso-8859-4 "longname=Latin 4 (ISO-8859-4)" map=8859_4.ump
#Encoding shortname=iso-8859-5 "longname=Cyrillic (ISO-8859-5)" map=8859_5.ump
#Encoding shortname=iso-8859-6 "longname=Arabic (ISO-8859-6)" map=8859_6.ump
#Encoding shortname=iso-8859-7 "longname=Greek (ISO-8859-7)" map=8859_7.ump
#Encoding shortname=iso-8859-8 "longname=Hebrew (ISO-8859-8)" map=8859_8.ump
#Encoding shortname=iso-8859-9 "longname=Turkish (ISO-8859-9)" map=8859_9.ump
#Encoding shortname=iso-8859-15 "longname=Western (ISO-8859-15)" map=8859_15.ump

# Windows codepages
Encoding shortname=windows-1250 "longname=Central European (Windows-1250)" map=win1250.ump
Encoding shortname=windows-1251 "longname=Cyrillic (Windows-1251)" map=win1251.ump
#Encoding shortname=windows-1252 "longname=Western (Windows-1252)" map=win1252.ump
Encoding shortname=windows-1253 "longname=Greek (Windows-1253)" map=win1253.ump
Encoding shortname=windows-1254 "longname=Turkish (Windows-1254)" map=win1254.ump
Encoding shortname=windows-1255 "longname=Hebrew (Windows-1255)" map=win1255.ump
Encoding shortname=windows-1256 "longname=Arabic (Windows-1256)" map=win1256.ump
#Encoding shortname=windows-1257 "longname=Baltic (Windows-1257)" map=win1257.ump
#Encoding shortname=windows-1258 "longname=Vietnamese (Windows-1258)" map=win1258.ump
#Encoding shortname=windows-874 "longname=Thai (Windows-874)" map=win874.ump
#Encoding shortname=cp866 "longname=Cyrillic (DOS)" map=dos866.ump
#Encoding shortname=cp850 "longname=Latin-1 (DOS)" map=dos850.ump
#Encoding shortname=cp852 "longname=Central European (DOS)" map=dos852.ump

# KOI8 Cyrillic encodings
#Encoding shortname=koi8-r "longname=Cyrillic (KOI8-R)" map=koi8_r.ump
#Encoding shortname=koi8-u "longname=Cyrillic (KOI8-U)" map=koi8_u.ump

# CJK encodings (note that Shift-JIS Japanese isn't currently supported)
Encoding shortname=gbk "longname=汉语 (Chinese Simplified GBK)" map=gbk.ump multibyte
Encoding shortname=big5 "longname=漢語 (Chinese Traditional Big5)" map=big5.ump multibyte
Encoding shortname=euc-jp "longname=Japanese (EUC)" map=euc_jp.ump multibyte
Encoding shortname=euc-kr "longname=Korean (UHC)" map=uhc.ump multibyte


# A "Language" line defines an interface language to be used by the
# interface. Note that it is possible to display only a subset of the
# specified languages on the preferences page for a given collection by
# using the "PreferenceLanguages" format option in your collect.cfg
# configuration file.
# Arguments are:
#   shortname        -- ISO 639 two letter language symbol. The shortname
#                       argument is mandatory.
#   longname         -- The display name for the given language. If longname
#                       isn't set it will default to using shortname instead.
#   default_encoding -- The encoding to use by default when using the given
#                       interface language. This should be set to the
#                       "shortname" of a valid "Encoding" line
Language shortname=ar longname=Arabic default_encoding=windows-1256
Language shortname=bn "longname=বাংলা (Bengali)" default_encoding=utf-8
Language shortname=ca "longname=Català (Catalan)" default_encoding=utf-8
Language shortname=cs "longname=Česky (Czech)" default_encoding=utf-8
Language shortname=de "longname=Deutsch (German)" default_encoding=utf-8
Language shortname=el "longname=Ελληνικά (Greek)" default_encoding=windows-1253
Language shortname=en longname=English default_encoding=utf-8
Language shortname=es "longname=Español (Spanish)" \
default_encoding=utf-8
Language shortname=fa longname=Farsi default_encoding=utf-8
Language shortname=fi longname=Finnish default_encoding=utf-8
Language shortname=fr "longname=Français (French)" \
default_encoding=utf-8
Language shortname=gd "longname=Gaelic (Scottish)" default_encoding=utf-8
Language shortname=gl longname=Galician default_encoding=utf-8
Language shortname=he longname=Hebrew default_encoding=windows-1255
Language shortname=hi longname=Hindi default_encoding=utf-8
Language shortname=hr longname=Croatian default_encoding=windows-1250
Language shortname=hy longname=Armenian default_encoding=utf-8
Language shortname=id "longname=Bahasa Indonesia (Indonesian)" default_encoding=utf-8
Language shortname=it longname=Italiano default_encoding=utf-8
Language shortname=ja "longname=日本語 (Japanese)" default_encoding=utf-8
Language shortname=ka longname=Georgian default_encoding=utf-8
Language shortname=kk "longname=Қазақ (Kazakh)" default_encoding=utf-8
Language shortname=kn longname=Kannada default_encoding=utf-8
Language shortname=ky "longname=Кыргызча (Kirghiz)" default_encoding=utf-8
Language shortname=lv longname=Latvian default_encoding=utf-8
Language shortname=mi "longname=Māori" default_encoding=utf-8
Language shortname=mn "longname=Монгол (Mongolian)" default_encoding=utf-8
Language shortname=nl "longname=Nederlands (Dutch)" default_encoding=utf-8
Language shortname=pl "longname=polski (Polish)" default_encoding=utf-8
Language shortname=pt-br "longname=português-BR (Brasil)" \
default_encoding=utf-8
Language shortname=pt-pt "longname=português-PT (Portugal)" \
default_encoding=utf-8
Language shortname=ru "longname=русский (Russian)" default_encoding=windows-1251
Language shortname=sk "longname=Slovenčina (Slovak)" default_encoding=utf-8
Language shortname=sr longname=Serbian default_encoding=utf-8
Language shortname=th longname=Thai default_encoding=utf-8
Language shortname=tr longname=Turkish default_encoding=windows-1254
Language shortname=uk longname=Ukrainian default_encoding=utf-8
Language shortname=vi "longname=Tiếng Việt (Vietnamese)" default_encoding=utf-8
Language shortname=zh "longname=简体中文 (Simplified Chinese)" default_encoding=gbk
Language shortname=zh-tr "longname=繁體中文 (Traditional Chinese)" default_encoding=big5


# Define any additional page parameters to be used by the above macro files
# (the current default page parameters are c (collection) and l (language))

# Define v (version -- text or graphic) page parameter and give it a default
# value of 0 (0 = text version off)
pageparam v 0

# Set the precedence given to the page parameters. This affects which macro
# will be selected for display when there are multiple versions of the same
# macro with different page parameters.
# e.g. Given a macroprecedence of "c,v,l" and the following macro definitions:
# _content_ []
# _content_ [l=en]
# _content_ [c=demo]
# _content_ [v=1]
# _content_ [l=fr,v=1,c=hdl]
# If the corresponding cgi arguments were set to l=en&v=1&c=hdl then the
# _content_[v=1] macro would be selected for display. It would be selected
# ahead of the _content_[l=en] macro because "v" has a higher precedence
# than "l". The _content_[l=fr,v=1,c=hdl] macro would not be selected
# because one of the page parameters is completely wrong ("l").
macroprecedence c,v,l


# Define any additional cgi arguments. Most cgi arguments are built into
# Greenstone but it's possible to define them here (or set defaults for
# existing built-in cgi arguments).

# define the "v" cgi argument (to correspond to the "v" page parameter defined
# above).
cgiarg shortname=v longname=version multiplechar=false argdefault=0 \
defaultstatus=weak savedarginfo=must

# set a default value for the built-in "a" cgi argument
cgiarg shortname=a argdefault=p

# set a default value for the built-in "p" cgi argument
cgiarg shortname=p argdefault=home

# set the default encoding to utf-8
cgiarg shortname=w argdefault=utf-8

cgiarg shortname=l argdefault=fr
cgiarg shortname=m argdefault=200
cgiarg shortname=o argdefault=40
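
The macroprecedence comment in main.cfg above describes a selection rule: a macro variant is usable only if every page parameter it names matches the current cgi arguments, and among usable variants the one matching the highest-precedence parameter wins. The Python sketch below models only that description. It is not Greenstone's implementation, the function name `select_macro` is invented for the example, and the real receptionist may break ties differently.

```python
def select_macro(candidates, cgi_args, precedence):
    """Pick the best-matching parameter set from a list of macro variants.

    candidates: list of dicts, e.g. {"l": "en"} for _content_ [l=en]
    cgi_args:   dict of the current page parameter values
    precedence: parameter names, highest precedence first
    """
    # A variant is eligible only if every parameter it names matches.
    eligible = [
        params
        for params in candidates
        if all(cgi_args.get(name) == value for name, value in params.items())
    ]

    def rank(params):
        # One flag per parameter in precedence order. Tuples compare left to
        # right, so a match on "c" outranks a match on "v", and so on.
        return tuple(name in params for name in precedence)

    return max(eligible, key=rank, default=None)


# The five _content_ variants listed in the main.cfg comment.
variants = [
    {},
    {"l": "en"},
    {"c": "demo"},
    {"v": "1"},
    {"l": "fr", "v": "1", "c": "hdl"},
]

# cgi arguments l=en&v=1&c=hdl with macroprecedence c,v,l
chosen = select_macro(variants, {"l": "en", "v": "1", "c": "hdl"}, ["c", "v", "l"])
print(chosen)
```

For the arguments used in the comment, the sketch selects the `[v=1]` variant: `[c=demo]` and `[l=fr,v=1,c=hdl]` are rejected because one of their parameters does not match, and `[v=1]` outranks `[l=en]` because "v" precedes "l".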