Hi Gordon,

These are the changes we made to AZCompactList.pm to get it to behave the way we wanted. There are two: a small one to specify how the leaves are sorted, and a bigger one to change how the branches are sorted.

We removed the conditional that calls the
<evil>&sorttools::format_string_name_english ($formatted_metavalue);</evil>
line (it fails on corporate authors and on my surname). All our metadata is Lastname,Firstname, so we are better off without it.

The second change passes a sort argument, but I forget what it passes it to.

I also have a MultiAZCompactList sent to me by Michael Dewsnip, which you may find helpful. (I have added the text of the classifier to the end of this message, but beware that weird formatting happens. We are using it, but changed the hlist to a vlist.)

Sorry I can't tell you much more. I have been using the db2txt utility and the '-mode infodb' flag when building, to speed up my testing and get a quick look at the output from my classifier lines.

I am thinking of a 'new books' facility for our collection, so I am keen to have a look at anything you come up with.

Let us know how you go,
Stephen

________________________________________________
Stephen De Gabrielle
Digitisation Officer
AraDA Project
Northern Territory University Library
http://www.ntu.edu.au/library
Tel: (08) 8946 7009 from overseas: 61 8 8946 7009
Postal address: P.O.Box 41246, Casuarina, NT, 0811, Australia
CRICOS Provider No: 00300K

>Hi all,
>
>Does anyone know exactly how AZCompactList classifiers sort the documents
>inside each category? The global sortmeta in import doesn't work, and
>there's no classifier-specific option. The code is not easy to read (gsdl
>2.39).
>
>In fact, what I really want to do is a reverse-sort-by-date. Can anyone
>suggest a way to do this (given I have a Date metadata field)?
>
>Gordon
>
>_______________________________________________
>greenstone-devel mailing list
>greenstone-devel@list.scms.waikato.ac.nz
>https://list.scms.waikato.ac.nz/mailman/listinfo/greenstone-devel

------------------------

sub classify {
    my $self = shift (@_);
    my ($doc_obj) = @_;

    my $doc_OID = $doc_obj->get_OID();

    my @sectionlist = ();
    my $topsection = $doc_obj->get_top_section();

    my $metaname = $self->{'metaname'};
    my $outhandle = $self->{'outhandle'};

    $metaname =~ s/(\/.*)//; # grab first name in n1/n2/n3 list

    if ($self->{'doclevel'} =~ /^top(level)?/i) {
        push(@sectionlist,$topsection);
    }
    else {
        my $thissection = $doc_obj->get_next_section($topsection);
        while (defined $thissection) {
            push(@sectionlist,$thissection);
            $thissection = $doc_obj->get_next_section ($thissection);
        }
    }

    my $thissection;
    foreach $thissection (@sectionlist) {
        my $full_doc_OID
            = ($thissection ne "") ? "$doc_OID.$thissection" : $doc_OID;

        if (defined $self->{'list'}->{$full_doc_OID}) {
            print $outhandle "WARNING: NTUAZCompactList::classify called multiple times for $full_doc_OID\n";
        }
        $self->{'list'}->{$full_doc_OID} = [];
        $self->{'listmetavalue'}->{$full_doc_OID} = [];

        my $metavalues = $doc_obj->get_metadata($thissection,$metaname);
        my $metavalue;
        foreach $metavalue (@$metavalues) {
            # if this document doesn't contain the metadata element we're
            # sorting by we won't include it in this classification
            if (defined $metavalue && $metavalue =~ /\w/) {
                if ($self->{'removeprefix'}) {
                    $metavalue =~ s/^$self->{'removeprefix'}//;
                }

                my $formatted_metavalue = $metavalue;
                ############# THIS #########################################
                &sorttools::format_string_english ($formatted_metavalue);
                ############## REPLACED THIS ###############################
                # if ($self->{'metaname'} =~ m/^Creator(:.*)?$/)
                # {
                #     &sorttools::format_string_name_english ($formatted_metavalue);
                # }
                # else
                # {
                #     &sorttools::format_string_english ($formatted_metavalue);
                # }
                ###### SD IR 2003 ##########################################

                #### prefix-str

                if (! defined($formatted_metavalue)) {
                    print $outhandle "Warning: NTUAZCompactList: metavalue is ";
                    print $outhandle "empty\n";
                    $formatted_metavalue="";
                }

                push(@{$self->{'list'}->{$full_doc_OID}},$formatted_metavalue);
                push(@{$self->{'listmetavalue'}->{$full_doc_OID}} ,$metavalue);

                last if ($self->{'onlyfirst'});
            }
        }
        my $date = $doc_obj->get_metadata_element($thissection,"Date");
        $self->{'reclassify'}->{$full_doc_OID} = [$doc_obj,$date];
    }
}

sub reinit
{
    my ($self,$classlist_ref) = @_;
    my $outhandle = $self->{'outhandle'};

    my %mtfreq = ();
    my @single_classlist = ();
    my @multiple_classlist = ();

    # find out how often each metavalue occurs
    map
    {
        my $mv;
        foreach $mv (@{$self->{'listmetavalue'}->{$_}} )
        {
            $mtfreq{$mv}++;
        }
    } @$classlist_ref;

    # use this information to split the list: single metavalue/repeated value
    map
    {
        my $i = 1;
        my $metavalue;
        foreach $metavalue (@{$self->{'listmetavalue'}->{$_}})
        {
            if ($mtfreq{$metavalue} >= $self->{'mingroup'})
            {
                push(@multiple_classlist,[$_,$i,$metavalue]);
            }
            else
            {
                push(@single_classlist,[$_,$metavalue]);
                $metavalue =~ tr/[A-Z]/[a-z]/;
                $self->{'reclassifylist'}->{"Metavalue_$i.$_"} = $metavalue;
            }
            $i++;
        }
    } @$classlist_ref;

    # Setup sub-classifiers for multiple list
    $self->{'classifiers'} = {};

    my $pm;
    foreach $pm ("List", "SectionList")
    {
        my $listname
            = &util::filename_cat($ENV{'GSDLHOME'},"perllib/classify/$pm.pm");
        if (-e $listname) { require $listname; }
        else
        {
            print $outhandle "NTUAZCompactList ERROR - couldn't find classifier \"$listname\"\n";
            die "\n";
        }
    }

    # Create classifiers objects for each entry >= mingroup
    my $metavalue;
    foreach $metavalue (keys %mtfreq)
    {
        if ($mtfreq{$metavalue} >= $self->{'mingroup'})
        {
            my $listclassobj;
            my $doclevel = $self->{'doclevel'};
            my $metaname = $self->{'metaname'};
            my @metaname_list = split('/',$metaname);
            $metaname = shift(@metaname_list);
            if (@metaname_list==0)
            {
                my @args;
                push @args, ("-metadata", "$metaname");
                # buttonname is also used for the node's title
                push @args, ("-buttonname", "$metavalue");
                ###################################################
                # push @args, ("-sort", "Date");
                ###################################################
                ## SORT LEAVES (s.degabrielle/I.Rohoza 2003)
                push @args, ("-sort", "Title");
                ###################################################
                if ($doclevel =~ m/^top(level)?/i)
                {
                    eval ("\$listclassobj = new List(\@args)"); warn $@ if $@;
                }
                else
                {
                    eval ("\$listclassobj = new SectionList(\@args)");
                }
            }
            else
            {
                $metaname = join('/',@metaname_list);
                my @args;
                push @args, ("-metadata", "$metaname");
                # buttonname is also used for the node's title
                push @args, ("-buttonname", "$metavalue");
                push @args, ("-doclevel", "$doclevel");
                push @args, "-recopt";
                eval ("\$listclassobj = new NTUAZCompactList(\@args)");
            }
            if ($@) {
                print $outhandle "$@";
                die "\n";
            }

            $listclassobj->init();

            if (defined $metavalue && $metavalue =~ /\w/)
            {
                my $formatted_node = $metavalue;
                if ($self->{'removeprefix'}) {
                    $formatted_node =~ s/^$self->{'removeprefix'}//;
                }
                ############# THIS #########################################
                &sorttools::format_string_english($formatted_node);
                ############## REPLACED THIS ###############################
                # if ($self->{'metaname'} =~ m/^Creator(:.*)?$/)
                # {
                #     &sorttools::format_string_name_english($formatted_node);
                # }
                # else
                # {
                #     &sorttools::format_string_english($formatted_node);
                # }
                ######## SD/IR 2003 ########################################

                # In case our formatted string is empty...
                if (! defined($formatted_node)) {
                    print $outhandle "Warning: NTUAZCompactList: metavalue is ";
                    print $outhandle "empty\n";
                    $formatted_node="";
                }

                $self->{'classifiers'}->{$metavalue}
                    = { 'classifyobj'   => $listclassobj,
                        'formattednode' => $formatted_node };
            }
        }
    }

    return (\@single_classlist,\@multiple_classlist);
}

#############

###########################################################################
#
# MultiAZCompactList.pm --
# A component of the Greenstone digital library software
# from the New Zealand Digital Library Project at the
# University of Waikato, New Zealand.
#
# Copyright (C) 1999 New Zealand Digital Library Project
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 2 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, write to the Free Software
# Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
#
###########################################################################

#Unfortunately, the classifier starts to look pretty ugly when
# browsing past the first metadata level. This can only be fixed
# by editing the C++ receptionist code, which I might do when
# time allows.
#
#Anyway, have a play if you want (just remember it's only beta).
# I've tentatively called it the MultiAZCompactList, so you
# would add something like:
#
#classify MultiAZCompactList -metadata Subject,Title
# -groupsize 20,20
#
#to your collection configuration file. This would create a
# classifier which first classifies on Subject, then on Title.
# The groupsize specifies the number of child items allowed
# (for the corresponding metadata element) before an hlist
# partition is added.
# If you try it on the demo collection,
# use something small like -groupsize 2,2 to see the effect.
#
#

package MultiAZCompactList;

use BasClas;

sub BEGIN {
    @ISA = ('BasClas');
}

my $arguments =
    [ { 'name' => "metadata",
        'desc' => "Metadata fields used for classification, comma separated.",
        'type' => "metalist",
        'reqd' => "yes" } ,
      { 'name' => "buttonname",
        'desc' => "Button name for this classifier.",
        'type' => "string",
        'deft' => "First metadata field specified with -metadata",
        'reqd' => "no" },
      { 'name' => "alwaysgroup",
        'desc' => "Create a bookshelf icon even if there is only one item in the group.",
        'type' => "string",
        'deft' => "True for all metadata fields except the last",
        'reqd' => "no" },
      { 'name' => "groupsize",
        'desc' => "The number of items in each hlist group.",
        'type' => "string"} ];

my $options = { 'name'     => "MultiAZCompactList",
                'desc'     => "",
                'inherits' => "Yes",
                'args'     => $arguments };

sub new {
    my $class = shift(@_);
    my $self = new BasClas($class, @_);

    # To allow for proper inheritance of arguments
    local $option_list = $self->{'option_list'};
    push(@{$option_list}, $options);

    local $metadata;
    local $buttonname;
    local $alwaysgroup;
    local $groupsize;

    if (!parsargv::parse(\@_,
                         q^metadata/.*/^, \$metadata,
                         q^buttonname/.*/^, \$buttonname,
                         q^alwaysgroup/.*/^, \$alwaysgroup,
                         q^groupsize/.*/^, \$groupsize,
                         "allow_extra_options")) {
        print STDERR "\nIncorrect options passed to $class, check your collect.cfg file\n";
        $self->print_txt_usage();
        die "\n";
    }

    # The metadata elements to use (required)
    if (!$metadata) {
        die "Error: No metadata fields specified for MultiAZCompactList.\n";
    }
    local @metalist = split(/,/, $metadata);
    $self->{'metalist'} = \@metalist;

    # Create an empty list for the OID values
    $self->{'OIDlist'} = [];

    # Create an empty hash for the metadata values of each metadata element
    foreach $metaelem (@metalist) {
        $self->{$metaelem . ".list"} = {};
    }

    # The classifier button name
    if (!$buttonname) {
        # Default: the first metadata field specified
        $buttonname = $metalist[0];
    }
    $self->{'title'} = $buttonname;

    # Whether to group single items into a bookshelf
    if (!$alwaysgroup) {
        # Default: true for all metadata fields except the last
        foreach $metaelem (@metalist) {
            $self->{$metaelem . ".alwaysgroup"} = "t";
        }
        local $lastelem = $metalist[$#metalist];
        $self->{$lastelem . ".alwaysgroup"} = "f";
    }
    else {
        local @alwaysgrouplist = split(/,/, $alwaysgroup);

        # Assign values based on the always group parameter
        foreach $metaelem (@metalist) {
            local $alwaysgroupelem = shift(@alwaysgrouplist);
            if (defined($alwaysgroupelem)) {
                $self->{$metaelem . ".alwaysgroup"} = $alwaysgroupelem;
            }
            else {
                if ($metaelem ne $metalist[$#metalist]) {
                    $self->{$metaelem . ".alwaysgroup"} = "t";
                }
                else {
                    $self->{$metaelem . ".alwaysgroup"} = "f";
                }
            }
        }
    }

    # The number of items in each group
    if (!$groupsize) {
        # Default: 20 in first level, 19 in second level, ... etc.
        local $thisgroupsize = 20;
        foreach $metaelem (@metalist) {
            $self->{$metaelem . ".groupsize"} = $thisgroupsize;
            $thisgroupsize--;
        }
    }
    else {
        local @groupsizelist = split(/,/, $groupsize);

        # Assign values based on the groupsize parameter
        foreach $metaelem (@metalist) {
            local $groupsizeelem = shift(@groupsizelist);
            if (defined($groupsizeelem)) {
                $self->{$metaelem . ".groupsize"} = $groupsizeelem;
            }
            else {
                $self->{$metaelem . ".groupsize"} = $self->{$metalist[0] . ".groupsize"};
            }
        }
    }

    return bless $self, $class;
}

sub init {
    # Nothing to do...
    local $self = shift(@_);
}

sub classify {
    local $self = shift(@_);
    local $doc_obj = shift(@_);

    local $doc_OID = $doc_obj->get_OID();
    local $doc_top = $doc_obj->get_top_section();
    local @metalist = @{$self->{'metalist'}};

    # Only classify the document if it has a value for the first metadata element
    local $firstelem = $metalist[0];
    if (defined($doc_obj->get_metadata_element($doc_top, $firstelem))) {
        push(@{$self->{'OIDlist'}}, $doc_OID);

        # Get the value of each metadata element for this document
        foreach $metaelem (@metalist) {
            local $metavalue = $doc_obj->get_metadata_element($doc_top, $metaelem);

            # If there is no value for this metadata element, use "Unknown"
            if (!defined($metavalue)) {
                $metavalue = "Unknown";
            }

            # Make the value title case
            substr($metavalue, 0, 1) =~ tr/a-z/A-Z/;

            # print "Metaelem: $metaelem, Value: $metavalue\n";
            $self->{$metaelem . ".list"}->{$doc_OID} = $metavalue;
        }
    }
}

sub get_classify_info {
    local $self = shift(@_);

    # The metadata elements to classify by
    local @metalist = @{$self->{'metalist'}};
    # The OID values of the documents to include in the classification
    local @OIDlist = @{$self->{'OIDlist'}};
    # print "Number of OIDs to include in classification: " . @OIDlist . "\n";

    # The root node of the classification hierarchy
    local %classifyinfo = ( 'thistype' => "Invisible",
                            'Title'    => $self->{'title'},
                            'contains' => [] );

    # Recursively create the classification hierarchy, one level for each metadata element
    &add_az_list($self, \@metalist, \@OIDlist, \%classifyinfo);
    return \%classifyinfo;
}

sub add_az_list {
    local $self = shift(@_);
    local @metalist = @{shift(@_)};
    local @OIDlist = @{shift(@_)};
    local $classifyinfo = shift(@_);
    # print "\nAdding AZ list for " . $classifyinfo->{'Title'} . "\n";

    local $metaelem = $metalist[0];
    # print "Processing metadata element: " . $metaelem . "\n";
    # print "Number of OID values: " . @OIDlist . "\n";

    local %OIDtometavaluehash = %{$self->{$metaelem . ".list"}};

    # Create a mapping from metadata value to OID
    local %metavaluetoOIDhash = ();
    foreach $OID (@OIDlist) {
        local $metavalue = $OIDtometavaluehash{$OID};
        push(@{$metavaluetoOIDhash{$metavalue}}, $OID);
    }
    # print "Number of distinct values: " . scalar(keys %metavaluetoOIDhash) . "\n";

    # Partition the values (if necessary)
    local $groupsize = $self->{$metaelem . ".groupsize"};
    if (scalar(keys %metavaluetoOIDhash) > $groupsize) {
        local @sortedmetavalues = sort(keys %metavaluetoOIDhash);
        local $itemsdone = 0;
        local %metavaluetoOIDsubhash = ();
        local $lastpartitionend = "";
        local $partitionstart;
        foreach $metavalue (@sortedmetavalues) {
            # print "Metavalue: $metavalue\n";
            $metavaluetoOIDsubhash{$metavalue} = $metavaluetoOIDhash{$metavalue};
            $itemsdone++;
            local $itemsinpartition = scalar(keys %metavaluetoOIDsubhash);

            # Is this the start of a new partition?
            if ($itemsinpartition == 1) {
                $partitionstart = &generate_partition_start($metavalue, $lastpartitionend);
            }

            # Is this the end of the partition?
            if ($itemsinpartition == $groupsize || $itemsdone == @sortedmetavalues) {
                local $partitionend = &generate_partition_end($metavalue, $partitionstart);
                local $partitionname = $partitionstart;
                if ($partitionend ne $partitionstart) {
                    $partitionname = $partitionname . "-" . $partitionend;
                }

                # print "Partition: $partitionname\n";
                &add_hlist_partition($self, \@metalist, $classifyinfo, $partitionname, \%metavaluetoOIDsubhash);
                %metavaluetoOIDsubhash = ();
                $lastpartitionend = $partitionend;
            }
        }

        # The partitions are stored in an HList
        $classifyinfo->{'childtype'} = "HList";
    }

    # Otherwise just add all the values to a VList
    else {
        &add_vlist($self, \@metalist, $classifyinfo, \%metavaluetoOIDhash);
        $classifyinfo->{'childtype'} = "VList";
    }
}

sub generate_partition_start {
    local $metavalue = shift(@_);
    local $lastpartitionend = shift(@_);

    local $partitionstart = substr($metavalue, 0, 1);
    if ($partitionstart le $lastpartitionend) {
        $partitionstart = substr($metavalue, 0, 2);
        # Give up after three characters
        if ($partitionstart le $lastpartitionend) {
            $partitionstart = substr($metavalue, 0, 3);
        }
    }

    return $partitionstart;
}

sub generate_partition_end {
    local $metavalue = shift(@_);
    local $partitionstart = shift(@_);

    local $partitionend = substr($metavalue, 0, length($partitionstart));
    if ($partitionend gt $partitionstart) {
        $partitionend = substr($metavalue, 0, 1);
        if ($partitionend le $partitionstart) {
            $partitionend = substr($metavalue, 0, 2);
            # Give up after three characters
            if ($partitionend le $partitionstart) {
                $partitionend = substr($metavalue, 0, 3);
            }
        }
    }

    return $partitionend;
}

sub add_hlist_partition {
    local $self = shift(@_);
    local @metalist = @{shift(@_)};
    local $classifyinfo = shift(@_);
    local $partitionname = shift(@_);
    local $metavaluetoOIDhash = shift(@_);

    # Create an hlist partition
    local %subclassifyinfo = ( 'Title'     => $partitionname,
                               'childtype' => "VList",
                               'contains'  => [] );

    # Add the children to the hlist partition
    &add_vlist($self, \@metalist, \%subclassifyinfo, \%metavaluetoOIDsubhash);
    push(@{$classifyinfo->{'contains'}}, \%subclassifyinfo);
}

sub add_vlist {
    local $self = shift(@_);
    local @metalist = @{shift(@_)};
    local $classifyinfo = shift(@_);
    local $metavaluetoOIDhash = shift(@_);

    local $metaelem = shift(@metalist);

    # Create an entry in the vlist for each value
    foreach $metavalue (sort(keys %{$metavaluetoOIDhash})) {
        local @OIDlist = @{$metavaluetoOIDhash->{$metavalue}};

        # If there is only one item and 'alwaysgroup' is false, add the item to the list
        if (@OIDlist == 1 && $self->{$metaelem . ".alwaysgroup"} eq "f") {
            push(@{$classifyinfo->{'contains'}}, { 'OID' => $OIDlist[0] });
        }
        # Otherwise create a sublist (bookshelf) for the metadata value
        else {
            local %subclassifyinfo = ( 'Title'     => $metavalue,
                                       'childtype' => "VList",
                                       'contains'  => [] );

            # If there are metadata elements remaining, recursively apply the process
            if (@metalist > 0) {
                &add_az_list($self, \@metalist, \@OIDlist, \%subclassifyinfo);
            }
            # Otherwise just add the documents as children of this list
            else {
                foreach $OID (@OIDlist) {
                    push(@{$subclassifyinfo{'contains'}}, { 'OID' => $OID });
                }
            }

            # Add the sublist to the list
            push(@{$classifyinfo->{'contains'}}, \%subclassifyinfo);
        }
    }
}

1;

###########

>_________________________________________________
>Stephen De Gabrielle
>Digitisation Officer
>AraDA Project
>
>Northern Territory University Library
>http://www.ntu.edu.au/library
>Tel: (08) 8946 7009 from overseas: 61 8 8946 7009
>Postal address: P.O.Box 41246, Casuarina, NT, 0811, Australia
>CRICOS Provider No: 00300K
>
>"Gordon Paynter" <gordon.paynter@ucr.edu>
>Sent by: greenstone-devel-bounces@list.scms.waikato.ac.nz
>07/09/2003 05:06 AM MST
>Please respond to gordon.paynter
>
> To: greenstone-devel@list.scms.waikato.ac.nz
> cc:
> bcc:
> Subject: [greenstone-devel] AZCompactList sorting
>
>
>
>Hi all,
>
>Does anyone know exactly how AZCompactList classifiers sort the documents
>inside each category? The global sortmeta in import doesn't work, and
>there's no classifier-specific option. The code is not easy to read (gsdl
>2.39).
>
>In fact, what I really want to do is a reverse-sort-by-date. Can anyone
>suggest a way to do this (given I have a Date metadata field)?
>
>Gordon
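
On the reverse-sort-by-date question in the quoted message, here is a minimal standalone sketch of the ordering itself. It is not taken from AZCompactList.pm or List.pm: the helper name sort_oids_by_date_desc and the sample OIDs are made up for illustration, and it assumes Date values are stored as YYYYMMDD strings, so that plain string comparison matches chronological order. Where such an ordering would hook into a classifier is not shown.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Greenstone): return document OIDs
# ordered newest-first by their Date value.
# ASSUMPTION: dates are YYYYMMDD strings, so "cmp" orders them chronologically.
sub sort_oids_by_date_desc {
    my ($date_for) = @_;    # hash ref: OID => date string (may be undef)
    return sort {
        # Swapping $b and $a reverses the order; undefined dates are
        # treated as the empty string and therefore sort last.
        ($date_for->{$b} // '') cmp ($date_for->{$a} // '')
            or $a cmp $b    # tie-break on OID so the order is repeatable
    } keys %$date_for;
}

# Made-up sample data standing in for OID => Date metadata.
my %date_for = (
    'HASH01' => '20030115',
    'HASH02' => '20030709',
    'HASH03' => undef,
    'HASH04' => '19991231',
);

my @ordered = sort_oids_by_date_desc(\%date_for);
print join(' ', @ordered), "\n";    # prints "HASH02 HASH01 HASH04 HASH03"
```

If the dates were stored in some other format, they would need to be normalised to a sortable form first; the comparison block is the only part that would change.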