[greenstone-users] Sorting Russian PDF in Classifiers

From Jonathan Tremblay
Date Sun, 11 Jun 2006 22:45:35 -0400
Subject [greenstone-users] Sorting Russian PDF in Classifiers



My project contains English, Spanish, French and Russian documents (all in PDF).


Sorting has been a problem since the beginning, so I created a metadata field specifically for sorting. At first I used numbers, but since I still had problems, I started using letters in that field (AAAA, AAAB, AAAC, etc.).
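
For anyone who wants to reproduce the setup, here is a minimal Python sketch of how fixed-width letter keys like these could be generated. The function name `sort_keys` and the width of 4 are only illustrative assumptions; this is not part of Greenstone, and the keys could equally be typed by hand.

```python
from itertools import islice, product
from string import ascii_uppercase


def sort_keys(width=4):
    """Yield fixed-width keys in order: AAAA, AAAB, AAAC, ... (hypothetical helper)."""
    for letters in product(ascii_uppercase, repeat=width):
        yield "".join(letters)


# Take the first three keys as an example.
keys = list(islice(sort_keys(), 3))
print(keys)
```

Because every key has the same length and uses only A-Z, a plain string comparison keeps the keys in the order they were generated.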


This works perfectly for the search results, but the Russian documents are not sorted correctly in the classifiers (AZCompactList and Hierarchy): they always appear before the other documents (e.g. RAAA, RAAB, AAAA, AAAB, AAAC, etc.).




I had a similar problem with an English PDF that contained no editable text (only scanned images). As soon as I replaced it with a PDF version containing text, the document was sorted correctly. By the way, all my Russian documents contain editable text.




Jonathan Tremblay