Date: Fri, 17 Mar 2006 02:17:40 +1300
Subject: Re: [greenstone-users] encoding again
Richard's suggestion is maybe not as mad as it sounds. We've had occasional problems with the XML Parser module in Perl, where older versions work OK but newer ones mess up the encoding (by appearing to encode to UTF-8 twice, turning two-byte characters into four-byte characters and so on). If the encoding of your archive files is correct, but the text coming out of those archives at build time is messed up, then I'd suspect the XML Parser. If that's the case, it shouldn't make any difference whether you build with mgpp or mg: you'll have the same problem.
I'd suggest rebuilding your collection with mg, just to see if it works. If it does, then it's a problem with mgpp, which seems unlikely but is possible. If it doesn't, then it's likely to be Perl, or more correctly Perl's XML Parser module.
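To make the "encoded to UTF-8 twice" symptom concrete, here is a minimal Python sketch (not Greenstone or Perl code). It assumes the double encoding happens the usual way, with already-encoded UTF-8 bytes being treated as Latin-1 characters and encoded again. The function names `double_encode` and `looks_double_encoded` are made up for illustration.

```python
def double_encode(text: str) -> bytes:
    """Simulate the suspected bug: UTF-8 bytes are misread as Latin-1
    characters and then encoded to UTF-8 a second time."""
    once = text.encode("utf-8")
    return once.decode("latin-1").encode("utf-8")


def looks_double_encoded(data: bytes) -> bool:
    """Heuristic check (an assumption, not a Greenstone tool): if undoing
    one Latin-1 -> UTF-8 layer still leaves valid, shorter UTF-8, the data
    was probably encoded twice."""
    try:
        undone = data.decode("utf-8").encode("latin-1")
        undone.decode("utf-8")
    except (UnicodeDecodeError, UnicodeEncodeError):
        return False
    return len(undone) < len(data)


if __name__ == "__main__":
    correct = "ü".encode("utf-8")      # two bytes: c3 bc
    broken = double_encode("ü")        # four bytes: c3 83 c2 bc
    print(correct.hex(), looks_double_encoded(correct))
    print(broken.hex(), looks_double_encoded(broken))
```

Running a check like `looks_double_encoded` over the raw bytes of an archive file and over the text that comes out at build time would show on which side of the XML Parser the extra encoding appears. Pure ASCII input is reported as not double encoded, since encoding it twice changes nothing.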
jens wille wrote:
hi richard!

Richard Managh [16.03.2006 03:44]:
> I'm not aware of any way of directly looking at what's in the mg(pp) indexes.

too bad :-(

> o Perhaps it's a problem matching your inputted text with that in the index when you submit search queries. In all cases when you test searching are your input characters the same encoding as what greenstone expects? (the "w" argument)

no, that's not the problem: first, the "w" argument is correct, second, what i was trying to do now is to replace all umlauts, so there are no special characters in the input.

> o Some versions of Perl sometimes get confused and double encode UTF-8 when the xml parser parses your archives directory during the build phase. If you are running perl 5.8, try 5.6.

*lol* sorry, but that's a bit odd a suggestion, isn't it ;-) rather i'd like to learn where this happens and how i can avoid it. (btw: why (and in what respect) does mgpp behave here differently than mg?). but maybe this really is where the problem originates, so i will try to elaborate on that (trace relevant subroutines, print out some variables, ... - it's just pretty time-consuming, so i wanted to ask here first).

thanks for your suggestions, anyway!

cheers
jens