Re: [Python-Dev] Internationalization Toolkit

You beat me to it - a colleague and I were just discussing this verbally. Specifically, we Brits will get annoyed as soon as we read in a text file with pound (sterling) signs.

We concluded that the only reasonable default (if you have one at all) is pure ASCII. At least that way I will get a clear and intelligible warning when I load in such a file, and will remember to specify ISO-Latin-1.

- Andy

=====
Andy Robinson
Robinson Analytics Ltd.
------------------
My opinions are the official policy of Robinson Analytics Ltd.
They just vary from day to day.
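To illustrate the point about an ASCII default, here is a minimal sketch in present-day Python. It is not code from this thread: `load_text` is a hypothetical helper, and the sample bytes are invented. It assumes a Latin-1 file in which the pound sign is stored as the single byte 0xA3.

```python
# Bytes as they might sit in a Latin-1 text file: the pound sign is 0xA3.
raw = b"Price: \xa3100"


def load_text(data, encoding="ascii"):
    """Hypothetical helper: decode strictly, with pure ASCII as the default."""
    return data.decode(encoding)  # errors="strict" is the default mode


try:
    load_text(raw)
    ascii_failed = False
except UnicodeDecodeError as exc:
    # The failure is loud and names the offending byte and its position.
    ascii_failed = True
    print("ASCII default refused the data:", exc)

# Naming the real encoding explicitly gives the intended text.
text = load_text(raw, "iso-8859-1")
print(text)
```

With ASCII as the default the bad byte is reported immediately instead of being silently misread, which is the "clear and intelligible warning" described above.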

Andy Robinson wrote:
Well, Guido's post made me rethink the approach...

1. Setting <default encoding> to any non-UTF encoding will result in data lossage due to the encoding limits imposed by the other formats -- this is dangerous and will result in errors (some of which may not even be noticed, because the interpreter ignores them) if your strings use non-encodable characters.

2. You basically only want to set <default encoding> to anything other than UTF-8 for stream input and output. This can be done using the unicodec stream wrapper without too much inconvenience. (We'll have to extend the wrapper a little, though, because it currently only accepts Unicode objects for writing and always returns Unicode objects when reading.)

3. We should leave the issue open until some code is there to be tested... I have a feeling that there will be quite a few strange effects when APIs expecting strings are fed with Unicode objects returning UTF-8.

--
Marc-Andre Lemburg
______________________________________________________________________
Y2000: 50 days left
Business: http://www.lemburg.com/
Python Pages: http://www.lemburg.com/python/
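Points 1 and 2 can be sketched roughly as follows. This is an illustration under stated assumptions, not the unicodec wrapper discussed in the message: it uses the `codecs` module from today's standard library as a stand-in, and the sample strings are invented.

```python
import codecs
import io

# The e-acute fits in Latin-1; the euro sign (U+20AC) does not.
text = "caf\u00e9 costs \u20ac3"

# Point 1: a non-UTF encoding cannot represent every character.
try:
    text.encode("iso-8859-1")  # strict mode raises on the euro sign
    strict_failed = False
except UnicodeEncodeError:
    strict_failed = True

# With a lenient error handler the loss is silent: the euro becomes "?".
lossy = text.encode("iso-8859-1", errors="replace")

# UTF-8 round-trips the same string without loss.
utf8_roundtrip = text.encode("utf-8").decode("utf-8")

# Point 2: keep the non-UTF encoding at the stream boundary only.
buffer = io.BytesIO()
writer = codecs.getwriter("iso-8859-1")(buffer)  # encodes on write
writer.write("caf\u00e9\n")

reader = codecs.getreader("iso-8859-1")(io.BytesIO(buffer.getvalue()))
read_back = reader.read()  # decodes on read

print(strict_failed, lossy, buffer.getvalue(), repr(read_back))
```

Here the encoding choice lives in the reader and writer objects, so strings inside the program never pass through a lossy default.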
