[Python-Dev] deleting setdefaultencoding in site.py is evil
"Martin v. Löwis"
martin at v.loewis.de
Thu Aug 27 08:47:59 CEST 2009
>> The ability to change the default encoding is a misfeature. There's
>> essentially no way to write correct Python code in the presence of
>> this feature.
>
> How so? If every single piece of text in your project is encoded in a
> superset of ascii (such as utf-8), why would this be a problem?
What is "every single piece of text"? Every string occurring in source
code? or also every single string that may be read from a file, a
socket, out of a database, or from a user interface?
How can you be certain that any string is UTF-8 when doing any
reasonable IO?
> Even if you were evil/stupid and mixed encodings, surely all you'd get
> is different unicode errors or maybe the odd strange character during
> display?
One specific problem is that dictionaries will stop working correctly if you
set the default encoding to anything but ASCII. The reason is that
with UTF-8 as the default encoding, you get
py> u"\u20ac" == u"\u20ac".encode("utf-8")
True
py> hash(u"\u20ac") == hash(u"\u20ac".encode("utf-8"))
False
So objects that compare equal will not hash equal. As a consequence, you
may have two different values for what should be the same key in a
dictionary.
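The effect on dictionaries can be sketched as follows. This is only an
illustration written for Python 3, where str and bytes never compare
equal, so it uses a hypothetical wrapper class (Py2String, not part of
any library) that emulates the Python 2 comparison with UTF-8 as the
default encoding. The names text_key, bytes_key and prices are made up
for the example.

```python
class Py2String:
    """Hypothetical wrapper: emulates Python 2 str/unicode equality
    when the default encoding has been switched to UTF-8."""

    def __init__(self, value):
        self.value = value  # either str (text) or bytes

    def _as_text(self):
        # Emulate the implicit decode that Python 2 performs on comparison.
        if isinstance(self.value, bytes):
            return self.value.decode("utf-8")
        return self.value

    def __eq__(self, other):
        if not isinstance(other, Py2String):
            return NotImplemented
        if type(self.value) is type(other.value):
            return self.value == other.value
        try:
            return self._as_text() == other._as_text()
        except UnicodeDecodeError:
            return False

    def __hash__(self):
        # The hash is taken from the stored object as-is, with no decoding,
        # so text and its UTF-8 bytes hash differently.
        return hash(self.value)


text_key = Py2String("\u20ac")                   # the Euro sign as text
bytes_key = Py2String("\u20ac".encode("utf-8"))  # the same sign as UTF-8 bytes

prices = {text_key: "read from a database"}
prices[bytes_key] = "read from a socket"

print(text_key == bytes_key)  # True: the keys compare equal
print(len(prices))            # 2: yet the dictionary holds both of them
```

The dictionary compares hashes before it ever calls the equality test,
so the second assignment creates a new entry instead of replacing the
first one.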
> Well, flipping that giant switch has worked in production for the past 5
> years, so I'm afraid I'll respectfully disagree. I'd suspect the
> pragmatics of real-world software are why that function even exists,
> and it's extremely useful when used correctly...
It has worked in your application. See my example above: it is very easy
to create applications that stop working correctly if you use
setdefaultencoding (at all - the only supported value is "latin-1",
since Unicode strings hash the same as byte strings if all characters
are in row 0).
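What makes row 0 special can be checked directly. The following is a
rough sketch in Python 3; it does not reproduce Python 2's hash
function, it only shows that Latin-1 maps every code point in row 0
(U+0000 to U+00FF) to the single byte of the same value, while UTF-8
does so only for the ASCII half. The variable names are made up for
the example.

```python
# Row 0 of Unicode is the code point range U+0000..U+00FF.
row0 = [chr(cp) for cp in range(256)]

# Latin-1 encodes each of these code points as one byte of the same value.
latin1_is_identity = all(
    ch.encode("latin-1") == bytes([ord(ch)]) for ch in row0
)

# UTF-8 agrees only up to U+007F; above that it needs two bytes.
e_acute = "\u00e9"
latin1_len = len(e_acute.encode("latin-1"))
utf8_len = len(e_acute.encode("utf-8"))

print(latin1_is_identity, latin1_len, utf8_len)  # True 1 2
```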
Regards,
Martin