[Python-Dev] deleting setdefaultencoding in site.py is evil
"Martin v. Löwis"
martin at v.loewis.de
Thu Aug 27 08:47:59 CEST 2009
>> The ability to change the default encoding is a misfeature. There's
>> essentially no way to write correct Python code in the presence of
>> this feature.
> How so? If every single piece of text in your project is encoded in a
> superset of ascii (such as utf-8), why would this be a problem?
What is "every single piece of text"? Every string occurring in source
code? Or also every single string that may be read from a file, a
socket, out of a database, or from a user interface?
How can you be certain that any string is UTF-8 when doing any
> Even if you were evil/stupid and mixed encodings, surely all you'd get
> is different unicode errors or maybe the odd strange character during
One specific problem is that dictionaries will stop working correctly if
you set the default encoding to anything but ASCII. The reason is that
with UTF-8 as the default encoding, you get
py> u"\u20ac" == u"\u20ac".encode("utf-8")
True
py> hash(u"\u20ac") == hash(u"\u20ac".encode("utf-8"))
False
So objects that compare equal will not hash equal. As a consequence, you
may have two different values for what should be the same key in a
dictionary.
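
The same failure can be sketched on Python 3, where setdefaultencoding
no longer exists. This is only an illustration under stated assumptions:
LooseKey is a hypothetical class (not part of any library) that imitates
a byte string which compares equal to text through an implicit UTF-8
decode, while still hashing its raw bytes.

```python
class LooseKey:
    """Hypothetical wrapper, not a real library class.

    Imitates a Python 2 byte string under a UTF-8 default encoding:
    it compares equal to text by decoding itself, but hashes its raw bytes.
    """

    def __init__(self, raw):
        self.raw = raw

    def __eq__(self, other):
        if isinstance(other, LooseKey):
            return self.raw == other.raw
        if isinstance(other, str):
            # The "implicit decode" that a non-ASCII default encoding enables.
            return self.raw.decode("utf-8") == other
        return NotImplemented

    def __hash__(self):
        # Hash is computed from the bytes, not from the decoded text.
        return hash(self.raw)


text = "\u20ac"
key = LooseKey(text.encode("utf-8"))

table = {text: "stored under the text key"}
table[key] = "stored under the byte key"

print(key == text)              # equality as defined by __eq__
print(hash(key) == hash(text))  # hashes come from different data
print(len(table))               # entries held for what should be one key
```

Because the dictionary compares hashes before it ever calls __eq__, the
second assignment does not overwrite the first entry.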
> Well, flipping that giant switch has worked in production for the past 5
> years, so I'm afraid I'll respectfully disagree. I'd suspect the
> pragmatics of real world software are why that function even exists,
> and it's extremely useful when used correctly...
It has worked in your application. See my example above: it is very easy
to create applications that stop working correctly if you use
setdefaultencoding (at all - the only supported value is "latin-1",
since Unicode strings hash the same as byte strings if all characters
are in row 0).