Re: [Python-Dev] deleting setdefaultencoding in site.py is evil

27 Aug 2009

...
>> The ability to change the default encoding is a misfeature.  There's
>> essentially no way to write correct Python code in the presence of
>> this feature.
>
> How so? If every single piece of text in your project is encoded in a
> superset of ascii (such as utf-8), why would this be a problem?

What is "every single piece of text"? Every string occurring in source
code? or also every single string that may be read from a file, a
socket, out of a database, or from a user interface?

How can you be certain that any string is UTF-8 when doing any
reasonable IO?
...
> Even if you were evil/stupid and mixed encodings, surely all you'd get
> is different unicode errors or maybe the odd strange character during
> display?

One specific problem is that dictionaries will stop working correctly if
you set the default encoding to anything but ASCII. The reason is that
with UTF-8 as the default encoding, you get

py> u"\u20ac" == u"\u20ac".encode("utf-8")
True
py> hash(u"\u20ac") == hash(u"\u20ac".encode("utf-8"))
False

So objects that compare equal will not hash equal. As a consequence, you
may have two different values for what should be the same key in a
dictionary.
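
The effect can be reproduced without touching the default encoding at
all. The following is only a sketch in Python 3: LooseText is a
hypothetical stand-in class (not part of Python) that compares equal to
UTF-8 encoded bytes, the way a unicode string would under a UTF-8
default encoding, while its hash stays unrelated to the hash of those
bytes.

```python
class LooseText:
    """Hypothetical stand-in for a unicode string under a UTF-8 default encoding."""

    def __init__(self, text):
        self.text = text

    def __eq__(self, other):
        # Mimic the implicit conversion: bytes are decoded as UTF-8 before comparing.
        if isinstance(other, bytes):
            try:
                return self.text == other.decode("utf-8")
            except UnicodeDecodeError:
                return False
        if isinstance(other, LooseText):
            return self.text == other.text
        return NotImplemented

    def __hash__(self):
        # Hash of the text only; nothing ties it to the hash of the encoded bytes.
        return hash(self.text)


euro_text = LooseText("\u20ac")
euro_bytes = "\u20ac".encode("utf-8")

prices = {}
prices[euro_text] = "stored under the text key"
prices[euro_bytes] = "stored under the bytes key"

print(euro_text == euro_bytes)  # True: the two keys compare equal
print(len(prices))              # the dictionary still ends up with two entries
```

Because the dictionary looks at the hash before it ever calls ==, the
second assignment does not find the first key, and one logical key ends
up with two independent values.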
...
> Well, flipping that giant switch has worked in production for the past 5
> years, so I'm afraid I'll respectfully disagree. I'd suspect the
> pragmatics of real-world software are why that function even exists,
> and it's extremely useful when used correctly...

It has worked in your application. See my example above: it is very easy
to create applications that stop working correctly if you use
setdefaultencoding (at all - the only supported value is "latin-1",
since Unicode strings hash the same as byte strings if all characters
are in row 0).
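
To illustrate the "row 0" remark: the sketch below uses classic_hash, a
simplified model of a multiply-and-xor string hash of the kind Python 2
used. The function names, the 64-bit mask and the exact constants are
assumptions made for illustration, not the interpreter's actual code.
The point is only that such a hash depends on the sequence of integer
values, and for Latin-1 the byte values and the code points are the
same sequence.

```python
def classic_hash(values):
    """Simplified multiply-and-xor hash over a sequence of integers (assumed model)."""
    values = list(values)
    if not values:
        return 0
    mask = (1 << 64) - 1  # assumption: model a 64-bit machine word
    x = (values[0] << 7) & mask
    for v in values:
        x = ((1000003 * x) ^ v) & mask
    return x ^ len(values)


def text_hash(text):
    # A unicode string is hashed over its code points.
    return classic_hash(ord(ch) for ch in text)


def bytes_hash(data):
    # A byte string is hashed over its byte values.
    return classic_hash(data)


latin = "caf\u00e9"  # every code point is below 256, i.e. in row 0
euro = "\u20ac"      # code point 0x20AC, outside row 0

# Latin-1 maps each row-0 character to the byte with the same value.
print(text_hash(latin) == bytes_hash(latin.encode("latin-1")))  # True

# UTF-8 turns the euro sign into three bytes, so the hashed sequence differs.
print(text_hash(euro) == bytes_hash(euro.encode("utf-8")))      # False
```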

Regards,
Martin


"Martin v. Löwis"