[Tutor] unichr not working as expected

Steven D'Aprano steve at pearwood.info
Tue Jul 23 05:20:49 CEST 2013


On 23/07/13 05:22, Jim Mooney wrote:

> I already changed to u for the char, so I got a bigger number, and only
> subtracted 3 from umlaut, which should have given me the dos line-drawing
> dash, but now my problem is I can't seem to set encoding for that:
>
> import sys
> sys.setdefaultencoding('cp437')
>
> gives me the error:
>
> AttributeError: 'module' object has no attribute 'setdefaultencoding'


Don't touch setdefaultencoding. It is hidden for a reason. And if you insist on living dangerously, don't set it to weird legacy encodings like cp437.
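If what you actually want is the DOS line-drawing dash as cp437 bytes, the usual approach is to leave the default alone and encode explicitly at the point where you need those bytes. Here is a minimal sketch in Python 3 syntax. It assumes the character you are after is U+2500 BOX DRAWINGS LIGHT HORIZONTAL, which is byte 0xC4 in cp437.

```python
# Python 3 sketch: encode explicitly instead of changing the global default.
dash = "\u2500"  # BOX DRAWINGS LIGHT HORIZONTAL, the DOS single-line dash

cp437_bytes = dash.encode("cp437")
print(cp437_bytes)  # b'\xc4'

# Decoding with the same explicit codec gets the original text back.
assert cp437_bytes.decode("cp437") == dash

# Characters cp437 cannot represent (such as the euro sign) raise
# UnicodeEncodeError unless you pick an error handler; "replace" gives "?".
euro = "\u20ac"
print(euro.encode("cp437", errors="replace"))  # b'?'
```

The encoding decision stays local to the one place that talks to the legacy device or file. The rest of the program keeps working with ordinary Unicode text.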

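When Python starts up, it needs to set the default encoding, which Python 2 uses for implicit conversions between byte strings and Unicode strings. After that you *cannot* change it, and certainly not to arbitrary encodings. Setting it to arbitrary encodings can cause all sorts of weird, hard-to-diagnose bugs. To prevent that, the site module deletes the setdefaultencoding function from sys as soon as startup is finished, which is why you got that AttributeError.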
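To see why a process-wide default is dangerous, note that the same byte decodes to different text depending on which codec is in force. Any code that silently relies on the default will start producing different characters, or exceptions, when the default changes. The following is a small Python 3 illustration. The byte 0xC4 is just an example, chosen because it is the line-drawing dash in cp437 but an umlauted letter in Latin-1.

```python
raw = b"\xc4"  # one example byte, as it might arrive from a file or terminal

# The same byte means different text depending on the codec used.
as_cp437 = raw.decode("cp437")     # U+2500 BOX DRAWINGS LIGHT HORIZONTAL
as_latin1 = raw.decode("latin-1")  # U+00C4 LATIN CAPITAL LETTER A WITH DIAERESIS
print(ascii(as_cp437), ascii(as_latin1))  # '\u2500' '\xc4'

# Under UTF-8 the byte is not valid on its own: 0xC4 starts a
# two-byte sequence whose second byte never arrives.
try:
    raw.decode("utf-8")
except UnicodeDecodeError as exc:
    print("utf-8 rejects it:", exc.reason)
```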

The documentation is clear that there are no user-serviceable parts here:

http://docs.python.org/2/library/sys.html#sys.setdefaultencoding

And in Python 3 it became a no-op, then finally deleted for good, gone forever, and thanks be to feck. http://bugs.python.org/issue9549
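For comparison, you can check the Python 3 situation directly with a current Python 3 interpreter. The default encoding is fixed at UTF-8, and the setter is simply not there.

```python
import sys

# Python 3 always reports UTF-8 as the default encoding.
print(sys.getdefaultencoding())  # utf-8

# The setter no longer exists, so sys.setdefaultencoding(...) would raise
# AttributeError, much like the error in the original question.
print(hasattr(sys, "setdefaultencoding"))  # False
```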

Apparently it only exists because when Unicode was first introduced to Python, the developers couldn't decide whether to use ASCII, Latin1 or UTF-8 as the internal encoding, which just goes to show that even the top Python devs can be foolish when it comes to Unicode. So they put in an experimental function to set the default encoding, and *literally forgot to remove it* for the public release.

(That's according to the Effbot, Fredrik Lundh, one of the early Python luminaries.)



-- 
Steven

