[issue13913] utf-8 or utf8 or utf_8 (codec display name inconsistency)
Kang-Hao (Kenny) Lu
report at bugs.python.org
Tue Jan 31 18:27:56 CET 2012
New submission from Kang-Hao (Kenny) Lu <kennyluck at csail.mit.edu>:
Since Python 3.2.2 (I don't have an earlier version to test with), error messages name the same codec inconsistently:
UnicodeEncodeError: *utf-8* codec can't encode character '\udc80'...
UnicodeDecodeError: *utf8* codec can't decode byte 0xff in position 0
and the table in the documentation of the codecs module suggests *utf_8* as the name of the codec, which I believe to be equivalent to "utf-8" because '-' is not a valid character in an identifier.
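A minimal sketch for reproducing the observation above. The helper name `codec_name_in_error` is made up for illustration; it only reads the standard `encoding` attribute of the raised exception. The exact spelling printed depends on the Python version in use, so no expected output is shown.

```python
import codecs


def codec_name_in_error(action):
    """Run `action` and return the codec name stored on the UnicodeError it raises.

    Hypothetical helper for this report; returns None if nothing is raised.
    """
    try:
        action()
    except UnicodeError as exc:
        # UnicodeEncodeError and UnicodeDecodeError both carry `.encoding`.
        return exc.encoding
    return None


# A lone surrogate cannot be encoded, and 0xff is never valid in UTF-8.
encode_name = codec_name_in_error(lambda: "\udc80".encode("utf-8"))
decode_name = codec_name_in_error(lambda: b"\xff".decode("utf-8"))
print("encode error reports:", encode_name)
print("decode error reports:", decode_name)

# All three spellings are accepted by the codec registry and resolve
# to one codec, whatever name the error messages display.
canonical = {s: codecs.lookup(s).name for s in ("utf-8", "utf8", "utf_8")}
print(canonical)
```

If the two printed names differ, the interpreter shows the inconsistency described here.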
Can we at least make the above two consistent? I would go for "utf-8", which was probably introduced for rejecting surrogates, but "utf8" has been there for years. What do we do? I am happy to submit patches for all branches. These are one-liners anyway.
The backward compatibility risk should be pretty low, as you usually don't read the encoding from these errors, and I don't see any use of PyUnicode(Encode|Decode)Error_GetEncoding in trunk, although I'm using it for issue #12892.
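For code that does read the encoding from such an error, here is a sketch of one way to stay independent of the displayed spelling: compare names after resolving them through the codec registry instead of comparing strings. The helpers `same_codec` and `failed_with_utf8` are hypothetical names, not part of any proposed patch.

```python
import codecs


def same_codec(name_a, name_b):
    """Return True if both names resolve to the same registered codec."""
    try:
        return codecs.lookup(name_a).name == codecs.lookup(name_b).name
    except LookupError:
        # At least one name is unknown to the codec registry.
        return False


def failed_with_utf8(exc):
    """Hypothetical check: was this UnicodeError raised by the UTF-8 codec?"""
    return same_codec(exc.encoding, "utf-8")


try:
    b"\xff".decode("utf-8")
except UnicodeDecodeError as exc:
    # Holds whether the error reports 'utf8' or 'utf-8'.
    print(failed_with_utf8(exc))
```

A caller written this way would not notice a change of the display name from "utf8" to "utf-8".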
Also, "latin_1" displays as *latin-1* but "iso2022-jp" displays as *iso2022_jp*. I care less about this nit though.
nosy: ezio.melotti, kennyluck
title: utf-8 or utf8 or utf_8 (codec display name inconsistency)
versions: Python 2.7, Python 3.2, Python 3.3
Python tracker <report at bugs.python.org>