[issue6058] Add cp65001 to encodings/aliases.py
report at bugs.python.org
Mon Nov 8 05:11:52 CET 2010
STINNER Victor <victor.stinner at haypocalc.com> added the comment:
Several tests showed that cp65001 can *not* be made an alias of utf-8, which is why I'm closing this issue.
Anyway, I don't think that cp65001 is configured by default on any Windows setup. It is only set by the user, with the chcp command, in an attempt to display Unicode characters in the Windows console, but it is not possible to display arbitrary Unicode characters in this console (see issue #1602). The chcp command should not be used in the Windows console anyway, because it does not only change the ANSI code page: it also changes the console code page, which is wrong (the console still expects text encoded to the previous code page).
It is possible to implement a codec for cp65001 using the existing utf-8 codec in surrogatepass mode, or by calling MultiByteToWideChar() / WideCharToMultiByte() with codepage=CP_UTF8. But I don't think that we need cp65001 at all.
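For illustration only, the first option (the utf-8 codec in surrogatepass mode) might look roughly like the sketch below. The codec name cp65001_sketch and the helper names are hypothetical, chosen so that they do not collide with any codec Python may already ship. The sketch also assumes that always forcing surrogatepass is the desired behaviour, which may not match what Windows actually does for code page 65001.

```python
import codecs

# Hypothetical codec name, used only for this sketch.
CODEC_NAME = "cp65001_sketch"


def _encode(text, errors="strict"):
    # Assumption: always use surrogatepass, whatever handler was requested,
    # so that lone surrogates are encoded instead of raising an error.
    return codecs.utf_8_encode(text, "surrogatepass")


def _decode(data, errors="strict"):
    # final=True: treat the input as complete (stateless decoding).
    return codecs.utf_8_decode(data, "surrogatepass", True)


def _search(name):
    # codecs passes the normalized (lowercase) encoding name.
    if name == CODEC_NAME:
        return codecs.CodecInfo(name=CODEC_NAME, encode=_encode, decode=_decode)
    return None


codecs.register(_search)
```

With this registered, "\ud800".encode("cp65001_sketch") succeeds where the strict utf-8 codec would raise UnicodeEncodeError. Other invalid byte sequences are still rejected when decoding.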
If you need cp65001 for a good reason and you would like to implement a cp65001 Python codec, open a new issue.
If you consider that we should use _O_U8TEXT or _O_U16TEXT, open another new issue.
_O_U8TEXT or _O_U16TEXT might improve Unicode support when Python's output is redirected to a pipe, but I don't think that it would help to display Unicode characters in the Windows console. I also fear that it would break existing code and any function not aware of this special mode.
resolution: -> invalid
status: open -> closed
Python tracker <report at bugs.python.org>