[Python-Dev] Support of UTF-16 and UTF-32 source encodings

Sat Nov 14 21:57:51 EST 2015

On Sat, Nov 14, 2015 at 7:06 PM, Steve Dower <steve.dower at python.org> wrote:
> The native encoding on Windows has been UTF-16 since Windows NT. Obviously
> we've survived without Python tokenization support for a long time, but
> every API uses it.

Windows 2000 was the first version to have broad support for UTF-16.
Windows NT (1993) was released before UTF-16, so its Unicode support
is limited to UCS-2.

(Note that console windows still restrict each character cell to a
single WCHAR. So a non-BMP character encoded as a UTF-16
surrogate pair always appears as two box glyphs. Of course you can
copy and paste from the console to a UTF-16 aware window just fine.)
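To make the two-glyph effect concrete: a code point above U+FFFF becomes two 16-bit code units in UTF-16, and a console that stores one WCHAR per cell has to show each unit on its own. The same split is why a UCS-2-only system sees such a character as two separate code units. This is an illustrative sketch; U+1F600 is an arbitrary choice, and `to_surrogates` is a hypothetical helper that spells out the standard surrogate arithmetic so it can be checked against Python's own utf-16-le codec.

```python
# Illustrative sketch: how a non-BMP character becomes a UTF-16 surrogate pair.

def to_surrogates(cp):
    """Hypothetical helper: split a code point above U+FFFF into the
    (high, low) surrogate pair defined by UTF-16."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("code point must be in U+10000..U+10FFFF")
    v = cp - 0x10000                        # 20-bit offset above the BMP
    return 0xD800 + (v >> 10), 0xDC00 + (v & 0x3FF)


ch = "\U0001F600"                           # one code point outside the BMP
data = ch.encode("utf-16-le")               # Windows stores WCHARs little-endian
units = [int.from_bytes(data[i:i + 2], "little") for i in range(0, len(data), 2)]

print(len(ch))                              # 1 (one code point)
print([hex(u) for u in units])              # ['0xd83d', '0xde00'] (two WCHARs)
print(units == list(to_surrogates(ord(ch))))  # True
```

A console that renders one cell per WCHAR has no glyph for either surrogate alone, hence the two boxes, while a UTF-16 aware window recombines the pair.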

> I've hit a few cases where it would have been handy for Python to be able to
> detect it, though nothing I couldn't work around.

Can you elaborate on some example cases? I can see using UTF-16 for the
REPL in the Windows console, but a hypothetical WinConIO class could
simply transcode to and from UTF-8. Drekin's win-unicode-console
package works like this.
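A minimal sketch of the transcoding idea, under stated assumptions: the class name `TranscodingConsole` is hypothetical (it is not WinConIO, nor the API of win-unicode-console), and the console's wide-character buffer, which a real implementation would fill and drain with the Win32 calls ReadConsoleW and WriteConsoleW, is simulated with `io.BytesIO` so the sketch runs on any platform. The console side speaks UTF-16-LE; the caller only ever sees UTF-8.

```python
import codecs
import io


class TranscodingConsole:
    """Hypothetical sketch: UTF-16-LE on the console side, UTF-8 for the caller.

    `wide` stands in for the console's WCHAR buffer (simulated here with
    io.BytesIO instead of real ReadConsoleW/WriteConsoleW calls).
    """

    def __init__(self, wide):
        self._wide = wide
        # Incremental decoders keep state between calls, so a surrogate pair
        # or a multi-byte UTF-8 sequence split across calls is not corrupted.
        self._decode_wide = codecs.getincrementaldecoder("utf-16-le")()
        self._decode_utf8 = codecs.getincrementaldecoder("utf-8")()

    def read(self, n=-1):
        """Read up to n UTF-16-LE bytes from the console side, return UTF-8."""
        raw = self._wide.read(n)
        # An empty read means end of input, so flush any buffered state.
        text = self._decode_wide.decode(raw, final=not raw)
        return text.encode("utf-8")

    def write(self, data):
        """Accept UTF-8 bytes, write the same text as UTF-16-LE."""
        text = self._decode_utf8.decode(data)   # incomplete tail is buffered
        self._wide.write(text.encode("utf-16-le"))
        return len(data)


# Usage: the console side holds UTF-16-LE, the caller sees UTF-8.
con = TranscodingConsole(io.BytesIO("caf\u00e9 \U0001F600".encode("utf-16-le")))
print(con.read())   # b'caf\xc3\xa9 \xf0\x9f\x98\x80'
```

Under these assumptions the UTF-16 handling stays confined to one layer, so the rest of the REPL machinery only has to deal with UTF-8.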