[Python-Dev] Support of UTF-16 and UTF-32 source encodings

Chris Angelico rosuav at gmail.com
Sat Nov 14 20:15:28 EST 2015


On Sun, Nov 15, 2015 at 12:06 PM, Steve Dower <steve.dower at python.org> wrote:
> The native encoding on Windows has been UTF-16 since Windows NT. Obviously
> we've survived without Python tokenization support for a long time, but
> every API uses it.
>
> I've hit a few cases where it would have been handy for Python to be able to
> detect it, though nothing I couldn't work around. Saying it is rarely used
> is rather exposing your own unawareness though - it could arguably be the
> most commonly used encoding (depending on how you define "used").

What matters here is: How likely is it that an arbitrary Python script
(or, say, "arbitrary text file") is encoded in UTF-16 rather than
something ASCII-compatible? I think even Notepad defaults to UTF-8 for
files now. The fact that it's sending text to the GUI subsystem in
UTF-16 is immaterial here.
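
To make "ASCII-compatible" concrete, here's a rough sketch. The helper
name is_ascii_compatible is made up for illustration, and it only checks
one sample string, so treat it as a heuristic rather than a proof about
the encoding:

```python
def is_ascii_compatible(encoding: str, sample: str = "import sys\n") -> bool:
    """Heuristic: does pure-ASCII text encode to the same bytes as ASCII?

    Hypothetical helper, not part of any library. It tests a single
    sample, so it can only suggest compatibility, not prove it.
    """
    return sample.encode(encoding) == sample.encode("ascii")


for name in ("utf-8", "latin-1", "cp1252", "utf-16", "utf-32"):
    print(name, is_ascii_compatible(name))
```

UTF-8, Latin-1 and cp1252 pass; UTF-16 and UTF-32 don't, because every
ASCII character becomes two or four bytes (plus a BOM when one is
written).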

Can the py.exe launcher handle a UTF-16 shebang? (I'm pretty sure Unix
program loaders won't.) That alone might be a reason for strongly
encouraging ASCII-compat encodings.
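
A quick sketch of why I'd expect that to fail, assuming a loader that
just compares the first two bytes of the file against "#!" (which is my
understanding of what Unix kernels do; I haven't checked what py.exe
does). The function name starts_with_shebang is made up for the example:

```python
SHEBANG = "#!/usr/bin/env python3\n"


def starts_with_shebang(raw: bytes) -> bool:
    """Mimic a loader that checks for the two-byte magic b'#!'.

    Hypothetical stand-in for a program loader, for illustration only.
    """
    return raw[:2] == b"#!"


utf8 = SHEBANG.encode("utf-8")
utf16_bom = SHEBANG.encode("utf-16")    # BOM first, native byte order
utf16_le = SHEBANG.encode("utf-16-le")  # no BOM, NUL after each ASCII char

print(starts_with_shebang(utf8))       # True
print(starts_with_shebang(utf16_bom))  # False, the file starts with the BOM
print(starts_with_shebang(utf16_le))   # False, the file starts with b"#\x00"
```

Even without a BOM, the second byte is NUL rather than "!", so a
byte-level check never sees the magic.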

ChrisA

