[Python-Dev] Support of UTF-16 and UTF-32 source encodings

Serhiy Storchaka storchaka at gmail.com
Sat Nov 14 14:19:37 EST 2015


Currently the UTF-16 and UTF-32 source encodings are not supported. 
There is a comment in Parser/tokenizer.c:

     /* Disable support for UTF-16 BOMs until a decision
        is made whether this needs to be supported.  */
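
For readers unfamiliar with what BOM handling involves, here is a minimal 
Python sketch of BOM sniffing. It is an illustration only, not the logic 
in Parser/tokenizer.c; the helper name sniff_bom and the table _BOMS are 
hypothetical. The only library names used are the BOM constants from the 
standard codecs module.

```python
import codecs

# Hypothetical lookup table, for illustration only.
# The UTF-32 BOMs are listed before the UTF-16 ones because the
# UTF-32-LE BOM (FF FE 00 00) begins with the UTF-16-LE BOM (FF FE).
_BOMS = [
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def sniff_bom(data):
    """Return the codec name implied by a leading BOM, or None."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name
    return None
```

The order of the table matters: testing for the UTF-16-LE BOM first would 
misreport every UTF-32-LE file as UTF-16.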

Can we decide whether this support will be added in the foreseeable 
future (say, within the next 10 years) or not?

Removing the commented-out code and the code related to it would make it 
easier to refactor the tokenizer, which in turn could help fix some 
existing bugs (e.g. issue14811, issue18961, issue20115, and maybe 
others). The current tokenizing code is too tangled.

If support for UTF-16 and UTF-32 is planned, I'll take it into account 
during the refactoring. But many places besides the tokenizer expect 
source files to be in an ASCII-compatible encoding.
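
To illustrate what "ASCII-compatible" means here, the following small 
Python sketch encodes the same source text in UTF-8 and in UTF-16 and 
checks whether the ASCII bytes survive unchanged. The sample source 
string and variable names are made up for the example; it does not 
reproduce any code from the interpreter.

```python
# Sample source text, invented for this illustration.
source = "# coding: utf-16\nx = 1\n"

as_utf8 = source.encode("utf-8")
as_utf16 = source.encode("utf-16-le")  # little-endian, no BOM

# In an ASCII-compatible encoding the ASCII text appears byte for byte,
# so a byte-level search for the coding declaration succeeds.
found_in_utf8 = b"coding" in as_utf8

# In UTF-16 each ASCII character is followed by a NUL byte, so the
# same byte-level search fails and NUL bytes appear in the data.
found_in_utf16 = b"coding" in as_utf16
has_nul_bytes = b"\x00" in as_utf16

print(found_in_utf8, found_in_utf16, has_nul_bytes)
```

Any code that scans the raw bytes of a source file for ASCII text, or 
treats a NUL byte as the end of a string, would need changes before it 
could handle such input.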


