<html><head><meta http-equiv="Content-Type" content="text/html; charset=utf-8"></head><body><div><div style="font-family: Calibri,sans-serif; font-size: 11pt;">The native encoding on Windows has been UTF-16 since Windows NT. Obviously we've survived without Python tokenization support for a long time, but every API uses it.<br><br>I've hit a few cases where it would have been handy for Python to be able to detect it, though nothing I couldn't work around. Saying it is rarely used is rather exposing your own unawareness though - it could arguably be the most commonly used encoding (depending on how you define "used").<br><br>Cheers,<br>Steve<br><br>Top-posted from my Windows Phone</div></div><div dir="ltr"><hr><span style="font-family: Calibri,sans-serif; font-size: 11pt; font-weight: bold;">From: </span><span style="font-family: Calibri,sans-serif; font-size: 11pt;"><a href="mailto:victor.stinner@gmail.com">Victor Stinner</a></span><br><span style="font-family: Calibri,sans-serif; font-size: 11pt; font-weight: bold;">Sent: </span><span style="font-family: Calibri,sans-serif; font-size: 11pt;">11/14/2015 14:58</span><br><span style="font-family: Calibri,sans-serif; font-size: 11pt; font-weight: bold;">To: </span><span style="font-family: Calibri,sans-serif; font-size: 11pt;"><a href="mailto:storchaka@gmail.com">Serhiy Storchaka</a></span><br><span style="font-family: Calibri,sans-serif; font-size: 11pt; font-weight: bold;">Cc: </span><span style="font-family: Calibri,sans-serif; font-size: 11pt;"><a href="mailto:python-dev@python.org">python-dev@python.org</a></span><br><span style="font-family: Calibri,sans-serif; font-size: 11pt; font-weight: bold;">Subject: </span><span style="font-family: Calibri,sans-serif; font-size: 11pt;">Re: [Python-Dev] Support of UTF-16 and UTF-32 source encodings</span><br><br></div><p dir="ltr">These encodings are rarely used. I don't think that any text editor uses them. Editors use ASCII, Latin-1, UTF-8 and... any locale encoding. 
But I don't know of any OS using UTF-16 as a locale encoding. UTF-32 wastes disk space.</p>

<p dir="ltr">OK, even if one exists, Python already accepts a very wide range of encodings. It is not worth making the parser much more complex just to support encodings which are almost never used (for .py files).</p>
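<p dir="ltr">For readers who want to see the behaviour under discussion, here is a minimal sketch. It assumes that the standard library's tokenize.detect_encoding() reflects what the C tokenizer accepts, and the one-line SOURCE module and the describe() helper are hypothetical names used only for illustration. It shows a UTF-8 BOM being recognised while a UTF-16 BOM is rejected, and that UTF-16 bytes contain NUL bytes, so they are not ASCII-compatible.</p>

```python
import codecs
import io
import tokenize

SOURCE = "x = 1\n"  # hypothetical one-line module, for illustration only


def detect(data):
    """Return the encoding tokenize.detect_encoding reports for raw source bytes."""
    encoding, _lines = tokenize.detect_encoding(io.BytesIO(data).readline)
    return encoding


def describe(data):
    """Return the detected encoding, or the reason detection failed."""
    try:
        return detect(data)
    except SyntaxError as exc:
        return "rejected: " + str(exc)


# The same source, once as UTF-8 with a BOM and once as UTF-16-LE with a BOM.
utf8_with_bom = codecs.BOM_UTF8 + SOURCE.encode("utf-8")
utf16_with_bom = codecs.BOM_UTF16_LE + SOURCE.encode("utf-16-le")

print(describe(utf8_with_bom))     # the UTF-8 BOM is recognised
print(describe(utf16_with_bom))    # the UTF-16 BOM is not
print(b"\x00" in utf16_with_bom)   # UTF-16 text contains NUL bytes
```

<p dir="ltr">The NUL bytes are the practical obstacle: code that scans the raw bytes for an ASCII coding cookie or newline cannot treat UTF-16 or UTF-32 input the same way.</p>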

<p dir="ltr">Victor<br>

</p>

<div class="gmail_quote">On 14 Nov 2015 20:20, "Serhiy Storchaka" &lt;<a href="mailto:storchaka@gmail.com">storchaka@gmail.com</a>&gt; wrote:<br type="attribution"><blockquote class="gmail_quote" style="margin: 0px 0px 0px 0.8ex; padding-left: 1ex; border-left-color: rgb(204, 204, 204); border-left-width: 1px; border-left-style: solid;">For now, UTF-16 and UTF-32 source encodings are not supported. There is a comment in Parser/tokenizer.c:<br>

<br>

    /* Disable support for UTF-16 BOMs until a decision<br>

       is made whether this needs to be supported.  */<br>

<br>

Can we decide whether this support will be added in the foreseeable future (say, in the next 10 years) or not?<br>

<br>

Removing the commented-out and related code will help to refactor the tokenizer, and that can help to fix some existing bugs (e.g. issue14811, issue18961, issue20115 and maybe others). The current tokenizing code is too tangled.<br>

<br>

If support for UTF-16 and UTF-32 is planned, I'll take it into account during refactoring. But many places besides the tokenizer expect source files to use an ASCII-compatible encoding.<br>

<br>


</blockquote></div>

</body></html>