[Twisted-Python] Changing supported configurations regarding Unicode handling on Windows
Hi all,

The past week or so, I noticed failures in the Azure Pipelines CI (see https://github.com/twisted/twisted/pull/1278 for the ticket with them, among others) that were due to Python + Windows falling apart on mgorny's name. After some debugging, I ascertained:

- The environment has Unicode strings in it (because environments are Unicode on Windows)
- but sys.stdout.encoding is cp1252 -- https://www.python.org/dev/peps/pep-0528/ does not apply due to it being a non-interactive console
- One of the characters in the environment is not printable under cp1252, which causes an exception.

I think we should avoid running under ANSI-mode by default at all costs, since it causes non-obvious bugs like this (`print(os.environ)` causing an exception). This would also bring Windows in line with UNIX, where we basically assume a non-UTF-8 locale is more or less broken by design and we don't run the tests on it. It also seems like Windows is heading in the direction of having console output be CP65001 (aka UTF-8), so I think this is a reasonable direction to go in as well. [1] [2] [3]

PEP-528 makes sys.stdout/sys.stdin use the W ("wide", aka UTF-16LE) APIs, as it's assumed that a human is on the other side of the console. For compatibility, it will encode Unicode to UTF-8, pass it to WindowsConsoleIO, which will then decode it into UTF-16 and pass it to the console, meaning that writing raw UTF-8 bytes to sys.stdout.buffer works as you'd expect on Windows and UNIXes.

We can enable UTF-8 text output universally with the environment variable `PYTHONIOENCODING=utf8:surrogateescape`. If a user wants ANSI output, they can use the "PYTHONLEGACYWINDOWSSTDIO" environment variable to make Python not perform the Unicode conversions for the console, so we could perhaps use this too, if someone is SURE they want ANSI output. Python 3.7 has PEP-540's `-X utf8` mode, which also does this, more or less, but in a nicer way (no environment variables).

Python 3.5 doesn't seem to work with either of these options. Not sure why. Maybe it's busted.

So, due to this, I would like to propose the following:

- On Windows, raising a deprecation warning when sys.stdout and sys.stderr are not UTF-8 AND the environment variable "PYTHONLEGACYWINDOWSSTDIO" is not set.
- Declaring said environments unsupported and running our tests with -X utf8/PYTHONIOENCODING=utf8 or PYTHONLEGACYWINDOWSSTDIO (which will require some Unicode tests which fail because CP1252 is bad to be skipped).
- After the deprecation period, start issuing loud RuntimeWarnings saying that you're probably not doing the thing you want to be doing.

Opinions?

- Amber

[1] https://devblogs.microsoft.com/commandline/windows-command-line-unicode-and-...
[2] https://docs.microsoft.com/en-us/dotnet/api/system.text.encoding.default?vie...
[3] https://docs.microsoft.com/en-us/windows/uwp/design/globalizing/use-utf8-cod...
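For illustration only, here is a minimal sketch of the failure mode and of the check behind the proposed warning. It is not code from Twisted: `can_write`, `is_utf8` and `should_warn` are hypothetical helper names, and an in-memory stream stands in for a non-interactive sys.stdout so the sketch can run on any platform.

```python
import codecs
import io

# Author name taken from this thread; "ł" has no cp1252 encoding.
AUTHOR = "Michał Górny"


def can_write(text, encoding):
    """Return True if text can be written to a text stream using encoding."""
    # An in-memory stream stands in for a non-interactive sys.stdout.
    stream = io.TextIOWrapper(io.BytesIO(), encoding=encoding, errors="strict")
    try:
        stream.write(text)
        stream.flush()
    except UnicodeEncodeError:
        return False
    return True


def is_utf8(encoding):
    """Return True if encoding names the UTF-8 codec (e.g. "utf8", "UTF-8")."""
    try:
        return codecs.lookup(encoding).name == "utf-8"
    except (LookupError, TypeError):
        # Unknown codec name, or no encoding at all (None).
        return False


def should_warn(platform, stdout_encoding, stderr_encoding, environ):
    """Hypothetical helper: decide whether the proposed warning applies."""
    if platform != "win32":
        return False
    if "PYTHONLEGACYWINDOWSSTDIO" in environ:
        # The user explicitly asked for the legacy (ANSI) behaviour.
        return False
    return not (is_utf8(stdout_encoding) and is_utf8(stderr_encoding))


print(can_write(AUTHOR, "cp1252"))
print(can_write(AUTHOR, "utf-8"))
print(should_warn("win32", "cp1252", "cp1252", {}))
```

In a real process the check would presumably be called as `should_warn(sys.platform, sys.stdout.encoding, sys.stderr.encoding, os.environ)`; where and how the warning would actually be raised is left open by the proposal.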
...
Opinions?
This all sounds like a pretty good plan to me - please go ahead and do it ASAP!

My only concern here is that setting up Twisted services on Windows can already be a bit fiddly, and I would almost rather generate potential mojibake by default than fail to run in a way which would be even harder to debug than it already is. However, even if this is a valid concern, let's not block on it, but figure out a way to fix it after the immediate issue where folks with non-ASCII letters in their names can't submit PRs.

Thanks so much for investigating!

-g
On Monday, 22 June 2020 08:06:22 BST Glyph wrote:
I think this is down to a change in Python 3.8 on Windows that defaults opening files to use the Windows code page of the user, like cp1252. My guess is that there is a file being opened without an encoding='utf-8'. (I noticed this with modulefinder and fixed that for Python 3.8)

Barry
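To illustrate the kind of problem guessed at here, a small sketch (not taken from Twisted or modulefinder) of why passing `encoding='utf-8'` explicitly matters: when no encoding is given, `open()` uses the locale's preferred encoding, which on Windows is typically an ANSI code page such as cp1252. The file name and text are made up for the example, and cp1252 is passed explicitly so the behaviour is the same on any platform.

```python
import os
import tempfile

text = "Michał Górny"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "author.txt")  # hypothetical file name

    # Write and read with an explicit encoding: the text round-trips.
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)
    with open(path, encoding="utf-8") as f:
        round_tripped = f.read()

    # Reading the same bytes as cp1252 (what an implicit default could
    # pick on Windows) silently produces mojibake instead of the name.
    with open(path, encoding="cp1252") as f:
        mojibake = f.read()

print(round_tripped == text)
print(mojibake == text)
```

Writing is the harsher direction: encoding this text to cp1252 raises UnicodeEncodeError instead of producing mojibake.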
On Mon, Jun 22, 2020 at 12:08 AM Glyph <glyph@twistedmatrix.com> wrote:
I worked with Michał Górny who helped me debug this. When Michał submits a PR from his account https://github.com/mgorny, in the Azure Pipeline, the following environment variable gets set on the Windows builder running in Azure:

BUILD_SOURCEVERSIONAUTHOR Michał Górny

The presence of this single environment variable caused all sorts of CI failures. Even if Michał submitted a trivial linespace change, the CI would fail due to that environment variable.

I submitted this PR which fixes things: https://github.com/twisted/twisted/pull/1302

I was able to run that same patch under a PR created by Michał, and all the CI passed.

-- Craig
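As an aside, and not a description of what the PR above does, here is a minimal sketch of one general way to make such a value safe to print under a legacy code page: escape whatever the target encoding cannot represent, using the standard `backslashreplace` error handler. The `printable` helper and the stand-in `environ` dict are assumptions made for the example.

```python
def printable(text, encoding):
    """Return text with characters the encoding cannot represent escaped."""
    return text.encode(encoding, errors="backslashreplace").decode(encoding)


# Stand-in for os.environ on the Azure Windows builder described above.
environ = {"BUILD_SOURCEVERSIONAUTHOR": "Michał Górny"}

safe = printable(environ["BUILD_SOURCEVERSIONAUTHOR"], "cp1252")
print(safe)
```

The escaped string can then be encoded to cp1252 without raising, at the cost of showing `\u0142` in place of "ł".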
participants (4)
- Amber Brown (hawkowl)
- Barry Scott
- Craig Rodrigues
- Glyph