On 2/11/21, Inada Naoki email@example.com wrote:
There is little difference between `encoding=None` and `encoding=locale.getpreferredencoding(False)`. The difference is:
- When Python is running on Windows, and
- When the file is a console, and
- (for open()) When PYTHONLEGACYWINDOWSSTDIO is set
- (for TextIOWrapper()) When the file is not _WindowsConsoleIO
encoding=None uses console codepage but
os.device_encoding() -- i.e. _Py_device_encoding() -- only works for hard-coded file descriptors 0, 1, and 2, instead of detecting a console file. So opening "CON", "CONIN$", or "CONOUT$" has never used the console input or output code page, nor has opening a duped standard I/O fd such as open(os.dup(0)). It would be easy to generalize _Py_device_encoding() to detect console files, but it's new behavior.
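A minimal sketch for checking this on a given system. The helper name device_encodings is hypothetical; it only calls os.dup(), os.device_encoding(), and os.close(). Going by the description above, running it on fd 0 in a Windows console should report a code page for the original descriptor and None for the duplicate, but I have not verified that on every version.

```python
import os


def device_encodings(fd):
    """Return os.device_encoding() for fd and for a duplicate of fd.

    Hypothetical helper, for illustration only. os.device_encoding()
    returns None when the descriptor is not connected to a terminal.
    """
    dup_fd = os.dup(fd)
    try:
        return os.device_encoding(fd), os.device_encoding(dup_fd)
    finally:
        # Always release the duplicate descriptor.
        os.close(dup_fd)
```

Calling device_encodings(0) from an interactive console session compares standard input with a dup of it. For a pipe or a regular file, both values are None on any platform.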
Python 3.8 and later have a bug (issue 42261) in which the console input and output code pages are ignored, even with legacy standard I/O enabled and file descriptors 0-2. For example:
    C:\>chcp 437
    Active code page: 437

    C:\>set PYTHONLEGACYWINDOWSSTDIO=1
    C:\>py -3.9 -c "import sys; print(sys.stdout.encoding)"
    cp1252
Regarding the last bullet point, io.TextIOWrapper doesn't know anything about io._WindowsConsoleIO. The decision to use UTF-8 is in io.open(). So manually wrapping a _WindowsConsoleIO file with TextIOWrapper uses the locale preferred encoding instead of UTF-8. For example:
    >>> import io
    >>> fb = open('conin$', 'rb')
    >>> fb.raw
    <_io._WindowsConsoleIO mode='rb' closefd=True>
    >>> f = io.TextIOWrapper(fb)
    >>> f.encoding
    'cp1252'
I don't know whether it's worth making TextIOWrapper check for _WindowsConsoleIO in order to make it use UTF-8. It's not common to manually wrap a binary-mode file.
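In the meantime, code that does wrap such a file manually can make the check itself. This is only a sketch of a workaround, not a proposed change to TextIOWrapper. The function name wrap_console_aware is hypothetical, and it assumes io._WindowsConsoleIO is importable only on Windows, so it looks the name up with getattr().

```python
import io


def wrap_console_aware(buffered, **kwargs):
    """Wrap a binary file in TextIOWrapper, defaulting to UTF-8 for consoles.

    Hypothetical helper: mirrors the choice io.open() makes, but only when
    the caller has not passed an explicit encoding.
    """
    # Buffered objects expose the underlying raw file as .raw; fall back to
    # the object itself if it has no such attribute.
    raw = getattr(buffered, "raw", buffered)
    # _WindowsConsoleIO is a private class that exists only on Windows.
    console_type = getattr(io, "_WindowsConsoleIO", None)
    if (
        console_type is not None
        and isinstance(raw, console_type)
        and kwargs.get("encoding") is None
    ):
        kwargs["encoding"] = "utf-8"
    return io.TextIOWrapper(buffered, **kwargs)
```

With this, wrap_console_aware(open('conin$', 'rb')) should pick UTF-8 on Windows, while any other file keeps the normal TextIOWrapper default or whatever encoding is passed explicitly.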