[New-bugs-announce] [issue45176] Many regtest failures on Windows with non-ASCII account name

Ming Hua report at bugs.python.org
Sun Sep 12 05:50:10 EDT 2021

New submission from Ming Hua <plateauwolf at qq.com>:

Since at least Windows 8, it has been possible to invoke the input method editor (IME) when installing Windows and creating accounts.  As a result, at least among Simplified Chinese users, it's not uncommon to have a Chinese account name.

After successful installation using the 64-bit .exe installer for Windows, just to be paranoid (and to get familiar with Python's test framework), I decided to run the bundled regression tests.  To my surprise I got many failures.  The following is the summary of "python.exe -m test" with 3.8 some months ago (likely 3.8.6):

371 tests OK.

11 tests failed:
    test_cmd_line_script test_compileall test_distutils test_doctest
    test_locale test_mimetypes test_py_compile test_tabnanny
    test_urllib test_venv test_zipimport_support

43 tests skipped:
    test_asdl_parser test_check_c_globals test_clinic test_curses
    test_dbm_gnu test_dbm_ndbm test_devpoll test_epoll test_fcntl
    test_fork1 test_gdb test_grp test_ioctl test_kqueue
    test_multiprocessing_fork test_multiprocessing_forkserver test_nis
    test_openpty test_ossaudiodev test_pipes test_poll test_posix
    test_pty test_pwd test_readline test_resource test_smtpnet
    test_socketserver test_spwd test_syslog test_threadsignals
    test_timeout test_tix test_tk test_ttk_guionly test_urllib2net
    test_urllibnet test_wait3 test_wait4 test_winsound test_xmlrpc_net
    test_xxtestfuzz test_zipfile64

Total duration: 59 min 49 sec
Tests result: FAILURE

The failures all look similar, though: it seems Python on Windows assumes that the user's home directory, "C:\Users\<username>\", is encoded in either ASCII or UTF-8, while it is actually encoded in the native Windows code page, in my case cp936 for Simplified Chinese (zh-CN).
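
The mismatch described here can be illustrated with a minimal sketch.  The account name below is a hypothetical placeholder (not the real one), and cp936 is assumed to be the active code page; the snippet only shows that path bytes produced under cp936 generally cannot be decoded as UTF-8, which is the kind of error the tests hit.

```python
# Hypothetical two-character account name, used purely for illustration.
account_name = "用户"
path = "C:\\Users\\" + account_name + "\\AppData\\Local\\Temp"

# Stand-in for what a child process might write to a pipe when the
# active code page is cp936 (an assumption for this sketch).
raw = path.encode("cp936")

try:
    raw.decode("utf-8")  # what a bare .decode() does by default
    decoded_ok = True
except UnicodeDecodeError as exc:
    decoded_ok = False
    print("UTF-8 decode failed:", exc)

# Decoding with the code page that produced the bytes round-trips cleanly.
print(raw.decode("cp936") == path)
```

Whether the test suite should decode with the code page or the child process should be made to emit UTF-8 is a separate design question that this sketch does not try to answer.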

To take a couple of examples (these are from recent testing with 3.10.0 rc2):

> python.exe -m test -W test_cmd_line_script
0:00:03 Run tests sequentially
0:00:03 [1/1] test_cmd_line_script
test_consistent_sys_path_for_direct_execution (test.test_cmd_line_script.CmdLineTest) ... ERROR
test_directory_error (test.test_cmd_line_script.CmdLineTest) ... FAIL
ERROR: test_consistent_sys_path_for_direct_execution (test.test_cmd_line_script.CmdLineTest)
Traceback (most recent call last):
  File "C:\Programs\Python\python310\lib\test\test_cmd_line_script.py", line 677, in test_consistent_sys_path_for_direct_execution
    out_by_name = kill_python(p).decode().splitlines()
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xbb in position 9: invalid start byte
FAIL: test_directory_error (test.test_cmd_line_script.CmdLineTest)
Traceback (most recent call last):
  File "C:\Programs\Python\python310\lib\test\test_cmd_line_script.py", line 268, in test_directory_error
    self._check_import_error(script_dir, msg)
  File "C:\Programs\Python\python310\lib\test\test_cmd_line_script.py", line 151, in _check_import_error
    self.assertIn(expected_msg.encode('utf-8'), err)
AssertionError: b"can't find '__main__' module in 'C:\\\\Users\\\\\xe5<5 bytes redacted>\\\\AppData\\\\Local\\\\Temp\\\\tmpcwkfn9ct'" not found in b"C:\\Programs\\Python\\python310\\python.exe: can't find '__main__' module in 'C:\\\\Users\\\\\xbb<3 bytes redacted>\\\\AppData\\\\Local\\\\Temp\\\\tmpcwkfn9ct'\r\n"
Ran 44 tests in 29.769s

FAILED (failures=2, errors=5)
test test_cmd_line_script failed
test_cmd_line_script failed (5 errors, 2 failures) in 30.4 sec

== Tests result: FAILURE ==

In the test_directory_error AssertionError message above, I redacted part of the path because my account name is my real name.  I hope the issue is clear enough despite the redaction: the "\xe5<5 bytes redacted>" part is 6 bytes and apparently UTF-8 (for two Chinese characters), while the "\xbb<3 bytes redacted>" part is 4 bytes and apparently cp936.
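
For readers unfamiliar with the two encodings, the byte counts can be checked with a short sketch.  The name is again a hypothetical two-character placeholder, and the message and path are simplified stand-ins for the ones in the traceback; the point is only that the UTF-8 form of the expected message cannot be a substring of output encoded in cp936.

```python
# Hypothetical two-character name; the real one is redacted above.
name = "用户"

utf8_bytes = name.encode("utf-8")
cp936_bytes = name.encode("cp936")
print(len(utf8_bytes), len(cp936_bytes))  # 6 4

# Simplified stand-in for the message the test expects to find.
expected_msg = "can't find '__main__' module in " + repr("C:\\Users\\" + name + "\\Temp")

# Stand-in for the child's stderr, assuming it was encoded in cp936.
err = expected_msg.encode("cp936")

print(expected_msg.encode("utf-8") in err)  # False: mirrors the failing assertIn
print(expected_msg.encode("cp936") in err)  # True
```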

As I said above, I discovered this issue some time ago but only now have time to report it.  I believe I've seen these failures in 3.8.2/6, 3.9.7, and 3.10.0 rc2.  It shouldn't be hard to reproduce for anyone able to create a Windows account with a non-ASCII name.  If reproducing it turns out to be difficult, though, I'm happy to provide more information and/or run more tests.

components: Tests
messages: 401659
nosy: minghua
priority: normal
severity: normal
status: open
title: Many regtest failures on Windows with non-ASCII account name
versions: Python 3.10, Python 3.8, Python 3.9

Python tracker <report at bugs.python.org>
