2016-08-16 8:06 GMT+02:00 eryk sun:
> My proposal was to use the wide-character APIs, but transcoding CP_ACP without best-fit characters and raising a warning whenever the default character is used (e.g. substituting Katakana middle dot when creating a file using a bytes path that has an invalid sequence in CP932).
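A rough Python sketch of the decoding half of this proposal, for illustration only: a bytes path is decoded with Python's own cp932 codec, and every substitution emits a warning instead of passing silently. The handler `warn_default_char()`, the function `decode_ansi_path()` and the choice of `UnicodeWarning` are hypothetical. Python's codecs do not call MultiByteToWideChar(), so the Katakana middle dot (U+30FB) is hard-coded to mimic the CP932 default character mentioned above.

```python
import codecs
import warnings

# Hypothetical stand-in for the CP932 default character (Katakana middle dot).
DEFAULT_CHAR = "\u30fb"


def warn_default_char(exc):
    """Codec error handler: substitute DEFAULT_CHAR and warn (decoding only)."""
    if isinstance(exc, UnicodeDecodeError):
        bad = exc.object[exc.start:exc.end]
        warnings.warn(
            f"invalid {exc.encoding} sequence {bad!r} replaced with {DEFAULT_CHAR!r}",
            UnicodeWarning,
            stacklevel=2,
        )
        # Resume decoding right after the invalid bytes.
        return DEFAULT_CHAR, exc.end
    # Encoding errors are out of scope for this sketch.
    raise exc


# Register the handler under a (hypothetical) name usable by bytes.decode().
codecs.register_error("warn_default_char", warn_default_char)


def decode_ansi_path(path: bytes, encoding: str = "cp932") -> str:
    """Decode a bytes path to str, warning whenever the default char is used."""
    return path.decode(encoding, errors="warn_default_char")
```

A valid CP932 path such as `"テスト.txt".encode("cp932")` decodes without any warning, while `decode_ansi_path(b"abc\x81")` (a truncated lead byte) returns a string containing U+30FB and emits a `UnicodeWarning`.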
A problem with all these proposals is that they *add* new code to the CPython code base, code specific to Windows. There are very few core developers (1 or 2?) who work on this Windows-specific code. I would prefer to *drop* Windows-specific code rather than *add* (or change) it, just to make the CPython code base simpler to maintain. It's already annoying enough.

It's common for a Python function to have one implementation for all platforms except Windows, and a second implementation specific to Windows. An example, os.listdir():

* ~150 lines of C code for the Windows implementation
* ~100 lines of C code for the UNIX/BSD implementation
* The Windows implementation is split in two parts, Unicode and bytes, so the code is basically duplicated (2 versions)

If you remove the bytes support, the Windows function is reduced to 100 lines (-50).

I'm not sure that modifying the API using bytes would solve any issue on Windows, and there is an obvious risk of regression (mojibake when you concatenate strings encoded to UTF-8 and to the ANSI code page).

I'm in favor of forcing developers to use Unicode on Windows, which is the correct way to use the Windows API. The side effect is that such code works perfectly well on UNIX/BSD ;-)

To be clear: drop the deprecated code supporting bytes on Windows.

I already proposed to drop bytes support on Windows and most answers were "please keep it", so another option is to keep the "broken code" as the status quo...

I really hate APIs using bytes on Windows because they use WideCharToMultiByte() (encode Unicode to bytes) in a mode which is likely to lead to mojibake: unencodable characters are replaced with "best fit characters" or "?".

https://unicodebook.readthedocs.io/operating_systems.html#encode-and-decode-...

In a perfect world, I would also propose to deprecate bytes filenames on UNIX, but I expect an insane flamewar on the definition of "UNIX", the history of UNIX, etc. (a non-technical discussion, since Unicode works very well on Python 3...).

Victor
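A small Python illustration of the mojibake risk mentioned above, assuming cp1252 as the ANSI code page (the real ANSI code page depends on the system locale): the same character encoded once to UTF-8 and once to cp1252 yields a bytes path that no single codec decodes correctly. The helper `mixed_path()` is made up for this example.

```python
def mixed_path() -> bytes:
    # Directory name encoded to UTF-8 (e.g. read from a UTF-8 config file)...
    directory = "café".encode("utf-8")
    # ...joined with a filename encoded to the ANSI code page (assumed cp1252).
    filename = "é.txt".encode("cp1252")
    return directory + b"\\" + filename


path = mixed_path()

# Decoding as cp1252 "succeeds", but the UTF-8 part turns into mojibake.
print(path.decode("cp1252"))  # cafÃ©\é.txt

# Decoding as UTF-8 fails on the lone cp1252 byte 0xE9.
try:
    path.decode("utf-8")
except UnicodeDecodeError as exc:
    print("UTF-8 decoding failed:", exc.reason)
```

Neither choice of codec recovers both original names, which is the kind of regression a bytes-based API makes easy to introduce.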
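Python's own codecs do not implement the Windows "best fit" mappings, but `errors="replace"` reproduces the "?" half of the problem described above. A sketch, again assuming cp1252 as the ANSI code page: two different Unicode filenames collapse to the same bytes, so a bytes-based API can neither round-trip them nor tell them apart. The function `to_ansi_bytes()` is illustrative only.

```python
def to_ansi_bytes(name: str, codepage: str = "cp1252") -> bytes:
    # Unencodable characters become b"?", similar to WideCharToMultiByte()
    # substituting its default character (best-fit mapping is NOT emulated).
    return name.encode(codepage, errors="replace")


# Ł, ź and ż are not in cp1252; ó is (0xF3).
names = ["Łódź.txt", "Łódż.txt"]
encoded = [to_ansi_bytes(n) for n in names]
print(encoded)  # [b'?\xf3d?.txt', b'?\xf3d?.txt']

# The mapping is lossy: two distinct names now share a single bytes path,
# and decoding it gives back neither original name.
assert encoded[0] == encoded[1]
print(encoded[0].decode("cp1252"))  # ?ód?.txt
```

With a Polish ANSI code page (cp1250) these particular names would encode fine, which is exactly why the result of a bytes API depends on the machine's locale.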