
On Wed, Aug 10, 2016, at 19:04, eryk sun wrote:
> Using 'mbcs' doesn't work reliably with arbitrary bytes paths in locales that use a DBCS codepage such as 932.
Er... UTF-8 doesn't work reliably with arbitrary bytes paths either, unless you intend to use surrogateescape (which you could also do with mbcs).
Is there any particular reason to expect all bytes paths in this scenario to be valid UTF-8?
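To illustrate the surrogateescape point, here is a minimal sketch. The helper name `roundtrip` and the sample bytes are made up for illustration. It only shows that a bytes path which is not valid UTF-8 fails a strict decode but survives a decode/encode round trip with the surrogateescape error handler.

```python
def roundtrip(raw: bytes, encoding: str = "utf-8") -> bytes:
    """Decode and re-encode with surrogateescape (hypothetical helper)."""
    # Undecodable bytes become lone surrogates U+DC80..U+DCFF ...
    text = raw.decode(encoding, errors="surrogateescape")
    # ... and are turned back into the original bytes on encode.
    return text.encode(encoding, errors="surrogateescape")


# Sample path bytes that are not valid UTF-8 (0xE9 followed by 0xFF).
raw = b"caf\xe9\xff.txt"

try:
    raw.decode("utf-8")  # strict decoding
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

print(strict_ok)              # strict decode is rejected
print(roundtrip(raw) == raw)  # the round trip preserves the bytes
```

The same error handler can be passed with any other codec name, which is the sense in which it is not specific to UTF-8.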
> Python 3 uses O_BINARY when opening files, unless you explicitly call os.open. Specifically, FileIO.__init__ adds O_BINARY to the open flags if the platform defines it.
Fair enough. I wasn't sure, particularly considering that Python does expose O_BINARY, O_TEXT, and msvcrt.setmode.
I'm not sure I approve of os.open not also adding it (or perhaps adding it only if O_TEXT is not explicitly added), but... meh.
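For what it's worth, the usual way to get binary behaviour out of os.open today is to add the flag yourself. A rough sketch follows; `write_raw` is a hypothetical helper name, and the `getattr` fallback is there because os.O_BINARY is only defined on Windows.

```python
import os

# os.O_BINARY exists only on Windows; fall back to 0 elsewhere.
O_BINARY = getattr(os, "O_BINARY", 0)


def write_raw(path: str, data: bytes) -> None:
    """Write bytes through a raw descriptor with no newline translation."""
    flags = os.O_WRONLY | os.O_CREAT | os.O_TRUNC | O_BINARY
    fd = os.open(path, flags, 0o600)
    try:
        # A single os.write is enough for a small payload in this sketch;
        # real code should loop until all bytes are written.
        os.write(fd, data)
    finally:
        os.close(fd)
```

Without the flag, a descriptor from os.open on Windows is in text mode, so the C runtime would expand each b"\n" to b"\r\n" on write.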
> Python could copy how configure_text_mode() handles the BOM, except it shouldn't write a BOM for new UTF-8 files.
I disagree. I think that *on Windows* it should, just like *on Windows* it should write CR-LF for line endings.
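As a sketch of what that behaviour looks like when requested explicitly with what Python already has (this is not a proposal for the default, and `write_windows_style` is a made-up name): the 'utf-8-sig' codec writes a BOM at the start of the file, and newline='\r\n' translates each '\n' to CR-LF on write.

```python
import codecs


def write_windows_style(path: str, text: str) -> None:
    """Write text as UTF-8 with a BOM and CR-LF line endings."""
    # 'utf-8-sig' emits the BOM once at the start of the file;
    # newline="\r\n" turns every "\n" written into "\r\n".
    with open(path, "w", encoding="utf-8-sig", newline="\r\n") as f:
        f.write(text)


def starts_with_bom(path: str) -> bool:
    """Report whether the file begins with the UTF-8 BOM."""
    with open(path, "rb") as f:
        return f.read(3) == codecs.BOM_UTF8
```

Reading such a file back with encoding='utf-8-sig' and the default newline handling strips the BOM and yields plain '\n' line endings again.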