[Python-ideas] Fix default encodings on Windows

Random832 random832 at fastmail.com
Wed Aug 10 19:30:41 EDT 2016


On Wed, Aug 10, 2016, at 19:04, eryk sun wrote:
> Using 'mbcs' doesn't work reliably with arbitrary bytes paths in
> locales that use a DBCS codepage such as 932.

Er... UTF-8 doesn't work reliably with arbitrary bytes paths either,
unless you intend to use surrogateescape (which you could also do with
mbcs).

Is there any particular reason to expect all bytes paths in this
scenario to be valid UTF-8?
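
A minimal sketch of the point, using plain bytes.decode rather than any
real path API. The sample bytes (the code page 932 encoding of the
katakana "ソ") and the variable names are illustrative assumptions:

```python
# Bytes for the katakana "ソ" in code page 932; this is not valid UTF-8.
raw = b"\x83\x5c"

try:
    raw.decode("utf-8")  # strict error handling is the default
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

# surrogateescape maps each undecodable byte to a lone surrogate
# (U+DC80..U+DCFF) on decode and restores the original byte on encode.
text = raw.decode("utf-8", "surrogateescape")
roundtrip = text.encode("utf-8", "surrogateescape")
```

The strict decode fails, while the surrogateescape round trip returns the
original bytes unchanged.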

> Python 3 uses O_BINARY when opening files, unless you explicitly call
> os.open. Specifically, FileIO.__init__ adds O_BINARY to the open flags
> if the platform defines it.

Fair enough. I wasn't sure, particularly considering that Python does
expose O_BINARY, O_TEXT, and msvcrt.setmode.

I'm not sure I approve of os.open not also adding it (or perhaps adding
it only if O_TEXT is not explicitly added), but... meh.
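
Roughly what I have in mind, as a sketch only. open_binary_default is a
hypothetical wrapper, not an existing function, and it assumes that a
platform without O_BINARY or O_TEXT can treat the missing flag as 0:

```python
import os

def open_binary_default(path, flags, mode=0o666):
    """Hypothetical wrapper: os.open that defaults to binary mode."""
    # Both flags exist only on Windows; fall back to 0 elsewhere.
    o_binary = getattr(os, "O_BINARY", 0)
    o_text = getattr(os, "O_TEXT", 0)
    # Add O_BINARY unless the caller explicitly asked for O_TEXT.
    if not (flags & o_text):
        flags |= o_binary
    return os.open(path, flags, mode)
```

A caller who really wants text-mode translation can still pass os.O_TEXT
explicitly and get it.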

> Python could copy how
> configure_text_mode() handles the BOM, except it shouldn't write a BOM
> for new UTF-8 files.

I disagree. I think that *on Windows* it should, just like *on Windows*
it should write CR-LF for line endings.
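
For concreteness, this is the kind of output I mean, produced today with
explicit arguments rather than any changed default. write_windows_style
is a hypothetical helper name:

```python
def write_windows_style(path, text):
    """Hypothetical helper: write UTF-8 with a BOM and CR-LF line endings."""
    # "utf-8-sig" emits the BOM before the first write;
    # newline="\r\n" translates every "\n" written into "\r\n".
    with open(path, "w", encoding="utf-8-sig", newline="\r\n") as f:
        f.write(text)
```

Writing "a\nb\n" this way yields the BOM bytes EF BB BF followed by
"a\r\nb\r\n" on any platform.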

