Nick Coghlan writes:
At an ecosystem level, that means we're faced with a choice between implicitly encouraging folks to make their code *nix only, and finding a way to provide a more *nix like experience when running on Windows (where UTF-8 encoded binary data just works, and either other encodings lead to mojibake or else you use chardet to figure things out).
Most of the time we do know what the encoding is: we can just ask Windows (although Steve proposes to make Python fib about that, we could add other APIs). This change means that programs that until now could be encoding-agnostic and just pass around bytes on Windows, counting on Python to consistently convert those to the appropriate form for the API, can't do that any more. They have to find out what the encoding is and transcode to UTF-8, or be rewritten to do their processing as text. This is a potential burden on existing user code. I suppose that there are such programs, for the same reasons that networking programs tend to use bytes I/O: ports from Python 2, a (misplaced?) emphasis on performance, etc.
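To make the "find out the encoding and transcode" burden concrete, here is a minimal sketch of what such a bytes-passing program might have to add. The helper name to_utf8 is hypothetical, and using locale.getpreferredencoding() as the way to "ask Windows" is my assumption; the proposal might well end up pointing people at a different API.

```python
import locale
from typing import Optional


def to_utf8(raw: bytes, source_encoding: Optional[str] = None) -> bytes:
    """Re-encode raw bytes from source_encoding to UTF-8 (hypothetical helper)."""
    if source_encoding is None:
        # Assumption: the locale's preferred encoding stands in for
        # "asking Windows"; on Windows this is normally the ANSI code page.
        source_encoding = locale.getpreferredencoding(False)
    # Decode strictly, so bytes that don't fit the claimed encoding raise
    # UnicodeDecodeError instead of silently turning into mojibake.
    text = raw.decode(source_encoding)
    return text.encode("utf-8")


# A file name as a cp1252-based program would have carried it around.
legacy_name = "caf\u00e9.txt".encode("cp1252")
utf8_name = to_utf8(legacy_name, "cp1252")
```

Trivial in isolation, but every place where such a program receives bytes from outside Python would need a call like this, and the program has to know the source encoding at each of those places.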
Steve is suggesting that the latter option is preferable, a view I agree with since it lowers barriers to entry for Windows based developers to contribute to primarily *nix focused projects.
Sure, but do you have any idea what the costs might be? Aside from the code burden mentioned above, there's a reputational issue. Just yesterday I was having a (good-natured) Perl vs. Python discussion on my Linux user group's mailing list, and two developers volunteered that they avoid Python because "the Python developers frequently break backward compatibility". These memes tend to go off on their own anyway, but this change will really feed that one.
Promoting cross-platform consistency often leads to enabling patterns that are considered a bad idea from a native platform perspective, and this strikes me as an example of that (just as the binary/text separation itself is a case where Python 3 diverged from the POSIX text model to improve consistency across *nix, Windows, JVM and CLR environments).
I would say rather that Python 3 chose an across-the-board better, more robust model, one that properly supports internationalization and multilingualization. POSIX's text model is suitable at best for a fragile localization. This change, OTOH, is a step backward that we wouldn't consider except for its intended effect on the ease of writing networking code. That's important, but I really don't think that's going to be the only major effect, and I fear it won't be the most important one. Of course that's FUD -- I have no data on the potential burden to existing use cases, or on harm to reputation. But neither do you and Steve. :-(