On 10 August 2016 at 19:10, Steve Dower <steve.dower@python.org> wrote:
To summarise the proposals (remembering that these would only affect Python 3.6 on Windows):
* change sys.getfilesystemencoding() to return 'utf-8'
* automatically decode byte paths assuming they are utf-8
* remove the deprecation warning on byte paths
* make the default open() encoding check for a BOM or else use utf-8
* [ALTERNATIVE] make the default open() encoding check for a BOM or else use locale.getpreferredencoding()
* force the console encoding to UTF-8 on initialize and revert on finalize
So what are your concerns? Suggestions?
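
For reference, the two open() proposals in the list above could be approximated in user code today. This is only a rough sketch, not what the interpreter would do: the names `sniff_encoding`, `open_text` and `open_locale` are made up for illustration, and it handles reading only. The `fallback` argument chooses between the main proposal (utf-8) and the alternative (the locale's preferred encoding).

```python
import codecs
import locale

# BOM signatures, UTF-32 first so that its little-endian BOM
# (FF FE 00 00) is not mistaken for the UTF-16 one (FF FE).
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def sniff_encoding(path, fallback="utf-8"):
    """Return a codec name based on the file's BOM, or `fallback` if none."""
    with open(path, "rb") as f:
        head = f.read(4)
    for bom, name in _BOMS:
        if head.startswith(bom):
            # These codecs consume the BOM themselves when decoding.
            return name
    return fallback


def open_text(path, fallback="utf-8", **kwargs):
    """Open `path` for reading: BOM if present, else `fallback` (hypothetical helper)."""
    return open(path, "r", encoding=sniff_encoding(path, fallback), **kwargs)


def open_locale(path, **kwargs):
    """Same, but fall back to the locale encoding (the [ALTERNATIVE] above)."""
    return open_text(path, fallback=locale.getpreferredencoding(False), **kwargs)
```

A file without a BOM is where the two variants differ: `open_text(path)` would read it as UTF-8, while `open_locale(path)` would read it in the ANSI code page on a typical Windows setup.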
I presume you'd be targeting 3.7 for this change. Broadly, I'm +1 on all of this.

Personally, I'm moving to UTF-8 everywhere, so it seems OK to me, but I suspect defaulting open() to UTF-8 in the absence of a BOM might cause issues for people. Most text editors still (AFAIK) use the ANSI codepage by default, and it's the one place where an identifying BOM isn't possible. So your alternative may be a safer choice. On the other hand, files from Unix (via, say, GitHub) would typically be UTF-8 without BOM, so it becomes a question of choosing the best compromise. I'm inclined to go for cross-platform and UTF-8 and clearly document the change.

We might want a more convenient short form for open(filename, "r", encoding=locale.getpreferredencoding()), though, to ease the transition... We'd also need to consider how the new default encoding would interact with PYTHONIOENCODING.

For the console, does this mean that the win_unicode_console module will no longer be needed when these changes go in?

Sorry, not much in the way of direct experience or information I can add, but a strong +1 on the change (and I'd be happy to help where needed).

Paul