On 10Aug2016 1144, Paul Moore wrote:
I presume you'd be targeting 3.7 for this change.
Does 3.6 seem too aggressive? I think I have time to implement the changes before beta 1, as it's mostly changing default values and mopping up resulting breaks. (Doing something like reimplementing files using the Win32 API rather than the CRT would be too big a task for 3.6.)
Most text editors still (AFAIK) use the ANSI codepage by default, and it's the one place where an identifying BOM isn't possible. So your alternative may be a safer choice. On the other hand, files from Unix (via, say, GitHub) would typically be UTF-8 without a BOM, so it becomes a question of choosing the best compromise. I'm inclined to go for cross-platform and UTF-8 and clearly document the change.
That last point was my thinking. Notepad's default is just as bad as Python's default right now, but basically everyone acknowledges that it's bad. I don't think we should prevent Python from behaving better because one Windows tool doesn't.
We might want a more convenient short form for open(filename, "r", encoding=locale.getpreferredencoding()), though, to ease the transition... We'd also need to consider how the new default encoding would interact with PYTHONIOENCODING.
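For illustration only, such a short form could be as small as the sketch below. The name open_locale is hypothetical and not something proposed in this thread; it only wraps the built-in open() with the locale's preferred encoding, so code that depends on the current behaviour can say so explicitly.

```python
import locale
import os
import tempfile


def open_locale(filename, mode="r", **kwargs):
    """Hypothetical helper: open a text file with the locale's preferred encoding.

    This only spells out what open() does today when no encoding is given.
    """
    if "b" in mode:
        raise ValueError("open_locale is for text mode only")
    # Passing False avoids calling setlocale() as a side effect.
    encoding = locale.getpreferredencoding(False)
    return open(filename, mode, encoding=encoding, **kwargs)


# Round trip through a temporary file to show the helper in use.
demo_path = os.path.join(tempfile.mkdtemp(), "demo.txt")
with open_locale(demo_path, "w") as f:
    f.write("hello")
with open_locale(demo_path) as f:
    demo_text = f.read()
    demo_encoding = f.encoding
```

Code that wants the legacy codepage would call open_locale(path) instead of open(path), which keeps it working if the default ever changes to UTF-8.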
PYTHONIOENCODING doesn't affect locale.getpreferredencoding() (but it does affect sys.std*.encoding).
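A rough way to check this on a given machine is to run a child interpreter with and without the variable set and compare what it reports. This is only a sketch: the report helper and the CHILD snippet are made up for the example, and latin-1 is an arbitrary choice of encoding.

```python
import codecs
import os
import subprocess
import sys

# Code run in the child interpreter: print both encodings, one per line.
CHILD = (
    "import sys, locale; "
    "print(sys.stdout.encoding); "
    "print(locale.getpreferredencoding())"
)


def report(io_encoding=None):
    """Return (stdout encoding, preferred encoding) as seen by a child Python."""
    env = dict(os.environ)
    env.pop("PYTHONIOENCODING", None)
    if io_encoding is not None:
        env["PYTHONIOENCODING"] = io_encoding
    out = subprocess.run(
        [sys.executable, "-c", CHILD],
        env=env,
        stdout=subprocess.PIPE,
        check=True,
    ).stdout
    stdout_enc, preferred = out.decode("ascii").split()
    # Normalise aliases such as "latin-1" and "ISO-8859-1" to one codec name.
    return codecs.lookup(stdout_enc).name, codecs.lookup(preferred).name


baseline = report()
overridden = report("latin-1")
print("without PYTHONIOENCODING:", baseline)
print("with PYTHONIOENCODING=latin-1:", overridden)
```

The first element of the pair should follow the variable, while the second should be the same in both runs.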
For the console, does this mean that the win_unicode_console module will no longer be needed when these changes go in?
That's the hope, though that module approaches the solution differently and may still have uses. An alternative way for us to fix this whole thing would be to bring win_unicode_console into the standard library and use it by default (or probably whenever PYTHONIOENCODING is not specified).
Sorry, not much in the way of direct experience or information I can add, but a strong +1 on the change (and I'd be happy to help where needed).
Testing with obscure filenames and strings is where help will be needed most :)