On Wed, Aug 10, 2016, at 14:10, Steve Dower wrote:
> To summarise the proposals (remembering that these would only affect
> Python 3.6 on Windows):
>
> * change sys.getfilesystemencoding() to return 'utf-8'
> * automatically decode byte paths assuming they are utf-8
> * remove the deprecation warning on byte paths
Why? What's the use case?
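For what it's worth, my reading of the byte-path part is the following
(a minimal sketch; the filename is just an example, and the comments
describe the *proposed* Windows behaviour, not the current one):

    import os
    import sys

    raw = 'café.txt'.encode('utf-8')   # b'caf\xc3\xa9.txt'
    print(sys.getfilesystemencoding()) # 'utf-8' under the proposal
    print(os.fsdecode(raw))            # 'café.txt' -- decoded as utf-8
                                       # rather than mbcs, so os.stat(raw)
                                       # and os.stat('café.txt') would
                                       # name the same file

But that just restates the mechanism; it doesn't tell me who needs it.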
> * make the default open() encoding check for a BOM or else use utf-8
> * [ALTERNATIVE] make the default open() encoding check for a BOM or
>   else use sys.getpreferredencoding()
For reading, I assume. When opened for writing, it should probably be
utf-8-sig [if it's not mbcs] to match what Notepad does.

What about files opened for appending or updating? In theory it could
ingest the whole file to see if it's valid UTF-8, but that has a time
cost. Notepad, if there's no BOM, checks the first 256 bytes of the
file for whether it's likely to be utf-16 or mbcs [utf-8 isn't
considered, AFAIK], and can get it wrong for certain very short files
[i.e. the infamous "this app can break"].

What to do on opening a pipe or device? [Is os.fstat able to detect
these cases?] Maybe the BOM detection phase should be deferred until
the first read. What should the encoding be at that point if this is
done?

Is there a "utf-any" encoding that can handle all five BOMs? If not,
should there be? How are "utf-16" and "utf-32" files opened for
appending or updating handled today?
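To make the BOM question concrete, here is roughly the sniffing that
open() would have to do, assuming a utf-8 fallback (detect_encoding is
a hypothetical helper of mine, not an existing function, and this only
covers the plain-read case, not appending/updating or pipes):

    import codecs

    # The UTF-32-LE BOM begins with the UTF-16-LE BOM bytes, so the
    # 4-byte BOMs must be checked before the 2-byte ones.
    _BOMS = [
        (codecs.BOM_UTF32_LE, 'utf-32'),
        (codecs.BOM_UTF32_BE, 'utf-32'),
        (codecs.BOM_UTF8,     'utf-8-sig'),
        (codecs.BOM_UTF16_LE, 'utf-16'),
        (codecs.BOM_UTF16_BE, 'utf-16'),
    ]

    def detect_encoding(path, default='utf-8'):
        # Sniff at most 4 bytes; all five BOMs fit in that prefix.
        with open(path, 'rb') as f:
            head = f.read(4)
        for bom, name in _BOMS:
            if head.startswith(bom):
                return name  # these codec names consume the BOM on decode
        return default

Note that returning 'utf-16'/'utf-32' rather than the -le/-be variants
means the codec itself reads and strips the BOM, which is the closest
thing the stdlib has to a "utf-any" today, just not across all five.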
> * force the console encoding to UTF-8 on initialize and revert on
>   finalize
Why not implement a true Unicode console? What if sys.stdin/stdout are pipes (or non-console devices such as a serial port)?
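On that last point, os.fstat can at least tell pipes and regular files
apart; something like this (a rough sketch, and the classification
labels are mine):

    import os
    import stat
    import sys

    def stream_kind(f):
        # Caveat: on Windows, isatty() is true for any character
        # device (including NUL), so it's not a perfect console test.
        if f.isatty():
            return 'console (or other character device)'
        mode = os.fstat(f.fileno()).st_mode
        if stat.S_ISFIFO(mode):
            return 'pipe'
        if stat.S_ISREG(mode):
            return 'regular file'
        return 'other device'

    print(stream_kind(sys.stdout), file=sys.stderr)

So the information is available; the question is what the forced UTF-8
console encoding should do when stdout turns out not to be a console.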