On Wed, Aug 10, 2016, at 14:10, Steve Dower wrote:
To summarise the proposals (remembering that these would only affect Python 3.6 on Windows):
- change sys.getfilesystemencoding() to return 'utf-8'
- automatically decode byte paths assuming they are utf-8
- remove the deprecation warning on byte paths
Why? What's the use case?
- make the default open() encoding check for a BOM or else use utf-8
- [ALTERNATIVE] make the default open() encoding check for a BOM or else
For reading, I assume. When a file is opened for writing, the default should probably be utf-8-sig [if it's not mbcs] to match what Notepad does. What about files opened for appending or updating? In theory it could ingest the whole file to see if it's valid UTF-8, but that has a time cost.
Notepad, if there's no BOM, checks the first 256 bytes of the file to guess whether it's likely to be utf-16 or mbcs [utf-8 isn't considered, AFAIK], and can get it wrong for certain very short files [e.g. the infamous "this app can break"].
What to do on opening a pipe or device? [Is os.fstat able to detect these cases?]
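To make the os.fstat question concrete, here is a minimal sketch of how a descriptor could be classified before any BOM sniffing is attempted. The helper name classify_fd is hypothetical, and whether Windows reports pipes and console/serial devices this way through os.fstat is an assumption that would need checking on the target platform.

```python
import os
import stat

def classify_fd(fd):
    """Return a rough category for an open file descriptor.

    Hypothetical helper: relies only on the st_mode bits that
    os.fstat() reports, which may differ between platforms.
    """
    mode = os.fstat(fd).st_mode
    if stat.S_ISREG(mode):
        return "regular"   # seekable file, safe to peek at for a BOM
    if stat.S_ISFIFO(mode):
        return "pipe"      # peeking would consume data
    if stat.S_ISCHR(mode):
        return "device"    # console, serial port, null device, ...
    return "other"
```

If this works as hoped, open() could skip BOM detection (or defer it to the first read) for anything that is not "regular".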
Maybe the BOM detection phase should be deferred until the first read. What should encoding be at that point if this is done? Is there a "utf-any" encoding that can handle all five BOMs? If not, should there be? How are "utf-16" and "utf-32" files opened for appending or updating handled today?
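As far as I know there is no such codec in the stdlib, but a rough sketch of what "utf-any" might do is below. The names sniff_bom and decode_any are made up for illustration; only the codecs.BOM_* constants are real. The fallback to utf-8 when no BOM is present is the assumption taken from the proposal above.

```python
import codecs

# Check the 4-byte BOMs first: the UTF-32-LE BOM (FF FE 00 00)
# begins with the UTF-16-LE BOM (FF FE).
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]

def sniff_bom(head, default="utf-8"):
    """Return (encoding, bom_length) for the first bytes of a stream."""
    for bom, name in _BOMS:
        if head.startswith(bom):
            return name, len(bom)
    return default, 0  # no BOM: fall back to the assumed default

def decode_any(data, default="utf-8"):
    """Decode bytes using whichever of the five BOMs is present."""
    name, skip = sniff_bom(data[:4], default)
    return data[skip:].decode(name)
```

Note the inherent ambiguity: a UTF-16-LE file whose first character after the BOM is U+0000 is indistinguishable from a UTF-32-LE BOM, so this sketch would treat it as UTF-32.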
- force the console encoding to UTF-8 on initialize and revert on
Why not implement a true unicode console? What if sys.stdin/stdout are pipes (or non-console devices such as a serial port)?