Unless someone else does the implementation, I'd rather add a utf8-readsig encoding that initially only skips a utf8 BOM - notably, you always get the same encoding, it just sometimes skips the first three bytes.

I think we can change this later to detect and switch to utf16 without it being disastrous, though we've made it this far without it and frankly there are good reasons to "encourage" utf8 over utf16.

My big concern is the console... I think that change is inevitably going to have to break someone, but I need to map out the possibilities first to figure out just how bad it'll be.

Top-posted from my Windows Phone

From: Random832
Sent: ‎8/‎11/‎2016 7:54
To: python-ideas@python.org
Subject: Re: [Python-ideas] Fix default encodings on Windows

On Thu, Aug 11, 2016, at 10:25, Steven D'Aprano wrote:
> > Interesting. Are you assuming that a text file cannot be empty?
>
> Hmmm... not consciously, but I guess I was.
>
> If the file is empty, how do you know it's text?

Heh. That's the *other* thing that Notepad does wrong in the opinion of
people coming from the Unix world - a Windows text file does not need to
end with a [CR]LF, and normally will not.

> But we're getting off topic here. In context of Steve's suggestion, we
> should only autodetect UTF-8. In other words, if there's a UTF-8 BOM,
> skip it, otherwise treat the file as UTF-8.

I think there's still room for UTF-16. It's two of the four encodings
supported by Notepad, after all.
_______________________________________________
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/