On Thu, Aug 11, 2016, at 10:25, Steven D'Aprano wrote:
Unless someone else does the implementation, I'd rather add a utf8-readsig encoding that initially only skips a UTF-8 BOM. Notably, you always get the same encoding; it just sometimes skips the first three bytes.
I think we can change this later to detect and switch to utf16 without it being disastrous, though we've made it this far without it and frankly there are good reasons to "encourage" utf8 over utf16.
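Here is a minimal Python sketch of the utf8-readsig behaviour described above. It assumes the proposal means "always decode as UTF-8, dropping one leading UTF-8 BOM if present". `decode_skipping_utf8_bom` is a hypothetical name used only for illustration. For decoding, the existing stdlib `utf-8-sig` codec already behaves this way, and the sketch compares against it:

```python
import codecs


def decode_skipping_utf8_bom(data: bytes) -> str:
    """Hypothetical helper: always decode as UTF-8, dropping a leading UTF-8 BOM.

    The result encoding never changes. Only the first three bytes are
    sometimes skipped.
    """
    if data.startswith(codecs.BOM_UTF8):
        data = data[len(codecs.BOM_UTF8):]
    return data.decode("utf-8")


with_bom = codecs.BOM_UTF8 + "héllo".encode("utf-8")
without_bom = "héllo".encode("utf-8")

# Both inputs decode to the same text, matching the stdlib "utf-8-sig" codec.
print(decode_skipping_utf8_bom(with_bom) == with_bom.decode("utf-8-sig"))  # True
print(decode_skipping_utf8_bom(without_bom))  # héllo
```

When reading files, `open(path, encoding="utf-8-sig")` gives the same result.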
My big concern is the console... I think that change is inevitably going to have to break someone, but I need to map out the possibilities first to figure out just how bad it'll be.
> > Interesting. Are you assuming that a text file cannot be empty?
> Hmmm... not consciously, but I guess I was.
> If the file is empty, how do you know it's text?
Heh. That's the *other* thing that Notepad does wrong in the opinion of
people coming from the Unix world - a Windows text file does not need to
end with a [CR]LF, and normally will not.
> But we're getting off topic here. In context of Steve's suggestion, we
> should only autodetect UTF-8. In other words, if there's a UTF-8 BOM,
> skip it, otherwise treat the file as UTF-8.
I think there's still room for UTF-16. It's two of the four encodings
supported by Notepad, after all.
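Admitting UTF-16 means detection has to check more than one BOM. Below is a rough sketch under a few assumptions: detection is purely BOM-based (no guessing from content), and input without a BOM falls back to UTF-8. An empty file, as discussed above, therefore decodes as empty UTF-8 text. `sniff_and_decode` and `_BOMS` are hypothetical names; only the `codecs.BOM_*` constants come from the stdlib.

```python
import codecs
from typing import Tuple

# Hypothetical lookup table, checked in order. UTF-32 is deliberately left
# out: a UTF-32-LE BOM (FF FE 00 00) would be misread here as UTF-16-LE.
_BOMS = (
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
)


def sniff_and_decode(data: bytes) -> Tuple[str, str]:
    """Return (encoding, text), stripping any recognised BOM from the text."""
    for bom, encoding in _BOMS:
        if data.startswith(bom):
            return encoding, data[len(bom):].decode(encoding)
    # No BOM (including an empty file): treat the bytes as UTF-8.
    return "utf-8", data.decode("utf-8")


print(sniff_and_decode(codecs.BOM_UTF16_LE + "hi".encode("utf-16-le")))  # ('utf-16-le', 'hi')
print(sniff_and_decode(b""))  # ('utf-8', '')
```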
Python-ideas mailing list