There's a long, ongoing thread with the subject "Make UTF-8 mode more accessible for Windows users."

There are two obvious problems with UTF-8 mode. First, it applies to entire installations, or at least entire running scripts, including all imported libraries no matter who wrote them, which makes it a blunt instrument. Second, the problem on Windows isn't that Python happens to use the wrong default encoding; it's that multiple encodings coexist, and you really do have to think, each time you encode or decode something, about which encoding you ought to use. UTF-8 mode doesn't solve that; it just changes the default.

It seems as though most of those commenting in the other thread don't actually use Python on Windows. I do, and I can say it's a royal pain to have to write open(path, encoding='utf-8') all the time. If you could write open(path, 'r', 'utf-8'), that would be slightly better, but the third parameter is buffering, not encoding, and open(path, 'r', -1, 'utf-8') is not very readable.
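For reference, here is a small sketch of the two spellings compared above. The scratch file and its contents are made up for illustration; the only thing it relies on is the documented parameter order of open() (file, mode, buffering, encoding).

```python
import os
import tempfile

# Create a scratch file containing non-ASCII text, stored as UTF-8 bytes.
fd, path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "wb") as f:
    f.write("naïve café".encode("utf-8"))

# Keyword form: verbose, but the intent is obvious.
with open(path, encoding="utf-8") as f:
    by_keyword = f.read()

# Positional form: the third argument is buffering (-1 means "default"),
# so the encoding only arrives in fourth place.
with open(path, "r", -1, "utf-8") as f:
    by_position = f.read()

os.remove(path)
```

Both calls read the same text; the positional one is shorter to type but much harder to read.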

UTF-8 mode is in some ways worse, because you now have to choose between writing open(path) and having your script be incompatible with non-UTF-8 Windows installations, or writing open(path, encoding='utf-8'), which makes your script more compatible but makes UTF-8 mode pointless. There's a constant temptation to sacrifice portability for convenience, a temptation that Unix users are familiar with, since they omit encoding='utf-8' all the time.

My proposal is to add a couple of single-character options to open()'s mode parameter. 'b' and 't' already exist, and the encoding parameter essentially selects subcategories of 't', but it's annoyingly verbose and so people often omit it.

If '8' were equivalent to specifying encoding='UTF-8', and 'L' were equivalent to specifying encoding=(the real locale encoding, ignoring UTF-8 mode), that would go a long way toward making open() more convenient in the common cases on Windows, and I bet it would encourage at least some of those developing on Unixy platforms to write more portable code as well. For other encodings, you can still use 't' (or omit it) together with the encoding parameter.
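To make the idea concrete, here is a rough sketch of the proposed behaviour as a wrapper around the built-in open(). The names open_x and real_locale_encoding are hypothetical, invented for this illustration, and the error handling is only one possible choice. The sketch assumes locale.getencoding() (added in Python 3.11, and documented as ignoring UTF-8 mode) when it is available; on older versions it falls back to locale.getpreferredencoding(False), which does not ignore UTF-8 mode, so there the 'L' flag is only an approximation.

```python
import locale


def real_locale_encoding():
    """Best-effort lookup of the locale encoding (hypothetical helper)."""
    getencoding = getattr(locale, "getencoding", None)
    if getencoding is not None:
        # Python 3.11+: documented to ignore UTF-8 mode.
        return getencoding()
    # Older versions: this still follows UTF-8 mode, so it is only a fallback.
    return locale.getpreferredencoding(False)


def open_x(file, mode="r", **kwargs):
    """Like open(), but accepts '8' or 'L' in mode (hypothetical sketch)."""
    flags = [c for c in "8L" if c in mode]
    if len(flags) > 1:
        raise ValueError("can't combine '8' and 'L' in mode")
    if flags:
        flag = flags[0]
        if "b" in mode:
            raise ValueError("'8' and 'L' only make sense in text mode")
        if kwargs.get("encoding") is not None:
            raise ValueError("can't combine '8'/'L' with an explicit encoding")
        kwargs["encoding"] = "utf-8" if flag == "8" else real_locale_encoding()
        # Strip the extra flag so the built-in open() accepts the mode string.
        mode = mode.replace(flag, "")
    return open(file, mode, **kwargs)
```

With this, open_x(path, 'r8') behaves like open(path, 'r', encoding='utf-8'), and open_x(path, 'wL') writes using the locale encoding. A real implementation would live inside open() itself rather than in a wrapper.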

Note that I am not suggesting that 'L' be equivalent to PEP 597's encoding='locale', because that's specified to behave the same as encoding=None, except that it suppresses the warning. I think that's a terrible idea, because it means that open's behavior still depends on the global UTF-8 mode even if you specify the encoding explicitly. This is really a criticism of PEP 597 and not a part of this proposal as such. I think UTF-8 mode was a bad idea (just like a global "binary mode" that interpreted every mode='r' as mode='rb' would have been), and it should be ignored wherever possible. In particular, encoding='locale' should ignore UTF-8 mode. Then 'L' could and should mean encoding='locale'.
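For anyone who wants to see how UTF-8 mode affects a given installation, the following sketch reports the relevant values. The function name encoding_report is made up for this example. It assumes sys.flags.utf8_mode (Python 3.7+) and, where present, locale.getencoding() (Python 3.11+); on older versions the last entry is simply None.

```python
import locale
import sys


def encoding_report():
    """Collect the values that influence open()'s default encoding."""
    report = {
        # True when the interpreter was started in UTF-8 mode.
        "utf8_mode": bool(sys.flags.utf8_mode),
        # Follows UTF-8 mode: reports UTF-8 whenever the mode is enabled.
        "preferred": locale.getpreferredencoding(False),
    }
    # Only available on Python 3.11+; documented to ignore UTF-8 mode.
    getencoding = getattr(locale, "getencoding", None)
    report["locale_ignoring_utf8_mode"] = getencoding() if getencoding else None
    return report


print(encoding_report())
```

Running it once normally and once with python -X utf8 on a non-UTF-8 Windows installation should show the "preferred" value change while the locale itself stays the same.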

Obviously the names '8' and 'L' are debatable.

'L' could be argued to be unnecessary if there's a simple way to achieve the same thing with the encoding parameter (which currently there isn't).