On 2020-07-02 14:57, Victor Stinner wrote:
On Thu, 2 Jul 2020 at 14:44, Barry Scott <barry@barrys-emacs.org> wrote:
It's not obvious to me why the latin1 encoding is in this list, as it's just one of the many 8-bit character sets. Why is it needed?
The Latin-1 (ISO 8859-1) charset is kind of special: it maps bytes 0x00-0xFF to Unicode characters U+0000-U+00FF and decoding from latin1 cannot fail.
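A quick way to check that property in Python is sketched below. The variable names are only illustrative, and the UTF-8 comparison is added here for contrast rather than taken from the message above.

```python
# All 256 possible byte values, 0x00 through 0xFF.
all_bytes = bytes(range(256))

# Decoding as latin1 never raises: every byte is a valid character.
text = all_bytes.decode("latin1")

# Each byte maps to the code point with the same numeric value.
code_points = [ord(ch) for ch in text]
maps_one_to_one = code_points == list(range(256))

# The mapping is reversible, so the original bytes are recovered.
round_trips = text.encode("latin1") == all_bytes

# For contrast, UTF-8 rejects some byte sequences, such as a lone 0xFF.
try:
    b"\xff".decode("utf-8")
    utf8_failed = False
except UnicodeDecodeError:
    utf8_failed = True
```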
This apparently makes it useful for not-quite-text, not-quite-bytes protocols like HTTP. In particular, WSGI (PEP 3333) uses latin-1 for headers.
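As a rough illustration of the WSGI convention: a header or path value arrives as a str whose code points are really bytes, so an application that wants the real text re-encodes it as latin-1 and decodes it with the encoding it expects. The helper name `wsgi_str_to_text` and the UTF-8 default are assumptions made for this sketch; neither is part of PEP 3333.

```python
def wsgi_str_to_text(value: str, encoding: str = "utf-8") -> str:
    """Recover text from a WSGI 'bytes-as-latin-1' string.

    Hypothetical helper: the name and the default encoding are
    assumptions for this sketch.
    """
    # latin-1 encoding turns each code point back into the original byte.
    return value.encode("latin-1").decode(encoding)


# Simulate what a server does with a UTF-8 encoded request path:
# the raw bytes are decoded as latin-1 before reaching the application.
raw_path = "/caf\u00e9".encode("utf-8")
wsgi_value = raw_path.decode("latin-1")

# The application reverses that step to get the intended text.
recovered = wsgi_str_to_text(wsgi_value)
```

The intermediate `wsgi_value` is not meaningful text; it is only a lossless carrier for the bytes, which works because latin-1 can represent every byte value.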
It was commonly used as the locale encoding in Europe 10 years ago, but nowadays most Linux distributions use UTF-8 as the locale encoding.
I'm also fine with restricting the list to 3 encodings: ASCII, UTF-8 and Windows ANSI code page.