[Python-ideas] Strings can sometimes convert to bytes without an encoding

Wed Jun 15 04:55:21 EDT 2016

On Jun 15, 2016 1:52 AM, "Greg Ewing" <greg.ewing at canterbury.ac.nz> wrote:
>
> Franklin Lee wrote:
>>
>> If the string only has code points in range(128), encoding is optional
>> (and useless anyway).
>
>
> No, it's not useless. It's possible to have an encoding
> that takes code points in the range 0-127 to something
> other than their ASCII equivalents. UTF-16, for example.
>
> You're effectively suggesting that ASCII or Latin-1
> should be assumed as a default encoding, which seems like
> a bad idea.

UTF-8 is the default encoding for str.encode and bytes.decode. CPython uses
Latin-1 as its internal representation whenever every code point fits in one
byte, and PyASCIIObject is an internal struct in Python 3. It is not exactly
alien to Python to choose ASCII as a default. If it is a bad idea, it is not
original to me.
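
To make both sides of this concrete, here is a minimal sketch (not from the original message; the sample string is arbitrary). It shows the UTF-8 default, that ASCII, Latin-1 and UTF-8 agree on code points in range(128), and that UTF-16 does not:

```python
s = "spam"  # arbitrary ASCII-only sample string (an assumption for illustration)

# str.encode() and bytes.decode() fall back to UTF-8 when no encoding is given.
default_bytes = s.encode()
assert default_bytes == s.encode("utf-8")
assert default_bytes.decode() == s

# For code points in range(128), these three encodings produce identical bytes.
same = {s.encode(enc) for enc in ("ascii", "latin-1", "utf-8")}

# UTF-16 maps the same code points to two bytes each, so the result differs.
utf16_bytes = s.encode("utf-16-le")

print(same)         # {b'spam'}
print(utf16_bytes)  # b's\x00p\x00a\x00m\x00'
```

So the bytes are only "obvious" for an ASCII-only string if an ASCII-compatible encoding is assumed.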

ASCII has a privileged position among single-byte encodings, even in Python
3. There is 'builtins.ascii', but no 'builtins.latin1', let alone
'builtins.shiftjis' (though, as someone might point out, Shift JIS is not
single-byte). There is 're.ASCII', but we don't have 're.CA_1'. I could list
more things that Python provides for ASCII but not for any ASCII-incompatible
encoding: https://docs.python.org/3/search.html?q=ascii
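
A few of those ASCII-specific facilities, as a short sketch (the sample text is an arbitrary assumption; the calls are standard library):

```python
import re
import string

text = "na\u00efve caf\u00e9"  # "naïve café", written with escapes

# builtins.ascii(): like repr(), but non-ASCII characters are escaped.
escaped = ascii("caf\u00e9")

# re.ASCII (alias re.A) makes \w, \d, \s and friends match ASCII only.
unicode_words = re.findall(r"\w+", text)
ascii_words = re.findall(r"\w+", text, flags=re.ASCII)

# The string module exposes ASCII-only constants such as ascii_lowercase.
lowercase = string.ascii_lowercase

print(escaped)        # 'caf\xe9'
print(unicode_words)  # ['naïve', 'café']
print(ascii_words)    # ['na', 've', 'caf']
print(lowercase)      # abcdefghijklmnopqrstuvwxyz
```

None of these has a counterpart named after another encoding.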