[Python-ideas] Strings can sometimes convert to bytes without an encoding

Ethan Furman ethan at stoneleaf.us
Wed Jun 15 11:27:04 EDT 2016


On 06/14/2016 04:46 PM, Franklin? Lee wrote:
> On Tue, Jun 14, 2016 at 7:26 PM, Guido van Rossum wrote:

>> -1. Such a check for the contents of the string sounds exactly like the
>> Python 2 behavior we are trying to get away [from].
>
> But isn't it really just converting back and forth between two
> representations of the same thing? A str with char width 1 is
> conceptually an ASCII string; you're just changing how it's exposed to
> the program.

The main reason Python 3 is not Python 2 is that text is text and
bytes are bytes, and there will be no more automagic encoding/decoding
betwixt the two.
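
A minimal sketch of what that separation looks like in practice, assuming
any Python 3 interpreter. The helper name try_concat and the variable names
are made up for illustration; only the built-in str and bytes behavior is
being shown.

```python
def try_concat(text, data):
    """Attempt to mix str and bytes; report the exception name if it fails."""
    try:
        return text + data
    except TypeError as exc:
        return type(exc).__name__

# Implicit mixing is refused, even when the text is pure ASCII.
result_implicit = try_concat("abc", b"abc")

# The conversion has to be spelled out, in both directions.
result_explicit = "abc".encode("ascii") + b"abc"
round_trip = b"abc".decode("ascii")

print(result_implicit)   # TypeError
print(result_explicit)   # b'abcabc'
print(round_trip)        # abc
```

The failure does not depend on the contents of the string, which is the
point: the types decide, not the data.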


On 06/15/2016 01:55 AM, Franklin? Lee wrote:
 > UTF-8 is a default encoding for str.encode and bytes.decode. Latin-1
 > is the internal encoding in CPython whenever possible, and
 > PyASCIIObject is an internal struct in Python 3. It is not exactly
 > alien to Python to choose ASCII as a default. If it is a bad idea, it
 > is not original to me.

- CPython is not the only Python

- Latin-1 is an implementation detail, not a language guarantee

- PyASCIIObject is (probably) a name left over from Python 2 (massive
   renames of various structures are usually needless code churn)

- it may not have been a bad idea when Python was created, but it is a
   bad idea now
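
For reference, a short sketch of how the defaults mentioned above behave
today, assuming a current Python 3. The variable names are illustrative
only. Note that the UTF-8 default applies to the explicit encode() call;
the bytes() constructor still refuses a str with no encoding.

```python
ascii_text = "abc"
accented_text = "caf\u00e9"   # 'café', written with an escape for portability

# str.encode() defaults to UTF-8, but the call itself is still explicit.
default_ascii = ascii_text.encode()
default_accented = accented_text.encode()

# The same text yields different bytes under a different encoding.
latin1_accented = accented_text.encode("latin-1")

# bytes() will not guess an encoding, whatever the string contains.
try:
    bytes(ascii_text)
    constructor_result = "converted"
except TypeError:
    constructor_result = "TypeError"

# ASCII cannot represent every str, so an ASCII default would fail on some data.
try:
    accented_text.encode("ascii")
    ascii_result = "converted"
except UnicodeEncodeError:
    ascii_result = "UnicodeEncodeError"

print(default_ascii)       # b'abc'
print(default_accented)    # b'caf\xc3\xa9'
print(latin1_accented)     # b'caf\xe9'
print(constructor_result)  # TypeError
print(ascii_result)        # UnicodeEncodeError
```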



Please put your energy elsewhere because this particular behavior is not
going to change.

--
~Ethan~

