[Python-Dev] PEP 460 reboot

Mon Jan 13 10:19:08 CET 2014

On 1/13/2014 12:46 AM, Mark Shannon wrote:
> On 13/01/14 03:47, Guido van Rossum wrote:
>> On Sun, Jan 12, 2014 at 6:24 PM, Ethan Furman <ethan at stoneleaf.us> wrote:
>>> On 01/12/2014 06:16 PM, Ethan Furman wrote:
>>>>
>>>>
>>>> If you do:
>>>>
>>>> --> b'%s' % 'some text'
>>>
>>>
>>> Ignore what I previously said.  With no encoding the result would be:
>>>
>>> b"'some text'"
>>>
>>> So an encoding should definitely be specified.
>>
>> Yes, but the encoding is no business of %s or %. As far as the
>> formatting operation cares, if the argument is bytes they will be
>> copied literally, and if the argument is a str (or anything else) it
>> will call ascii() on it.
>
> It seems to me that what people want from '%s' is:
> Convert to a str then encode as ascii for non-bytes
> or copy directly for bytes.

Maybe. But it only takes a small tweak to the parameter to get what they 
want... a tweak that works in both Python 2.7 and Python 
3.whatever-version-gets-this.

Instead of

b"%s" % foo

they must use

b"%s" % foo.encode(explicitEncoding)

which is what they should have been doing in Python 2.7 all along, and 
if they were, they need make no change.

Oh, foo was a Python 2.7 str that the default conversion rules turned 
into a Python 3.x str? If its contents are already ASCII, the explicit 
encode does no harm.
Oh, foo was a literal? Add a b prefix instead of the .encode("ASCII"), if 
you prefer.
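
A minimal sketch of the tweak described above. The names foo and 
explicit_encoding are hypothetical placeholders, and the example assumes an 
interpreter that supports % on bytes (Python 2.7, or a Python 3 release 
that ships bytes interpolation):

```python
# Hypothetical inputs: a text value and the encoding the caller chose.
foo = "some text"
explicit_encoding = "ascii"

# Encode explicitly, then interpolate: the formatting operation only
# ever sees bytes and copies them literally.
payload = b"%s" % foo.encode(explicit_encoding)

# For a literal, a b prefix does the same job as .encode("ASCII").
from_literal = b"%s" % b"some text"

print(payload)
print(from_literal)
```

Both forms hand bytes to the formatting operation, so the same line should 
behave identically on either side of the 2.7/3.x divide.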

> So why not replace '%s' with '%a' for the ascii case and
> with '%b' for directly inserting bytes.

Because %a and %b don't exist in Python 2.7?

> That way, the encoding is explicit.

The encoding is already explicit.  If it is bytes encoded from str, that 
transformation had an explicit encoding.  If it is b"%s" % str(...), then 
there is no encoding, but rather a transformation into an ASCII 
representation of the Unicode code points, using escape sequences. That 
isn't likely to be what they want, but see the parameter tweak above.
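
To illustrate the difference between the two paths, here is a small Python 3 
sketch. It assumes the ascii() behaviour Guido describes above and emulates 
it by hand rather than relying on any particular %s implementation; the 
sample value is hypothetical:

```python
# Hypothetical text containing one non-ASCII code point (U+00E9).
text = "caf\u00e9"

# Path 1: no encoding at all. ascii() yields a quoted repr in which
# non-ASCII code points become escape sequences.
escaped = ascii(text).encode("ascii")

# Path 2: an explicit encoding chosen by the caller.
encoded = text.encode("utf-8")

# Plain ASCII text still picks up the surrounding quotes on path 1.
quoted = ascii("some text").encode("ascii")

print(escaped)
print(encoded)
print(quoted)
```

The first path produces the quotes and backslash escapes, which is why the 
explicit .encode() call is the form most callers will actually want.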

> I think it is vital that the encoding is explicit in all cases where
> bytes <-> str conversion occurs.

Since it is explicit, you have no concerns in this area.

Regarding the concern about implicit use of ASCII by certain bytes 
methods and proposed interpolations, I'm curious how many standard 
encodings exist that do not have an ASCII subset. I can enumerate a 
starting list, but if there are others in actual use, I'm unaware of them.

EBCDIC
UTF-16 BE & LE
UTF-32 BE & LE

Wikipedia <http://en.wikipedia.org/wiki/ASCII>: "The vast majority of code 
pages in current use are supersets of ASCII, a 7-bit code representing 
128 control codes and printable characters."
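
A rough way to check a given codec against the list above, as a Python 3 
sketch. The helper name is_ascii_superset is hypothetical, cp500 stands in 
for EBCDIC, and the test only checks that the 128 ASCII code points encode 
to the same single bytes, which is an assumption about what "ASCII subset" 
should mean here:

```python
def is_ascii_superset(codec_name):
    """Return True if every ASCII code point encodes to its own byte value."""
    ascii_bytes = bytes(range(128))
    try:
        return ascii_bytes.decode("ascii").encode(codec_name) == ascii_bytes
    except UnicodeError:
        # The codec cannot represent some ASCII character at all.
        return False


# Codec names as spelled in Python's standard codec registry.
for name in ("utf-8", "latin-1", "cp1252", "cp500", "utf-16-le", "utf-32-be"):
    print(name, is_ascii_superset(name))
```

This only exercises the encoders Python ships, so it cannot say anything 
about encodings that exist elsewhere.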