[Python-Dev] PEP 460 reboot

Mon Jan 13 19:45:37 CET 2014

On Mon, Jan 13, 2014 at 12:42 PM, R. David Murray <rdmurray at bitdance.com> wrote:
> On Mon, 13 Jan 2014 12:41:18 +0100, Antoine Pitrou <solipsis at pitrou.net> wrote:
>> On Sun, 12 Jan 2014 18:11:47 -0800
>> Guido van Rossum <guido at python.org> wrote:
>> > On Sun, Jan 12, 2014 at 5:27 PM, Ethan Furman <ethan at stoneleaf.us> wrote:
>> > > On 01/12/2014 04:47 PM, Guido van Rossum wrote:
>> > >> %s seems the trickiest: I think with a bytes argument it should just
>> > >> insert those bytes (and the padding modifiers should work too), and
>> > >> for other types it should probably work like %a, so that it works as
>> > >> expected for numeric values, and with a string argument it will return
>> > >> the ascii()-variant of its repr(). Examples:
>> > >>
>> > >> b'%s' % 42 == b'42'
>> > >> b'%s' % 'x' == b"'x'" (i.e. the three-byte string containing an 'x'
>> > >> enclosed in single quotes)
>> > >
>> > > I'm not sure about the quotes.  Would anyone ever actually want those in the
>> > > byte stream?
>> >
>> > Perhaps not, but it's a hint that you should probably think about an
>> > encoding. It's symmetric with how '%s' % b'x' returns "b'x'". Think of
>> > it as payback time. :-)
>>
>> What is the use case for embedding a quoted ASCII-encoded representation
>> in a byte stream?
>
> There is no use case in the sense you are asking, just like there is no
> real use case for '%s' % b'x' producing "b'x'".  But the real use case
> is exactly the same: to let you know your code is screwed up without
> actually blowing up with an encoding exception.
>
> For the record, I like Guido's logic and proposal.  I don't understand
> Nick's objection, since I don't see the difference between the situation
> here where a string gets interpolated into bytes as 'xxx' and the
> corresponding situation where bytes gets interpolated into a string
> as b'xxx'.  Why struggle to keep bytes interpolation "pure" if string
> interpolation isn't?
>
> Guido's proposal makes the language more symmetric, and thus more
> consistent and less surprising.  Exactly the hallmarks of Python's design
> sense, IMO.  (Big surprise, right? :)
>
> Of course, this point of view *is* based on the idea that when you are
> doing interpolation using %/.format, you are in fact primarily concerned
> with ASCII compatible byte streams.  This is a Practicality sort of
> argument.  It is, after all, by far the most common use case when
> doing interpolation[*].
>
> If you wanted to do a purist version of this symmetry, you'd have bytes(x)
> calling __bytes__ if it was defined and falling back to calling a
> __brepr__ otherwise.
>
> But what would __brepr__ implement?  The variety of format codes in
> the struct module argues that there is no "one obvious" binary
> repr for most types.  (Those that have one would implement __bytes__).
> And what would be the __brepr__ of an arbitrary 'object'?
>
> Faced with the impracticality of defining __brepr__ usefully in any "pure
> bytes" form, it seems sensible to admit that the most useful __brepr__
> is the ascii() encoding of the __repr__.  Which naturally produces 'xxx'
> as the __brepr__ of a string.
>
> This does cause things to get a little un-pretty when you are operating
> at the Python prompt:
>
>     >>> b'%s' % object
>     b'"<class \\\'object\\\'>"'
>
> But then again that is most likely really not what you mean to do, so
> it becomes a big red flag...just like b'xxx' is a small red flag when
> you accidentally interpolate unencoded bytes into a string.
>
> --David
>
> PS: When I first read Guido's remark that the result of interpolating a
> string should be 'xxx', I went Wah?  I had to reason my way through to
> it as above, but to him it was just the natural answer.  Guido isn't
> always right, but this kind of automatic language design consistency
> is one reason he's the BDFL.
>
> [*] I still think that you mostly want to design your library so that
> you are handling the text parts as text and the bytes parts as bytes,
> and encoding/gluing them as appropriate at the IO boundary.  But if Guido
> says his real code would benefit by being able to interpolate ASCII into
> bytes at certain points, I'll believe him.

<elided rant/>

If you think corrupted data is easier or more pleasant to track down
than encoding exceptions, then I think you are strange. Silent
corruption makes porting really difficult while you are still trying
to figure out where the bytes/str boundaries are. I am now deeply
suspicious of all % formatting.
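
To make the two positions concrete, here is a minimal Python sketch. The
helpers proposed_bytes_s and strict_bytes_s are hypothetical names made up
for illustration; neither is part of PEP 460 or the standard library. The
first emulates the %s rule quoted above (bytes pass through, anything else
becomes the ASCII-encoded ascii() of the value), and the second shows the
alternative of raising at the point of the mistake. Only the str case is
existing Python 3 behaviour.

```python
# Hypothetical helper (an assumption for illustration, not a real API):
# emulates the %s rule proposed in the quoted message.
def proposed_bytes_s(value):
    if isinstance(value, (bytes, bytearray)):
        return bytes(value)              # bytes are inserted unchanged
    return ascii(value).encode('ascii')  # everything else behaves like %a


# Hypothetical strict alternative: refuse anything that is not bytes.
def strict_bytes_s(value):
    if not isinstance(value, (bytes, bytearray)):
        raise TypeError('expected bytes, got %s' % type(value).__name__)
    return bytes(value)


# Existing Python 3 behaviour: bytes interpolated into str keep their repr.
text_result = '%s' % b'x'                # "b'x'"

# Proposed mirror image for bytes interpolation.
bytes_from_int = proposed_bytes_s(42)    # b'42'
bytes_from_str = proposed_bytes_s('x')   # b"'x'", quotes land in the stream

# The strict variant fails loudly instead of writing quotes into the data.
try:
    strict_bytes_s('x')
except TypeError as exc:
    strict_error = str(exc)
```

With the first helper the stray quotes only show up when somebody inspects
the output; with the second the traceback points at the line that mixed
str and bytes.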