[Python-Dev] PEP 460: allowing %d and %f and mojibake

Ethan Furman ethan at stoneleaf.us
Sun Jan 12 19:26:56 CET 2014

On 01/12/2014 09:26 AM, Paul Moore wrote:
> On 12 January 2014 17:03, Ethan Furman <ethan at stoneleaf.us> wrote:
>> We know full well the difference between unicode and bytes, and we know full
>> well that numbers and much of the text we need has an ASCII (bytes!)
>> representation.  When we do a b'Content Length: %d' % len(binary_data) we
>> are expecting to get back a bytes object, /not/ a unicode object.
> What I am struggling to understand here is what room for compromise
> there is. Clearly, for whatever reason,
> b'Content Length: ' + str(len(binary_data)).encode('ascii')
> is not acceptable for you. OK, fair enough. Also, apparently, writing a helper
> def int_to_bytes(n):
>      return str(n).encode('ascii')
> b'Content Length: ' + int_to_bytes(len(binary_data))
> is unacceptable. But I'm not clear why it's unacceptable. Maybe I
> missed the explanation - God knows, the thread is long enough :-)

True enough!  ;)  It's unacceptable in the sense that the bytes type is /almost/ there, it's /almost/ what is needed to 
handle the boundary conditions.  We have a __bytes__ method (how is it supposed to be used?) that could be made to fit 
the interpolation bill.

It seems to me the core of Nick's refusal is the (and I agree!) rejection of bytes interpolation returning unicode -- 
but that's not what I'm asking for!  I'm asking for it to return bytes, with the interpolated data (in the case of %d,
%s, etc) being strictly-ASCII encoded.
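
To make that concrete, here is a rough Python 3 sketch of the semantics described above. It is an illustration only: the name ascii_interpolate is hypothetical, it is not part of PEP 460 or any other proposal, and it only handles bytes, str, int and float arguments. The result is always bytes, and any text being interpolated must be strictly ASCII.

```python
def ascii_interpolate(template, *args):
    """Hypothetical helper: %-interpolate into a bytes template, returning bytes."""
    prepared = []
    for arg in args:
        if isinstance(arg, bytes):
            # Raw bytes pass through unchanged: latin-1 maps each byte
            # to exactly one code point and back again.
            prepared.append(arg.decode('latin-1'))
        elif isinstance(arg, str):
            # Text must be strictly ASCII; otherwise UnicodeEncodeError is raised.
            arg.encode('ascii')
            prepared.append(arg)
        elif isinstance(arg, (int, float)):
            # Numbers are formatted by %d, %f, etc.; their digits are ASCII.
            prepared.append(arg)
        else:
            raise TypeError('unsupported type: %s' % type(arg).__name__)
    # Format as text, then convert back; the template bytes round-trip intact.
    return (template.decode('latin-1') % tuple(prepared)).encode('latin-1')


binary_data = b'\x00\x01\x02'
print(ascii_interpolate(b'Content Length: %d', len(binary_data)))
# prints b'Content Length: 3'
```

The point of the sketch is that unicode never escapes: the caller passes bytes in and gets bytes out, and non-ASCII text is rejected rather than silently encoded.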

> On the other hand, Nick has explained why b'Content Length: %d' %
> len(binary_data) is unacceptable to him (you don't have to agree with
> his opinion, just concede that he has explained his position in a way
> that you understand).

Only because he (or Benno) finally wrote some tests and I was able to see what he thought I was wanting.  Which does 
seem to leave a *tiny* bit of wiggle room if bytes interpolation always returns bytes, and never unicode (yeah, I know,
snowball's chance and all that).

> I'm not trying to argue you're wrong - I don't know your codebase, nor
> do I know your application area. But surely somewhere between "we must
> have % formatting including %d for bytes" and the above, there's a
> middle ground that you *are* willing to accept? Can you give any
> indications of what that might be? What, specifically, about the
> helper function is the problem? I don't think it is any less space
> efficient, it doesn't double-encode, and I don't think it's more
> difficult to understand (although it is a little longer, it trades
> that off against being a bit more explicit as to what's going on).
> Surely you're not arguing that your code must work unchanged (not
> "there's a way of writing the code so it works on Python 2 and 3", but
> "the code you currently have for Python 2 must work with no changes at
> all")?

I'm arguing from three PoVs:

1) 2 & 3 compatible code base

2) having the bytes type /be/ the boundary type

3) readable code

> Can you give an example of code that is *nearly* acceptable to you,
> which works in Python 2 and 3 today, and explain what improvements you
> would like to see to it in order to use it instead of waiting for a
> core change?

I'm not trying to be difficult (just naturally good at it, I guess ;) ), but I don't see a lot of room for compromises -- I
would like % interpolation, I'm told I have to use a helper function.  I will if I have to, but first I have to try and 
make myself understood, and I'm not sure that has happened yet.  Following Nick's example I'm writing up some tests that 
clearly show what I would like to see.  Then at least we can debate what I'm actually asking for, and not the
(understandably feared) unicode-what-a-mess-we-had-in-py2k-don't-want-again behavior that some think I am asking for.
