[Python-Dev] PEP 460: allowing %d and %f and mojibake
Ethan Furman
ethan at stoneleaf.us
Sun Jan 12 19:26:56 CET 2014
On 01/12/2014 09:26 AM, Paul Moore wrote:
> On 12 January 2014 17:03, Ethan Furman <ethan at stoneleaf.us> wrote:
>> We know full well the difference between unicode and bytes, and we know full
>> well that numbers and much of the text we need has an ASCII (bytes!)
>> representation. When we do a b'Content Length: %d' % len(binary_data) we
>> are expecting to get back a bytes object, /not/ a unicode object.
>
> What I am struggling to understand here is what room for compromise
> there is. Clearly, for whatever reason,
>
> b'Content Length: ' + str(len(binary_data)).encode('ascii')
>
> is not acceptable for you. OK, fair enough. Also, apparently, writing a helper
>
> def int_to_bytes(n):
>     return str(n).encode('ascii')
>
> b'Content Length: ' + int_to_bytes(len(binary_data))
>
> is unacceptable. But I'm not clear why it's unacceptable. Maybe I
> missed the explanation - God knows, the thread is long enough :-)
True enough! ;) It's unacceptable in the sense that the bytes type is /almost/ there, it's /almost/ what is needed to
handle the boundary conditions. We have a __bytes__ method (how is it supposed to be used?) that could be made to fit
the interpolation bill.
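
For reference, a minimal sketch of how __bytes__ is consulted today: in Python 3, calling bytes(obj) invokes obj.__bytes__(). The ContentLength class below is hypothetical, made up purely for illustration; it only shows the existing hook and is not a claim about how interpolation would use it.

```python
class ContentLength(object):
    """Hypothetical wrapper, used only to illustrate the __bytes__ hook."""

    def __init__(self, n):
        self.n = n

    def __bytes__(self):
        # A number's text form is pure ASCII, so a strict ASCII encode is safe.
        return str(self.n).encode('ascii')


binary_data = b'\x00\x01\x02'
# In Python 3, bytes(obj) calls obj.__bytes__() and returns its result.
header = b'Content Length: ' + bytes(ContentLength(len(binary_data)))
```
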
It seems to me the core of Nick's refusal is the rejection (and I agree!) of bytes interpolation returning unicode --
but that's not what I'm asking for! I'm asking for it to return bytes, with the interpolated data (in the case of %d,
%s, etc.) being strictly-ASCII encoded.
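
A rough sketch of the behaviour being asked for, written as a plain function so it runs on current Pythons. The name ascii_interpolate is hypothetical, and the sketch assumes the template itself is pure ASCII; it is an approximation of the idea, not the eventual bytes % semantics.

```python
def ascii_interpolate(template, *values):
    """Hypothetical helper: fill %d/%s fields in a bytes template, return bytes."""
    # Assumption: the template contains only ASCII, so it decodes cleanly.
    text = template.decode('ascii') % values
    # Strict encoding: any non-ASCII interpolated data raises
    # UnicodeEncodeError instead of silently producing mojibake.
    return text.encode('ascii')


binary_data = b'\x00\x01\x02'
header = ascii_interpolate(b'Content Length: %d', len(binary_data))
```

The result is always a bytes object, never unicode, and non-ASCII input fails loudly rather than being guessed at.
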
> On the other hand, Nick has explained why b'Content Length: %d' %
> len(binary_data) is unacceptable to him (you don't have to agree with
> his opinion, just concede that he has explained his position in a way
> that you understand).
Only because he (or Benno) finally wrote some tests and I was able to see what he thought I was wanting. Which does
seem to leave a *tiny* bit of wiggle room if bytes interpolation always returns bytes, and never unicode (yeah, I know,
snowball's chance and all that).
> I'm not trying to argue you're wrong - I don't know your codebase, nor
> do I know your application area. But surely somewhere between "we must
> have % formatting including %d for bytes" and the above, there's a
> middle ground that you *are* willing to accept? Can you give any
> indications of what that might be? What, specifically, about the
> helper function is the problem? I don't think it is any less space
> efficient, it doesn't double-encode, and I don't think it's more
> difficult to understand (although it is a little longer, it trades
> that off against being a bit more explicit as to what's going on).
> Surely you're not arguing that your code must work unchanged (not
> "there's a way of writing the code so it works on Python 2 and 3", but
> "the code you currently have for Python 2 must work with no changes at
> all")?
I'm arguing from three PoVs:
1) 2 & 3 compatible code base
2) having the bytes type /be/ the boundary type
3) readable code
> Can you give an example of code that is *nearly* acceptable to you,
> which works in Python 2 and 3 today, and explain what improvements you
> would like to see to it in order to use it instead of waiting for a
> core change?
I'm not trying to be difficult (just naturally good at it, I guess ;) , but I don't see a lot of room for compromise -- I
would like % interpolation, I'm told I have to use a helper function. I will if I have to, but first I have to try and
make myself understood, and I'm not sure that has happened yet. Following Nick's example I'm writing up some tests that
clearly show what I would like to see. Then at least we can debate what I'm actually asking for, and not the
(understandably unwanted) unicode-what-a-mess-we-had-in-py2k-don't-want-again scenario that some think I am asking for.
--
~Ethan~