[Python-Dev] PEP 460: allowing %d and %f and mojibake

INADA Naoki songofacandy at gmail.com
Sun Jan 12 20:21:59 CET 2014


I want to add one more PoV: a small performance regression, especially on
Python 2.
Programs that need byte formatting tend to be low level and are called
heavily from applications.

Many programs use a single-source approach to support Python 3,
and supporting Python 3 should not mean a large performance regression on
Python 2.


In Python 2:

In [1]: def int_to_bytes(n):
   ...:     return unicode(n).encode('ascii')
   ...:

In [2]: %timeit int_to_bytes(42)
1000000 loops, best of 3: 691 ns per loop

In [3]: %timeit b'Content-Type: ' + int_to_bytes(42)
1000000 loops, best of 3: 737 ns per loop

In [4]: %timeit b'Content-Type: %d' % 42
10000000 loops, best of 3: 20.2 ns per loop

In [5]: %timeit (u'Content-Type: %d' % 42).encode('ascii')
1000000 loops, best of 3: 381 ns per loop


In Python 3:

In [1]: def int_to_bytes(n):
   ...:     return str(n).encode('ascii')
   ...:

In [2]: %timeit int_to_bytes(42)
1000000 loops, best of 3: 612 ns per loop

In [3]: %timeit b'Content-Type: ' + int_to_bytes(42)
1000000 loops, best of 3: 668 ns per loop

In [4]: %timeit ('Content-Type: %d' % 42).encode('ascii')
1000000 loops, best of 3: 326 ns per loop
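
For anyone who wants to reproduce the Python 3 numbers outside IPython, here
is a minimal sketch using the stdlib timeit module. The helper names
header_concat, header_format and bench are mine, made up for illustration.
The lambda wrapper adds a little call overhead to every measurement, so only
compare the results with each other, not with the %timeit figures above.

```python
import timeit


def int_to_bytes(n):
    # Two explicit steps: int -> text, then text -> ASCII bytes.
    return str(n).encode('ascii')


def header_concat(n):
    # Hypothetical helper: concatenate a bytes prefix with the encoded number.
    return b'Content-Type: ' + int_to_bytes(n)


def header_format(n):
    # Hypothetical helper: format as text first, then encode the whole line.
    return ('Content-Type: %d' % n).encode('ascii')


def bench(func, number=100000, repeat=3):
    # Best-of-`repeat` time per call, in nanoseconds (includes lambda overhead).
    best = min(timeit.repeat(lambda: func(42), number=number, repeat=repeat))
    return best / number * 1e9


if __name__ == '__main__':
    # Both approaches must produce identical bytes before timing them.
    assert header_concat(42) == header_format(42) == b'Content-Type: 42'
    for f in (int_to_bytes, header_concat, header_format):
        print('%-14s %8.1f ns per call' % (f.__name__, bench(f)))
```

Absolute timings will depend on the machine and the interpreter version.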


> > I'm arguing from three PoVs:
> > 1) 2 & 3 compatible code base
> > 2) having the bytes type /be/ the boundary type
> > 3) readable code
>
> The only one of these that I can see being in any way an argument against
>
> def int_to_bytes(n):
>     return str(n).encode('ascii')
>
> b'Content Length: ' + int_to_bytes(len(binary_data))
>
> is (3), and that's largely subjective. Personally, I see very little
> difference between the above and %d-interpolation in terms of
> *readability*. Brevity, certainly %d wins. But that's not important on
> its own, and I'd argue that my version is more clear in terms of
> describing the intent (and would be even better if I wasn't rubbish at
> thinking of function names, or if this wasn't in isolation, and more
> application-focused functions were used).
>
> > It seems to me the core of Nick's refusal is the (and I agree!)
> > rejection of bytes interpolation returning unicode -- but that's not
> > what I'm asking for! I'm asking for it to return bytes, with the
> > interpolated data (in the case of %d, %s, etc) being strictly-ASCII
> > encoded.
>
> My reading of Nick's refusal is that %d takes a value which is
> semantically a number, converts it into a base-10 representation
> (which is semantically a *string*, not a sequence of bytes[1]) and
> then *encodes* that string into a series of bytes using the ASCII
> encoding. That is *two* semantic transformations, and one (the ASCII
> encoding) is *implicit*. Specifically, it's implicit because (a) the
> normal reading of %d is "produce the base-10 representation of a
> number", and a base-10 representation is a *string*, and (b) because
> nowhere has ASCII been mentioned (why not UTF16? that would be
> entirely plausible for a wchar-based environment like Windows). And a
> core principle of the bytes/text separation in Python 3 is that
> encoding should never happen implicitly.
>
> By the way, I should point out that I would never have understood
> *any* of the ideas involved in this thread before Python 3 forced me
> to think about Unicode and the distinction between text and bytes. And
> yet, I now find myself, in my (non-Python) work environment, being the
> local expert whenever applications screw up text encodings. So I, for
> one, am very grateful for Python 3's clear separation of bytes and
> text. (And if I sometimes come across as over-dogmatic, I apologise -
> put it down to the enthusiasm of the recent convert :-))
>
> Paul
>
> [1] If you cannot see that there's no essential reason why the base-10
> representation '123' should correspond to the bytes b'\x31\x32\x33'
> then you are probably not old enough to have started programming on
> EBCDIC-based computers :-)



-- 
INADA Naoki  <songofacandy at gmail.com>