[issue3982] support .format for bytes

Terry J. Reedy report at bugs.python.org
Wed Jan 23 00:34:32 CET 2013


Terry J. Reedy added the comment:

>it would probably be reasonable to make these protocols use str objects at the heart, and only convert to bytes after the formatting is done.

I presume this would mean adding 'if py3: out = out.encode()' after the formatting. As I said before, this works much better in 3.3+ than in 3.2 and earlier. Some actual numbers:

from timeit import timeit
# Time the default 1000000 encode() calls for each string length.
for size in (0, 100, 1000, 10000, 100000):
    a = 'a' * size
    print(timeit("a.encode()", "from __main__ import a"))
>>> 
0.19305401378265558
0.22193721412302575
0.2783227054755883
0.677596406192696
7.124387897799184

Since timeit runs the statement n = 1000000 times by default, each total in seconds is also the time in microseconds per encoding. Of note: the copying of bytes does not double the total time until there are a few thousand chars. Would protocols be using .format for much more than this?
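
To make the seconds-to-microseconds conversion explicit, here is a minimal sketch that passes the loop count to timeit directly and divides it back out. The helper name usec_per_encode and the reduced loop count are assumptions for illustration, not part of the measurements above, and the figures it prints will vary by machine and Python version.

```python
from timeit import timeit

# Assumption: fewer loops than timeit's default of 1000000, to keep the run short.
NUMBER = 100000

def usec_per_encode(length, number=NUMBER):
    """Return the average time of one str.encode() call, in microseconds."""
    text = 'a' * length
    # timeit accepts a callable; it returns the total time in seconds.
    total_seconds = timeit(text.encode, number=number)
    return total_seconds / number * 1e6

for length in (0, 100, 1000, 10000):
    print(length, round(usec_per_encode(length), 3))
```

With number left at 1000000, total_seconds / number * 1e6 is numerically equal to total_seconds, which is why the raw totals can be read as microseconds per call.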

[If speed is really an issue, we could make binary file/socket write methods unicode implementation aware. They could directly access the ascii (or latin-1) bytes in a unicode object, just as they do with a bytes object, and the extra copy could be skipped.]
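
For concreteness, the "format as str, encode at the end" pattern from the quoted suggestion might look roughly like the sketch below. The function build_request_line, the PY3 flag, and the HTTP-style line are hypothetical examples, not code from any particular protocol module.

```python
import sys

# Assumption: a simple version flag standing in for the 'py3' test mentioned above.
PY3 = sys.version_info[0] >= 3

def build_request_line(method, path, version="HTTP/1.1"):
    """Format a request line as text, then convert to bytes once at the end."""
    out = "{0} {1} {2}\r\n".format(method, path, version)
    if PY3:
        # On Python 3 the formatted result is a str, so encode it.
        # ASCII is assumed here; non-ASCII input raises UnicodeEncodeError.
        out = out.encode("ascii")
    return out

print(build_request_line("GET", "/index.html"))
```

All formatting happens on str objects, and the single encode() call at the end is the extra copy whose cost is measured above.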

----------

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue3982>
_______________________________________

