[Python-ideas] duck typing for io write methods

Wolfgang Maier wolfgang.maier at biologie.uni-freiburg.de
Sat Jun 15 16:45:43 CEST 2013


Terry Reedy <tjreedy at ...> writes:

> 
> On 6/14/2013 5:00 AM, Wolfgang Maier wrote:
> 
> > this is what str(int).encode() does, but is quite complicated, since it
> > actually generates a full-blown Python string object first, then encodes
> > this to bytes again.
> 
> In 3.3+, it is not as complicated as you seem to think since the string 
> of ascii digit chars uses one byte per char and the 'encoding' is just a 
> copy. On my machine, with i = 123456, the two calls take about .3 and .2 
> microseconds. The extra call is noise compared to time to read, split 
> into 4 bytes, convert 2 bytes to ints, subtract, and after the 
> conversion of the difference to bytes, join and write the line.
> 
> from timeit import repeat
> 
> def f():
>    b = b'somelinedescriptor\t100\t500\tmorestuffhere\n'
>    b = b.split(b'\t')
>    i = int(b[2]) - int(b[1])
>    b'\t'.join((b[0], str(i).encode(), b[3]))
> 
> print(repeat('f()', 'from __main__ import f'))
> 
>  >>>
> [2.584412482335259, 2.614494724632941, 2.6133167166162155]
> + read/write time
> 

This sounds pretty good! I have to say I haven't timed it yet (was going to
do so after the weekend). As I was saying, I simply felt uncomfortable with
the double-conversion.
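
For when I do get around to timing it, here is a minimal sketch of how the
double conversion could be measured in isolation. It assumes the same
i = 123456 as in your message; the helper names via_str and str_only are
only mine, for illustration, and the numbers will of course depend on the
machine and the Python version.

```python
from timeit import timeit

i = 123456  # same sample value as in the quoted message


def via_str():
    # int -> str -> bytes: the double conversion under discussion
    return str(i).encode("ascii")


def str_only():
    # int -> str only, as a baseline for the cost of the extra encode()
    return str(i)


if __name__ == "__main__":
    n = 1000000  # number of calls per measurement
    print("str(i).encode():", timeit(via_str, number=n))
    print("str(i) only:    ", timeit(str_only, number=n))
```

The difference between the two figures should roughly be the price of the
encode() step on its own.
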
Two questions though. First, you say "in 3.3+": does that mean the behaviour
changed with 3.3, or that you only checked it for that version (I'm
currently using 3.2)?
Second, is that one-byte optimization specific to str() of an int, or does it
happen elsewhere too (like in string literals without non-English
characters)? Where can I find that documented?
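
Short of finding the documentation, I suppose the second question could be
probed empirically. The sketch below assumes CPython 3.3 or later and uses
sys.getsizeof() to estimate how many bytes each additional character costs;
the function name per_char_growth and the sample strings are made up for
this illustration, and on 3.2 I would expect different results.

```python
import sys

ascii_text = "123456"             # ASCII digits only
bmp_text = "12345\u20ac"          # contains the euro sign (outside Latin-1)
astral_text = "12345\U0001F600"   # contains a character outside the BMP


def per_char_growth(sample):
    # Doubling the string adds len(sample) characters of the same kind, so
    # the size difference divided by that count is the per-character cost.
    doubled = sample * 2
    return (sys.getsizeof(doubled) - sys.getsizeof(sample)) // len(sample)


if __name__ == "__main__":
    for name, text in [("ascii", ascii_text),
                       ("bmp", bmp_text),
                       ("astral", astral_text)]:
        print(name, per_char_growth(text), "byte(s) per character")
```

If the optimization applies to any ASCII-only string, and not just to the
result of str() on an int, the first case should report one byte per
character regardless of how the string was created.
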
Oh, and thanks for this really constructive post.
Best,
Wolfgang



