<div dir="ltr">Hi there,<br><br>I was doing some experiments with the buffer interface of bytearray today, for the purpose of quickly reading a file&#39;s contents into a bytearray which I can then modify. I decided to do some benchmarking and ran into surprising results. Here are the functions I was timing:<br>


<br>def justread():<br>    # Just read a file&#39;s contents into a string/bytes object<br>    f = open(FILENAME, &#39;rb&#39;)<br>    s = f.read()<br><br>def readandcopy():<br>    # Read a file&#39;s contents and copy them into a bytearray.<br>


    # An extra copy is done here.<br>    f = open(FILENAME, &#39;rb&#39;)<br>    b = bytearray(f.read())<br><br>def readinto():<br>    # Read a file&#39;s contents directly into a bytearray,<br>    # hopefully employing its buffer interface<br>


    f = open(FILENAME, &#39;rb&#39;)<br>    b = bytearray(os.path.getsize(FILENAME))<br>    f.readinto(b)<br><br>FILENAME is the name of a 3.6MB text file. It is opened in binary mode, however, so that the code behaves the same way under 2.x and 3.x.<br>
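<br>For anyone who wants to reproduce this, the following is a minimal, self-contained sketch of what such a module could look like. It is not the exact file I ran: the helper make_test_file and the small temporary file it creates are hypothetical stand-ins for the real 3.6MB text file, and the with blocks and return values are added here only so the results can be compared.<br>

```python
import os
import tempfile


def make_test_file(size=64 * 1024):
    # Hypothetical helper: writes a small temporary file to stand in
    # for the real 3.6MB text file, which is not included here.
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, 'wb') as f:
        f.write(b'x' * size)
    return path


FILENAME = make_test_file()


def justread():
    # Just read the file's contents into a string/bytes object.
    with open(FILENAME, 'rb') as f:
        return f.read()


def readandcopy():
    # Read the contents, then copy them into a bytearray (extra copy).
    with open(FILENAME, 'rb') as f:
        return bytearray(f.read())


def readinto():
    # Preallocate a bytearray of the file's size and read into it.
    b = bytearray(os.path.getsize(FILENAME))
    with open(FILENAME, 'rb') as f:
        f.readinto(b)
    return b
```

All three functions should produce the same bytes, so comparing their results is a quick sanity check before timing them.<br>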


<br>Running this under Python 2.7.2, I got these results ($1 is the Python executable name passed to a bash script I wrote to automate these runs):<br><br>$1 -m timeit -s&#39;import fileread_bytearray&#39; &#39;fileread_bytearray.justread()&#39;<br>


1000 loops, best of 3: 461 usec per loop<br>$1 -m timeit -s&#39;import fileread_bytearray&#39; &#39;fileread_bytearray.readandcopy()&#39;<br>100 loops, best of 3: 2.81 msec per loop<br>$1 -m timeit -s&#39;import fileread_bytearray&#39; &#39;fileread_bytearray.readinto()&#39;<br>


1000 loops, best of 3: 697 usec per loop<br><br>Which makes sense. The readinto() approach is much faster than copying the read buffer into the bytearray.<br><br>But with Python 3.2.2 (built from the 3.2 branch today):<br>


<br>$1 -m timeit -s&#39;import fileread_bytearray&#39; &#39;fileread_bytearray.justread()&#39;<br>1000 loops, best of 3: 336 usec per loop<br>$1 -m timeit -s&#39;import fileread_bytearray&#39; &#39;fileread_bytearray.readandcopy()&#39;<br>


100 loops, best of 3: 2.62 msec per loop<br>$1 -m timeit -s&#39;import fileread_bytearray&#39; &#39;fileread_bytearray.readinto()&#39;<br>100 loops, best of 3: 2.69 msec per loop<br><br>Oops, readinto takes the same time as copying. This is a real shame, because readinto in conjunction with the buffer interface was supposed to avoid the redundant copy.<br>


<br>Is there a real performance regression here, is this a well-known issue, or am I just missing something obvious?<br><br>Eli<br><br>P.S. The machine is a quad-core i7-2820QM running 64-bit Ubuntu 10.04.<br>

</div>