<div dir="ltr">Hi there,<br><br>I was experimenting with the buffer interface of bytearray today, with the goal of quickly reading a file's contents into a bytearray that I can then modify. I decided to do some benchmarking and ran into some surprising results. Here are the functions I was timing:<br>
<br>import os<br><br>def justread():<br> # Just read a file's contents into a string/bytes object<br> f = open(FILENAME, 'rb')<br> s = f.read()<br><br>def readandcopy():<br> # Read a file's contents and copy them into a bytearray.<br>
# An extra copy is done here.<br> f = open(FILENAME, 'rb')<br> b = bytearray(f.read())<br><br>def readinto():<br> # Read a file's contents directly into a bytearray,<br> # hopefully employing its buffer interface<br>
 f = open(FILENAME, 'rb')<br> b = bytearray(os.path.getsize(FILENAME))<br> f.readinto(b)<br><br>FILENAME is the name of a 3.6MB text file. It is read in binary mode, however, for the best compatibility between 2.x and 3.x.<br>
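<br>As a quick sanity check, here is a minimal sketch confirming that the readinto() variant ends up with the same bytes as the read-and-copy variant. The temporary file is only a hypothetical stand-in for FILENAME, and read_copy/read_into are illustrative helpers, not the functions that were timed:<br>
<br>
```python
import os
import tempfile

# Hypothetical stand-in for FILENAME: a small temporary file.
fd, path = tempfile.mkstemp()
os.write(fd, b"some sample bytes\n" * 1000)
os.close(fd)

def read_copy(name):
    # Read everything, then copy the result into a bytearray.
    with open(name, 'rb') as f:
        return bytearray(f.read())

def read_into(name):
    # Preallocate a bytearray of the file's size and fill it in place.
    b = bytearray(os.path.getsize(name))
    with open(name, 'rb') as f:
        n = f.readinto(b)  # number of bytes actually read
    return b, n

copied = read_copy(path)
filled, nread = read_into(path)
os.remove(path)

# True when both approaches produced identical contents and
# readinto() filled the whole buffer.
same = (copied == filled) and (nread == len(filled))
print(same)
```
<br>The return value of readinto() is worth checking, since it reports how many bytes were actually written into the buffer.<br>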
<br>Now, running this under Python 2.7.2 I got these results ($1 just reflects the executable name passed to a bash script I wrote to automate these runs):<br><br>$1 -m timeit -s'import fileread_bytearray' 'fileread_bytearray.justread()'<br>
1000 loops, best of 3: 461 usec per loop<br>$1 -m timeit -s'import fileread_bytearray' 'fileread_bytearray.readandcopy()'<br>100 loops, best of 3: 2.81 msec per loop<br>$1 -m timeit -s'import fileread_bytearray' 'fileread_bytearray.readinto()'<br>
1000 loops, best of 3: 697 usec per loop<br><br>Which makes sense. The readinto() approach is much faster than copying the read buffer into the bytearray.<br><br>But with Python 3.2.2 (built from the 3.2 branch today):<br>
<br>$1 -m timeit -s'import fileread_bytearray' 'fileread_bytearray.justread()'<br>1000 loops, best of 3: 336 usec per loop<br>$1 -m timeit -s'import fileread_bytearray' 'fileread_bytearray.readandcopy()'<br>
100 loops, best of 3: 2.62 msec per loop<br>$1 -m timeit -s'import fileread_bytearray' 'fileread_bytearray.readinto()'<br>100 loops, best of 3: 2.69 msec per loop<br><br>Oops, readinto takes the same time as copying. This is a real shame, because readinto in conjunction with the buffer interface was supposed to avoid the redundant copy.<br>
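<br>One way to narrow this down, as a rough sketch only: time the bytearray allocation on its own, with no file I/O, and compare the result against the gap between justread() and readinto(). SIZE is an assumption that approximates the 3.6MB file, and alloc_only is a hypothetical helper:<br>
<br>
```python
import timeit

SIZE = 3600000  # assumed: roughly the size of the 3.6MB file

def alloc_only():
    # Allocate a zero-filled bytearray of the file's size, no I/O at all.
    return bytearray(SIZE)

# Best of 3 runs of 20 calls each, reported per call in microseconds.
best = min(timeit.repeat(alloc_only, repeat=3, number=20)) / 20
print("bytearray(%d): %.1f usec per call" % (SIZE, best * 1e6))
```
<br>Running this under both interpreters would show whether the allocation accounts for a meaningful part of the difference, or whether the time is going into the read itself.<br>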
<br>Is there a real performance regression here, is this a well-known issue, or am I just missing something obvious?<br><br>Eli<br><br>P.S. The machine is a quad-core i7-2820QM, running 64-bit Ubuntu 10.04.<br>
</div>