[Python-Dev] file.readinto performance regression in Python 3.2 vs. 2.7?

Eli Bendersky eliben at gmail.com
Fri Nov 25 07:41:47 CET 2011


>
> Eli, the use pattern I was referring to is when you read in chunks
> and append to a running buffer. Presumably, if you know the size of
> the data in advance, you can readinto directly into a region of a
> bytearray, thereby avoiding both the temporary buffer allocated for
> each read and the creation of a new buffer holding the running buffer
> plus the new data.
>
> Strangely, I find that your readandcopy is faster at this than
> readinto, though not by much. Here's the code; it's a bit explicit,
> but then so was the original:
>
> import os
>
> FILENAME = 'testdata.bin'  # placeholder; the original test file was 10MB
> BUFSIZE = 0x10000
>
> def justread():
>     # Just read a file's contents into a string/bytes object
>     f = open(FILENAME, 'rb')
>     s = b''
>     while True:
>         b = f.read(BUFSIZE)
>         if not b:
>             break
>         s += b
>
> def readandcopy():
>     # Read a file's contents and copy them into a bytearray.
>     # An extra copy is done here.
>     f = open(FILENAME, 'rb')
>     s = bytearray()
>     while True:
>         b = f.read(BUFSIZE)
>         if not b:
>             break
>         s += b
>
> def readinto():
>     # Read a file's contents directly into a bytearray,
>     # hopefully employing its buffer interface
>     f = open(FILENAME, 'rb')
>     s = bytearray(os.path.getsize(FILENAME))
>     o = 0
>     while True:
>         b = f.readinto(memoryview(s)[o:o+BUFSIZE])
>         if not b:
>             break
>         o += b
>
> And the timings:
>
> $ python3 -O -m timeit 'import fileread_bytearray'
> 'fileread_bytearray.justread()'
> 10 loops, best of 3: 298 msec per loop
> $ python3 -O -m timeit 'import fileread_bytearray'
> 'fileread_bytearray.readandcopy()'
> 100 loops, best of 3: 9.22 msec per loop
> $ python3 -O -m timeit 'import fileread_bytearray'
> 'fileread_bytearray.readinto()'
> 100 loops, best of 3: 9.31 msec per loop
>
> The file was 10MB. I expected readinto to perform much better than
> readandcopy. I expected readandcopy to perform slightly better than
> justread. This clearly isn't the case.
>
>
What is 'python3' on your machine? If it's 3.2, then this is consistent
with my results. Try it with 3.3 and with a larger file (say ~100MB and
up); you may see the same speed as on 2.7.
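
For reference, a throwaway harness along these lines (the file name and
the ~100MB size are placeholders of mine, not from this thread) makes it
easy to rerun the chunked readinto test under 2.7, 3.2 and 3.3 without
wrestling with timeit's command-line quoting:

import os
import timeit

FILENAME = 'bigfile.bin'   # placeholder test file
SIZE = 100 * 1024 * 1024   # ~100MB, per the suggestion above
BUFSIZE = 0x10000

# Create the test file once, outside the timed region.
if not os.path.exists(FILENAME):
    with open(FILENAME, 'wb') as f:
        f.write(b'\x00' * SIZE)

def readinto_chunks():
    # The same chunked readinto pattern as above, against the larger file.
    f = open(FILENAME, 'rb')
    s = bytearray(os.path.getsize(FILENAME))
    o = 0
    while True:
        b = f.readinto(memoryview(s)[o:o + BUFSIZE])
        if not b:
            break
        o += b
    f.close()

print(min(timeit.repeat(readinto_chunks, number=1, repeat=3)))

Passing the callable to timeit.repeat keeps the file creation out of the
measured region.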

Also, why do you think chunked reads are better here than slurping the
whole file into the bytearray in one go? If you need it wholly in memory
anyway, why not just issue a single read?
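
For comparison, a single-shot version might look like this (a minimal
sketch, reusing FILENAME and the os import from the quoted code, and
assuming the file fits comfortably in memory):

def slurp():
    # One read() call; the io layer sizes the result for us.
    with open(FILENAME, 'rb') as f:
        return f.read()

def slurp_into():
    # Preallocate once, then fill the buffer with a single readinto() call.
    with open(FILENAME, 'rb') as f:
        s = bytearray(os.path.getsize(FILENAME))
        f.readinto(s)
        return s

The io layer may still read in chunks internally, but the Python-level
loop and the per-iteration memoryview slicing go away.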

Eli