[Python-Dev] file.readinto performance regression in Python 3.2 vs. 2.7?

Fri Nov 25 07:13:45 CET 2011

On Fri, Nov 25, 2011 at 12:07 PM, Antoine Pitrou <solipsis at pitrou.net> wrote:
> On Fri, 25 Nov 2011 12:02:17 +1100
> Matt Joiner <anacrolix at gmail.com> wrote:
>> It's my impression that the readinto method does not fully support the
>> buffer interface I was expecting. I've never had cause to use it until
>> now. I've created a question on SO that describes my confusion:
>>
>> http://stackoverflow.com/q/8263899/149482
>
> Just use a memoryview and slice it:
>
> b = bytearray(...)
> m = memoryview(b)
> n = f.readinto(m[some_offset:])

Cheers, this seems to be what I wanted. Unfortunately it doesn't
perform noticeably better if I do this.
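
For readers following along, here is a minimal sketch of why the memoryview slice matters. It is not from the thread itself: it uses io.BytesIO as a stand-in for a real file, and the names buf, view, and src are purely illustrative. Slicing a bytearray makes a copy, so readinto fills the copy and the original stays untouched, whereas slicing a memoryview gives a window onto the same memory.

```python
import io

buf = bytearray(8)

# Slicing the bytearray directly creates a new bytearray, so the bytes
# read by readinto() land in that temporary copy and are then lost.
src = io.BytesIO(b"abcd")
n_copy = src.readinto(buf[2:])
unchanged = bytes(buf)

# Slicing a memoryview does not copy: the slice refers to the same
# memory as buf, so readinto() writes into buf starting at offset 2.
src = io.BytesIO(b"abcd")
view = memoryview(buf)
n_view = src.readinto(view[2:])
filled = bytes(buf)
```

In both cases readinto reports the number of bytes it read, which is why the first form can look as if it worked.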

Eli, the use pattern I was referring to is when you read in chunks
and append to a running buffer. Presumably, if you know the size of
the data in advance, you can readinto directly into a region of a
bytearray, thereby avoiding both the temporary buffer allocated for
each read and the new buffer created to hold the running buffer plus
the new data.

Strangely, I find that your readandcopy is faster at this, but not by
much, than readinto. Here's the code, it's a bit explicit, but then so
was the original:

import os

BUFSIZE = 0x10000

def justread():
    # Just read a file's contents into a string/bytes object
    f = open(FILENAME, 'rb')
    s = b''
    while True:
        b = f.read(BUFSIZE)
        if not b:
            break
        s += b

def readandcopy():
    # Read a file's contents and copy them into a bytearray.
    # An extra copy is done here.
    f = open(FILENAME, 'rb')
    s = bytearray()
    while True:
        b = f.read(BUFSIZE)
        if not b:
            break
        s += b

def readinto():
    # Read a file's contents directly into a bytearray,
    # hopefully employing its buffer interface
    f = open(FILENAME, 'rb')
    s = bytearray(os.path.getsize(FILENAME))
    o = 0
    while True:
        b = f.readinto(memoryview(s)[o:o+BUFSIZE])
        if not b:
            break
        o += b

And the timings:

$ python3 -O -m timeit 'import fileread_bytearray' \
    'fileread_bytearray.justread()'
10 loops, best of 3: 298 msec per loop
$ python3 -O -m timeit 'import fileread_bytearray' \
    'fileread_bytearray.readandcopy()'
100 loops, best of 3: 9.22 msec per loop
$ python3 -O -m timeit 'import fileread_bytearray' \
    'fileread_bytearray.readinto()'
100 loops, best of 3: 9.31 msec per loop

The file was 10MB. I expected readinto to perform much better than
readandcopy. I expected readandcopy to perform slightly better than
justread. This clearly isn't the case.
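
One variant that might be worth timing alongside the three functions above is sketched below. It is an assumption, not something measured in this thread: the function name read_into_preallocated and its path/bufsize parameters are hypothetical, and the only differences from readinto() above are that the memoryview is created once outside the loop and the file is closed by a with block. Whether this changes the timings is not claimed here.

```python
import os

BUFSIZE = 0x10000

def read_into_preallocated(path, bufsize=BUFSIZE):
    """Read a whole file into a preallocated bytearray using readinto().

    Returns the bytearray and the number of bytes actually read.
    """
    size = os.path.getsize(path)
    buf = bytearray(size)
    offset = 0
    # Create the memoryview once; each slice of it is a zero-copy window
    # onto buf rather than a new buffer.
    with open(path, 'rb') as f, memoryview(buf) as view:
        while offset < size:
            n = f.readinto(view[offset:offset + bufsize])
            if not n:
                break  # EOF earlier than getsize() suggested
            offset += n
    return buf, offset
```

The function returns the byte count as well as the buffer so that a caller can detect a file that turned out shorter than os.path.getsize reported.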

>
>> Also I saw some comments on "top-posting" am I guilty of this?

If there's a magical option in Gmail that someone knows about, please tell.

>
> Kind of :)
>
> Regards
>
> Antoine.