io.BytesIO slower than monkey-patching io.RawIOBase
While working on #1767933, Serhiy observed that "monkey-patching" one of the io base classes is faster than using BytesIO when you need a file-like object to write into. I've distilled it into this standalone test:

    import io

    data = [b'a' * 10, b'bb' * 5, b'ccc' * 5] * 10000

    def withbytesio():
        bio = io.BytesIO()
        for i in data:
            bio.write(i)
        return bio.getvalue()

    def monkeypatching():
        mydata = []
        file = io.RawIOBase()
        file.writable = lambda: True
        file.write = mydata.append
        for i in data:
            file.write(i)
        return b''.join(mydata)

The second approach is consistently 10-20% faster than the first one (depending on input) for trunk Python 3.3.

Is there any reason for this to be so? What does BytesIO give us that the second approach does not? (I tried adding more methods to the patched RawIOBase to make it more functional, like seekable() and tell(), and it doesn't affect performance.)

This also raises a "moral" question: should I be using the second approach deep inside the stdlib (ET.tostring) just because it's faster?

Eli
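The comparison in the message can be reproduced with a small timeit harness (a sketch, not part of the original message; the f-string formatting requires Python 3.6+, while the thread itself targets 3.3):

```python
import io
import timeit

data = [b'a' * 10, b'bb' * 5, b'ccc' * 5] * 10000

def withbytesio():
    # Write every chunk into a BytesIO, then materialize the result.
    bio = io.BytesIO()
    for chunk in data:
        bio.write(chunk)
    return bio.getvalue()

def monkeypatching():
    # Replace write() on a bare RawIOBase instance with list.append,
    # then join the accumulated chunks once at the end.
    mydata = []
    f = io.RawIOBase()
    f.writable = lambda: True
    f.write = mydata.append
    for chunk in data:
        f.write(chunk)
    return b''.join(mydata)

# Both approaches must produce identical output.
assert withbytesio() == monkeypatching()

if __name__ == '__main__':
    for fn in (withbytesio, monkeypatching):
        elapsed = timeit.timeit(fn, number=20)
        print(f'{fn.__name__}: {elapsed:.3f}s')
```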
> The second approach is consistently 10-20% faster than the first one (depending on input) for trunk Python 3.3.
I think the difference is that BytesIO spends extra time reallocating memory as the buffer grows during the write loop, whereas b''.join computes the required allocation up front since it already knows the final length.
On Tue, Jul 17, 2012 at 2:57 PM, John O'Connor wrote:
>> The second approach is consistently 10-20% faster than the first one (depending on input) for trunk Python 3.3.
>
> I think the difference is that BytesIO spends extra time reallocating memory as the buffer grows during the write loop, whereas b''.join computes the required allocation up front since it already knows the final length.
BytesIO is actually missing an optimisation that is already used in StringIO: the StringIO C implementation uses a fragment accumulator internally, and collapses that into a single string object when getvalue() is called. BytesIO is still using the old "resize-the-buffer-as-you-go" strategy, and thus ends up repeatedly reallocating the buffer as the data sequence grows incrementally. It should be optimised to work the same way StringIO does (which is effectively the same way the monkeypatched version works).

Cheers,
Nick.

--
Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
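The fragment-accumulator strategy Nick describes can be sketched in pure Python (an illustration of the idea only; the class name is invented here and this is not the actual C implementation):

```python
class AccumulatingBytesIO:
    """Illustrative sketch: buffer writes in a list of fragments and
    only materialize a single bytes object when getvalue() is called,
    mirroring the strategy described for the StringIO C implementation."""

    def __init__(self):
        self._fragments = []

    def write(self, b):
        # Appending to a list is cheap; no buffer resize or copy happens here.
        self._fragments.append(bytes(b))
        return len(b)

    def getvalue(self):
        if not self._fragments:
            return b''
        # Collapse all fragments once. b''.join allocates the final
        # buffer in a single step because the total length is known.
        if len(self._fragments) > 1:
            self._fragments = [b''.join(self._fragments)]
        return self._fragments[0]
```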
On Tue, 17 Jul 2012 06:34:14 +0300, Eli Bendersky wrote:
> Is there any reason for this to be so? What does BytesIO give us that the second approach does not (I tried adding more methods to the patched RawIOBase to make it more functional, like seekable() and tell(), and it doesn't affect performance)?
Well, try implementing non-trivial methods such as readline() or seek(), and writing in the middle rather than at the end.

As Nick said, we could implement the same optimization as in StringIO, i.e. only materialize the buffer when necessary.

Regards,

Antoine.

--
Software development and contracting: http://pro.pitrou.net
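Antoine's point about seek() and mid-buffer writes can be illustrated by extending the accumulator idea: as soon as a write may land in the middle of the data, the fragment list has to be materialized into a contiguous buffer (a hypothetical sketch; all names are invented here):

```python
class SeekableAccumulator:
    """Hypothetical sketch: append-only writes go to a fragment list;
    the first seek() materializes a contiguous mutable buffer, since a
    later write may land in the middle, giving up the cheap-append path."""

    def __init__(self):
        self._fragments = []
        self._buffer = None   # bytearray, once materialized
        self._pos = 0

    def write(self, b):
        if self._buffer is None:
            # Fast path: append-only, no copying.
            self._fragments.append(bytes(b))
        else:
            # Slow path: overwrite (and possibly extend) in place.
            end = self._pos + len(b)
            if end > len(self._buffer):
                self._buffer.extend(b'\0' * (end - len(self._buffer)))
            self._buffer[self._pos:end] = b
            self._pos = end
        return len(b)

    def seek(self, pos):
        # From here on, writes may land mid-buffer: materialize.
        if self._buffer is None:
            self._buffer = bytearray(b''.join(self._fragments))
            self._fragments = []
        self._pos = pos

    def getvalue(self):
        if self._buffer is not None:
            return bytes(self._buffer)
        return b''.join(self._fragments)
```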
On 17.07.12 06:34, Eli Bendersky wrote:
> The second approach is consistently 10-20% faster than the first one (depending on input) for trunk Python 3.3.
>
> Is there any reason for this to be so? What does BytesIO give us that the second approach does not (I tried adding more methods to the patched RawIOBase to make it more functional, like seekable() and tell(), and it doesn't affect performance)?
BytesIO resizes its underlying buffer when it overflows, overallocating by 1/8 of the size and copying the old contents into the new buffer. In total it makes about log[9/8](N) allocations and copies about 8*N bytes (for large N). A list uses the same growth strategy, but the number of chunks is usually much smaller than the number of bytes. At the end, all the chunks are concatenated by join(), which sums the chunk lengths and allocates the resulting array at exactly the desired size. That is why append/join is faster than BytesIO in this case.

There is another note, about ElementTree.tostringlist(). Creating the DataStream class on every function call is too expensive, which is why the "monkeypatched" version is several times faster than the DataStream version for small data. It is faster for long data too, because each data.append() call costs one attribute lookup more than the pre-bound "monkeypatched" write = data.append.
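Serhiy's accounting can be checked with a rough simulation of a 1/8-overallocation growth policy (a sketch of the analysis, not the exact CPython resize code): the reallocation count tracks log[9/8](N), and the total bytes copied stay within a constant multiple of N.

```python
import math

def resize_cost(n, factor=1.125):
    """Simulate a buffer that grows by ~1/8 each time it overflows,
    counting reallocations and total bytes copied along the way."""
    capacity = 1
    reallocs = 0
    copied = 0
    while capacity < n:
        copied += capacity                 # old contents copied over
        capacity = int(capacity * factor) + 1
        reallocs += 1
    return reallocs, copied

n = 10**6
reallocs, copied = resize_cost(n)
# Realloc count is on the order of log[9/8](N); total copying is a
# geometric series summing to a small constant multiple of N.
print(reallocs, math.log(n, 9 / 8))
print(copied / n)
```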
> This also raises a "moral" question - should I be using the second approach deep inside the stdlib (ET.tostring) just because it's faster?
Please note that the previous version of Python used this "monkeypatching" approach.
On Tue, Jul 17, 2012 at 2:59 PM, Serhiy Storchaka wrote:
> On 17.07.12 06:34, Eli Bendersky wrote:
>> The second approach is consistently 10-20% faster than the first one (depending on input) for trunk Python 3.3.
>>
>> Is there any reason for this to be so? What does BytesIO give us that the second approach does not (I tried adding more methods to the patched RawIOBase to make it more functional, like seekable() and tell(), and it doesn't affect performance)?
>
> BytesIO resizes its underlying buffer when it overflows, overallocating by 1/8 of the size and copying the old contents into the new buffer. In total it makes about log[9/8](N) allocations and copies about 8*N bytes (for large N). A list uses the same growth strategy, but the number of chunks is usually much smaller than the number of bytes. At the end, all the chunks are concatenated by join(), which sums the chunk lengths and allocates the resulting array at exactly the desired size. That is why append/join is faster than BytesIO in this case.
I've created http://bugs.python.org/issue15381 to track this (optimizing BytesIO).
> There is another note, about ElementTree.tostringlist(). Creating the DataStream class on every function call is too expensive, which is why the "monkeypatched" version is several times faster than the DataStream version for small data. It is faster for long data too, because each data.append() call costs one attribute lookup more than the pre-bound "monkeypatched" write = data.append.
I updated tostringlist() to use a class defined outside the function. This brings performance back in line with the old code.

Eli
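The change Eli describes, hoisting the writer class out of the function so it is defined once at module level, can be sketched like this (a hypothetical simplified stand-in, not the actual ElementTree code):

```python
import io

# Defining the writer class once at module level avoids re-executing
# the class statement on every serialization call.
class _ListWriter(io.RawIOBase):
    def __init__(self, parts):
        super().__init__()
        self._parts = parts

    def writable(self):
        return True

    def write(self, b):
        self._parts.append(bytes(b))
        return len(b)

def tostring_like(chunks):
    # Hypothetical stand-in for an ET.tostring-style serializer:
    # accumulate chunks in a list, join once at the end.
    parts = []
    writer = _ListWriter(parts)
    for chunk in chunks:
        writer.write(chunk)
    return b''.join(parts)
```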
participants (5)
- Antoine Pitrou
- Eli Bendersky
- John O'Connor
- Nick Coghlan
- Serhiy Storchaka