On Mon, Sep 27, 2010 at 5:41 PM, Antoine Pitrou <solipsis@pitrou.net> wrote:
> While trying to solve #3873 (poor performance of pickle on file
> objects, due to the overhead of calling read() with very small values),

After looking over the relevant code, it looks to me like the overhead of calling the read() method, compared to calling fread() in Python 2, comes from going through PyObject_Call, along with constructing the argument tuple and unpacking the return value.  I don't think the extra interface would benefit code written in Python as much.  Even if Python code gets the data into a buffer more easily, it's still going to pay those costs to manipulate the buffered data.  It would mostly help modules written in C, such as pickle, which right now are heavily bottlenecked on getting the data into a buffer.

Comparing the C code for Python 2's cPickle and Python 3's pickle, I see that Python 2 has paths for unpickling from a FILE *, cStringIO, and "other".  Python 3 effectively only has a code path for "other", so it's not surprising that it's slower.  In the worst case, I am sure that if we re-added specialized code paths, we could make it just as fast as Python 2, although that would make the code messy.

Some ideas:
- Use readinto() instead of read(), to avoid extra allocations/deallocations
- But first, fix bufferediobase_readinto() so it doesn't work by calling the read() method and/or follow up on the TODO in buffered_readinto()
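
To illustrate the readinto() idea at the Python level, here is a minimal sketch. The helper name read_exact_into and the 4-byte buffer size are made up for the example, and this is not the pickle code itself. It only shows the shape of the approach: one buffer is allocated up front and refilled in place, instead of read() returning a fresh bytes object on every call.

```python
import io

def read_exact_into(stream, buf):
    """Fill buf from stream via readinto(); return the number of bytes read.

    Hypothetical helper for illustration only.
    """
    view = memoryview(buf)
    total = 0
    while total < len(view):
        # readinto() writes directly into the caller's buffer.
        n = stream.readinto(view[total:])
        if not n:  # 0 at EOF, None if a non-blocking stream has no data
            break
        total += n
    return total

stream = io.BufferedReader(io.BytesIO(b"abcdefghij"))
buf = bytearray(4)  # allocated once, reused for every read
chunks = []
while True:
    n = read_exact_into(stream, buf)
    if n == 0:
        break
    chunks.append(bytes(buf[:n]))  # copied here only so the result can be inspected
```

Note that Python code like this still pays the per-call overhead on every readinto(), so the real gain would be for C callers that can hand over a raw buffer.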

If you want a new API, I think a new C API for I/O objects with C-friendly arguments would be better than a new Python-level API.

In a nutshell, if you feel the need to make a buffer around BufferedReader, then I agree there's a problem, but I don't think helping you make a buffer around BufferedReader is the right solution. ;-)

--
Daniel Stutzbach, Ph.D.
President, Stutzbach Enterprises, LLC