[Python-ideas] [Python-Dev] Prefetching on buffered IO files
daniel at stutzbachenterprises.com
Tue Sep 28 16:26:30 CEST 2010
On Mon, Sep 27, 2010 at 5:41 PM, Antoine Pitrou <solipsis at pitrou.net> wrote:
> While trying to solve #3873 (poor performance of pickle on file
> objects, due to the overhead of calling read() with very small values),
After looking over the relevant code, it looks to me like the overhead of
calling the read() method compared to calling fread() in Python 2 is the
overhead of calling PyObject_Call along with the construction of argument
tuples and deconstruction of the return value. I don't think the extra
interface would benefit code written in Python as much. Even if Python
code gets the data into a buffer more easily, it's going to pay those costs
to manipulate the buffered data. It would mostly help modules written in C,
such as pickle, which right now are heavily bottlenecked on getting the data
into a buffer.
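A minimal sketch of the access pattern at issue, assuming a hypothetical 1-byte-length-prefixed record format (not pickle's real wire format). The CountingReader class and parse_records() helper are made up for illustration; they only show how many separate read() calls a parser of this kind issues, each of which pays the call overhead described above.

```python
import io


class CountingReader(io.BufferedReader):
    """BufferedReader that counts read() calls (illustration only)."""

    def __init__(self, raw):
        super().__init__(raw)
        self.calls = 0

    def read(self, size=-1):
        self.calls += 1
        return super().read(size)


def parse_records(f):
    """Parse length-prefixed records using two small read() calls each."""
    records = []
    while True:
        header = f.read(1)          # one call for the 1-byte length
        if not header:              # empty bytes means EOF
            break
        records.append(f.read(header[0]))  # one call for the body
    return records


# Hypothetical payload: three tiny records.
payload = b"".join(bytes([len(r)]) + r for r in [b"a", b"bc", b"def"])
reader = CountingReader(io.BytesIO(payload))
records = parse_records(reader)
call_count = reader.calls
```

Six payload bytes are consumed through seven read() calls here (two per record plus the final call that detects EOF), so the per-call cost can easily dominate the cost of moving the bytes themselves.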
Comparing the C code for Python 2's cPickle and Python 3's pickle, I see
that Python 2 has paths for unpickling from a FILE *, cStringIO, and
"other". Python 3 effectively only has a code path for "other", so it's not
surprising that it's slower. In the worst case, I am sure that if we
re-added specialized code paths we could make it just as fast as Python
2, although that would make the code messy.
- Use readinto() instead of read(), to avoid extra allocations/deallocations
- But first, fix bufferediobase_readinto() so it doesn't work by calling the
read() method and/or follow up on the TODO in buffered_readinto()
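A rough sketch of the readinto() suggestion, seen from the Python level. The read_exact_into() helper is hypothetical; it only shows one preallocated bytearray being reused across reads, where read() would return a fresh bytes object on every call. It says nothing about how the C implementation of readinto() behaves internally.

```python
import io


def read_exact_into(stream, buf):
    """Fill buf from stream via readinto(); return the number of bytes read."""
    view = memoryview(buf)
    total = 0
    while total < len(view):
        n = stream.readinto(view[total:])
        if not n:  # 0 at EOF, None if a non-blocking stream has no data
            break
        total += n
    return total


stream = io.BufferedReader(io.BytesIO(b"abcdefghij"))
buf = bytearray(4)  # allocated once, reused for every read
chunks = []
while True:
    n = read_exact_into(stream, buf)
    if n == 0:
        break
    # Copy out only for demonstration; a real consumer would parse buf in place.
    chunks.append(bytes(buf[:n]))
```

The final chunk is shorter than the buffer, which is why the helper returns a count instead of assuming the buffer was filled.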
If you want a new API, I think a new C API for I/O objects with C-friendly
arguments would be better than a new Python-level API.
In a nutshell, if you feel the need to make a buffer around BufferedReader,
then I agree there's a problem, but I don't think helping you make a buffer
around BufferedReader is the right solution. ;-)
Daniel Stutzbach, Ph.D.
President, Stutzbach Enterprises, LLC <http://stutzbachenterprises.com/>