On Tue, 28 Sep 2010 09:44:38 -0700 Guido van Rossum email@example.com wrote:
But AFAICT unpickle doesn't use seek()?
But if the stream had prefetch(), the unpickling would be simplified: I would only have to call prefetch() once when refilling the buffer, rather than making two read() calls followed by a peek().
(I could try to coalesce the two reads, but it would complicate the code a bit more...)
Where exactly would the peek be used? (I must be confused because I can't find either peek or seek in _pickle.c.)
peek/seek are not used currently (in SVN). Each of them is used in one of the prefetching approaches proposed to solve the unpickling performance problem.
(the first approach uses seek() and read(), the second approach uses read() and peek(); as already explained, I tend to consider the second approach much better, and the prefetch() proposal comes in part from the experience gathered on that approach)
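To make the second approach concrete, here is a minimal Python sketch of a read()-plus-peek() refill. It is only an illustration under stated assumptions, not the code from either patch: the class name PeekingReader and its finish() method are hypothetical, and the stream is assumed to provide read() and peek() with io.BufferedReader semantics. The point is that peeked bytes stay in the stream until they are actually consumed, so whatever follows the pickle is left untouched.

```python
import io


class PeekingReader:
    """Hypothetical helper: serves reads from look-ahead data obtained
    with peek(), so the stream is never advanced past what was used."""

    def __init__(self, stream):
        self._stream = stream  # assumed to provide read() and peek()
        self._ahead = b""      # bytes peeked but still unread in the stream
        self._pos = 0          # how many of those bytes were handed out

    def read(self, n):
        end = self._pos + n
        if end <= len(self._ahead):
            # Fast path: plain slicing, no method call on the stream.
            data = self._ahead[self._pos:end]
            self._pos = end
            return data
        # Slow path (refill).
        # First read(): skip the peeked bytes that were already handed out.
        if self._pos:
            self._stream.read(self._pos)
        self._ahead = b""
        self._pos = 0
        # Second read(): fetch the bytes requested now.
        data = self._stream.read(n)
        # peek(): look ahead without moving the stream position.
        self._ahead = self._stream.peek(1)
        return data

    def finish(self):
        """Advance the stream to match what was consumed (e.g. at STOP)."""
        if self._pos:
            self._stream.read(self._pos)
        self._ahead = b""
        self._pos = 0


stream = io.BufferedReader(io.BytesIO(b"abcdefghij"))
reader = PeekingReader(stream)
first = reader.read(2)
second = reader.read(3)
reader.finish()
rest = stream.read()  # data after the consumed part is still in the stream
```

The refill path is where the "two reads followed by a peek" come from; a single prefetch() call would replace those three calls, though its exact signature is not specified here.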
It still seems to me that the "right" way to solve this would be to insert a transparent extra buffer somewhere, probably in the GzipFile code, and work on reducing the call overhead.
No, because if you don't have any buffering on the unpickling side (rather than the GzipFile or the BufferedReader side), then you still have the method call overhead no matter what. And this overhead is rather big when you're reading data byte by byte, or word by word (which unpickling very frequently does).
(for the record, GzipFile already has an internal buffer. But calling GzipFile.read() still has a large overhead compared to reading data directly from a prefetch buffer inside the unpickler object)
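A rough way to see the difference is to count method calls rather than time them. The sketch below is illustrative only: CountingStream, consume_unbuffered and consume_prefetched are hypothetical names, and the 4096-byte chunk size is an arbitrary assumption. It compares reading byte by byte through read() with reading chunks into a local buffer and indexing into it.

```python
import io


class CountingStream(io.BytesIO):
    """BytesIO that counts read() calls (for illustration only)."""

    def __init__(self, data):
        super().__init__(data)
        self.read_calls = 0

    def read(self, size=-1):
        self.read_calls += 1
        return super().read(size)


def consume_unbuffered(stream, total):
    # One method call on the stream for every byte consumed.
    for _ in range(total):
        stream.read(1)


def consume_prefetched(stream, total, chunk=4096):
    # One method call per chunk; per-byte access is plain iteration
    # over a local bytes object, with no call into the stream.
    remaining = total
    while remaining > 0:
        buf = stream.read(min(chunk, remaining))
        if not buf:
            break
        for _byte in buf:
            pass
        remaining -= len(buf)


data = bytes(10000)

unbuffered = CountingStream(data)
consume_unbuffered(unbuffered, len(data))

prefetched = CountingStream(data)
consume_prefetched(prefetched, len(data))

print(unbuffered.read_calls, prefetched.read_calls)  # prints "10000 3"
```

An internal buffer inside the stream object does not change the first number: each one-byte read() is still a full method call, which is the overhead being discussed.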