[Python-Dev] Prefetching on buffered IO files

Antoine Pitrou solipsis at pitrou.net
Wed Sep 29 10:55:56 CEST 2010


On Wed, 29 Sep 2010 10:06:57 +0200
Hagen Fürstenau <hagen at zhuliguan.net> wrote:
> > Ow... I've always assumed that seek() is essentially free, because
> > that's how a typical OS kernel implements it. If seek() is bad on
> > GzipFile, how hard would it be to fix this?
> 
> I'd imagine that there's no easy way to make arbitrary seeks on a
> GzipFile fast. But wouldn't it be enough to optimize small relative
> (backwards) seeks?

As I explained to Guido, GzipFile doesn't know the buffering size of
its consumer (short of introducing a coupling between the two), and
therefore has no way to know how much data it must retain.

To reiterate, there's a complicated solution (optimize an
implementation-dependent behaviour of GzipFile, with a non-trivial
coding effort and performance tradeoff) which will not work on
unseekable files anyway. And there's a more generic solution involving
non-seeking primitives such as read() + peek().
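
To illustrate the second option, here is a minimal sketch of looking
ahead with peek() and then consuming with read(), without ever calling
seek(). The NonSeekable class, the starts_with() helper and the
b"MAGIC" prefix are made up for the example; only io.BufferedReader
and its peek() and read() methods are the real API. This is not a
proposed patch, just the general shape of the idea.

```python
import io


class NonSeekable(io.RawIOBase):
    """Hypothetical raw stream that cannot seek, like a pipe or a socket."""

    def __init__(self, data):
        self._src = io.BytesIO(data)

    def readable(self):
        return True

    def seekable(self):
        return False

    def readinto(self, b):
        # Fill the caller's buffer from the in-memory source.
        return self._src.readinto(b)


def starts_with(buffered, magic):
    """Check the upcoming bytes without consuming them.

    peek() may return more or fewer bytes than requested, so the
    result is sliced before comparing.
    """
    return buffered.peek(len(magic))[:len(magic)] == magic


stream = io.BufferedReader(NonSeekable(b"MAGICpayload"))

has_magic = starts_with(stream, b"MAGIC")  # look ahead, position unchanged
header = stream.read(5)                    # now actually consume the prefix
body = stream.read()                       # and the rest of the stream
```

The point is that the consumer only relies on the buffer that the
buffered object already owns, so the underlying raw stream (which
could just as well be a GzipFile or a socket) never has to guess how
much history to keep around.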

(follow-up to python-ideas, if I didn't mess up the headers)

Regards

Antoine.



