[Python-Dev] PEP 428: stat caching undesirable?

Antoine Pitrou solipsis at pitrou.net
Wed May 1 12:18:21 CEST 2013


On Wed, 01 May 2013 09:32:28 +0200
Pieter Nagel <pieter at nagel.co.za> wrote:
> Hi all,
> I write as a Python lover for over 13 years who's always wanted
> something like PEP 428 in Python.
> I am concerned about the caching of stat() results as currently defined
> in the PEP. This means that all behaviour built on top of stat(), such
> as p.is_dir(), p.is_file(), p.st_size and the like can indefinitely hold
> on to stale data until restat() is called, and I consider this
> confusing.

I understand it might be confusing. On the other hand, if you call
is_dir() and then is_file() (and then perhaps another metadata-reading
operation), and there's a race condition where the file is modified
in between, you could get pretty much nonsensical results were it not
for the caching.
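
To illustrate the consistency point, here is a minimal sketch using only
os.stat() and the stat module. It is not the PEP's implementation; the
classify() helper is a hypothetical name introduced for this example. Both
answers are derived from a single stat() result, so they always describe
the same instant even if the file changes afterwards.

```python
import os
import stat
import tempfile


def classify(path):
    """Return (is_dir, is_file) computed from one os.stat() call.

    Hypothetical helper: because both flags come from the same
    stat result, they cannot contradict each other.
    """
    st = os.stat(path)  # one system call, one consistent snapshot
    return stat.S_ISDIR(st.st_mode), stat.S_ISREG(st.st_mode)


with tempfile.TemporaryDirectory() as tmp:
    dir_flags = classify(tmp)

    file_path = os.path.join(tmp, "example.txt")
    with open(file_path, "w") as f:
        f.write("x")
    file_flags = classify(file_path)
```

Two separate os.path.isdir() and os.path.isfile() calls, by contrast, each
issue their own stat() and can observe different states of the same path.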

> Isn't the whole notion that stat() needs to be cached for performance
> reasons somewhat of a historical relic of older OSes and filesystem
> performance? AFAIK Linux already has stat() caching as a side-effect of
> the filesystem layer's metadata caching. How do Windows and Mac OS
> fare here? Are there benchmarks proving that this is serious enough to
> complicate the API?

Surprisingly enough, some network filesystems have rather bad stat()
performance. This was reported for years as an issue with Python's
import machinery, until 3.3 added a caching scheme where stat() calls
are no longer issued for each and every path directory and each and
every imported module.

But, as written above, caching is also a matter of functionality. I'll
let other people chime in.

> If the ability to cache stat() calls is deemed important enough, how
> about a different API where is_file(), is_dir() and the like are added
> as methods on the result object that stat() returns?

That's a good idea too. It isn't straightforward since os.stat() is
implemented in C.
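
As a rough sketch of what that alternative could look like in pure Python,
one option is a thin wrapper around the os.stat_result returned by
os.stat(). The names StatSnapshot and snapshot() are hypothetical and exist
only for this example; nothing here is part of PEP 428 or the os module.

```python
import os
import stat
import tempfile


class StatSnapshot:
    """Hypothetical wrapper adding query methods to an os.stat_result."""

    def __init__(self, st):
        self._st = st

    def is_dir(self):
        return stat.S_ISDIR(self._st.st_mode)

    def is_file(self):
        return stat.S_ISREG(self._st.st_mode)

    def __getattr__(self, name):
        # Called only for attributes not found on the wrapper itself;
        # forwards st_size, st_mtime, etc. to the wrapped stat result.
        return getattr(self._st, name)


def snapshot(path):
    """Take one stat() of path and return it wrapped (hypothetical helper)."""
    return StatSnapshot(os.stat(path))


with tempfile.TemporaryDirectory() as tmp:
    snap = snapshot(tmp)
    snap_is_dir = snap.is_dir()
    snap_is_file = snap.is_file()
    snap_mode = snap.st_mode
```

With this shape the caching is explicit: the caller holds the snapshot for
as long as it wants consistent answers and calls snapshot() again to get
fresh data.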


