PEP 428: stat caching undesirable? - Python-Dev

1 May 2013

Hi all,

I write as a Python lover of over 13 years who has always wanted
something like PEP 428 in Python.

I am concerned about the caching of stat() results as currently defined
in the PEP. This means that all behaviour built on top of stat(), such
as p.is_dir(), p.is_file(), p.st_size and the like can indefinitely hold
on to stale data until restat() is called, and I consider this
confusing.
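
To make the concern concrete, here is a minimal sketch of the staleness
problem. CachedStatPath is a hypothetical stand-in written for this
example, not the PEP's implementation; it only assumes that the first
stat() result is kept until restat() is called.

```python
import os
import stat
import tempfile


class CachedStatPath:
    """Hypothetical path object that caches its first stat() result."""

    def __init__(self, path):
        self._path = path
        self._stat = None  # filled on first use, kept until restat()

    def stat(self):
        if self._stat is None:
            self._stat = os.stat(self._path)
        return self._stat

    def restat(self):
        # Drop the cached result and ask the OS again.
        self._stat = None
        return self.stat()

    def is_file(self):
        return stat.S_ISREG(self.stat().st_mode)


def demo_stale_is_file():
    """Return (before, after_cached, after_os) around deleting a file."""
    with tempfile.TemporaryDirectory() as tmp:
        name = os.path.join(tmp, "data.txt")
        with open(name, "w") as f:
            f.write("x")
        p = CachedStatPath(name)
        before = p.is_file()            # first call fills the cache
        os.remove(name)
        after_cached = p.is_file()      # answered from the stale cache
        after_os = os.path.isfile(name)  # asks the OS again
        return before, after_cached, after_os
```

In this sketch the cached object keeps reporting a regular file after the
file has been removed, while os.path.isfile() reports that it is gone.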

Perhaps in recognition of this, p.exists() is implemented differently,
and it does restat() internally (although the PEP does not document
this).

If this behaviour is maintained, then at the very least this makes the
API more complicated to document: some calls cache as a side effect,
others update the cache as a side effect, and others, such as lstat(),
don't cache at all.

This also introduces a divergence of behaviour between os.path.isfile()
and p.is_file(), that is confusing and will also need to be documented.

I'm concerned about scenarios like users of the library polling, for
example, for some file to appear, and being confused about why the
arguably more sloppy poll for p.exists() works while a poll for
p.is_file(), which expresses intent better, never terminates.
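
A sketch of that polling scenario follows. NegativeCachingPath and poll()
are hypothetical names invented for this example, and the sketch assumes
that a failed stat() is cached as well as a successful one; the PEP may
differ in that detail. The loop is bounded so the example terminates.

```python
import os
import stat
import tempfile


class NegativeCachingPath:
    """Hypothetical path whose is_file() caches, but whose exists() restats."""

    def __init__(self, path):
        self._path = path
        self._cached = False
        self._stat = None

    def _get_stat(self):
        if not self._cached:
            try:
                self._stat = os.stat(self._path)
            except FileNotFoundError:
                self._stat = None  # assumption: "missing" is cached too
            self._cached = True
        return self._stat

    def is_file(self):
        st = self._get_stat()
        return st is not None and stat.S_ISREG(st.st_mode)

    def exists(self):
        self._cached = False  # force a fresh stat() on every call
        return self._get_stat() is not None


def poll(check, name, create_at, max_polls=5):
    """Call check() up to max_polls times; the file appears at poll create_at.

    Returns the attempt number on which check() succeeded, or None.
    """
    for attempt in range(max_polls):
        if attempt == create_at:
            with open(name, "w"):
                pass
        if check():
            return attempt
    return None


def demo_polling():
    with tempfile.TemporaryDirectory() as tmp:
        a = os.path.join(tmp, "a.txt")
        b = os.path.join(tmp, "b.txt")
        via_exists = poll(NegativeCachingPath(a).exists, a, create_at=2)
        via_is_file = poll(NegativeCachingPath(b).is_file, b, create_at=2)
        return via_exists, via_is_file
```

Under these assumptions the exists() poll notices the file as soon as it
appears, while the is_file() poll keeps answering from its first, stale
result and gives up only because the loop is bounded.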

In theory the caching mechanism could be further refined to only hold
onto cached results for a limited amount of time, but I would argue this
is unnecessary complexity, and caching should just be removed, along
with restat(). 

Isn't the whole notion that stat() needs to be cached for performance
reasons somewhat of a historical relic of older OSes and filesystem
performance? AFAIK Linux already has stat() caching as a side effect of
the filesystem layer's metadata caching. How do Windows and Mac OS
fare here? Are there benchmarks proving that this is serious enough to
complicate the API?
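
As a starting point for such a benchmark, something like the following
could be run on each platform. time_stat_calls() is a hypothetical helper
written for this message; it only measures repeated os.stat() calls on one
small, recently created file, so it says nothing about cold caches or
network filesystems.

```python
import os
import tempfile
import timeit


def time_stat_calls(number=10000):
    """Return the average time in seconds of one os.stat() call."""
    with tempfile.TemporaryDirectory() as tmp:
        name = os.path.join(tmp, "probe.txt")
        with open(name, "w") as f:
            f.write("x")
        # timeit returns the total time for `number` calls.
        total = timeit.timeit(lambda: os.stat(name), number=number)
    return total / number


if __name__ == "__main__":
    print("average seconds per stat():", time_stat_calls())
```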

If the ability to cache stat() calls is deemed important enough, how
about a different API where is_file(), is_dir() and the like are added
as methods on the result object that stat() returns? Then one can hold
onto a stat() result as a temporary object and ask it multiple questions
without doing another OS call, and is_file() etc. on the Path object can
be documented as being forwarders to the stat() result just as p.st_size
is currently - except that I believe they should forward to a fresh,
uncached stat() call every time.
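
A rough sketch of what I have in mind is below. StatResult and
UncachedPath are hypothetical names for illustration, not proposed
spellings, and returning False for a missing file is an assumption that
mirrors os.path.isfile().

```python
import os
import stat
import tempfile


class StatResult:
    """Hypothetical snapshot object: one OS call, many questions."""

    def __init__(self, st):
        self._st = st

    @property
    def st_size(self):
        return self._st.st_size

    def is_file(self):
        return stat.S_ISREG(self._st.st_mode)

    def is_dir(self):
        return stat.S_ISDIR(self._st.st_mode)


class UncachedPath:
    """Hypothetical path whose convenience methods always stat() afresh."""

    def __init__(self, path):
        self._path = path

    def stat(self):
        return StatResult(os.stat(self._path))

    def is_file(self):
        try:
            return self.stat().is_file()
        except OSError:
            return False  # assumption: behave like os.path.isfile()

    def is_dir(self):
        try:
            return self.stat().is_dir()
        except OSError:
            return False


def demo_snapshot():
    with tempfile.TemporaryDirectory() as tmp:
        name = os.path.join(tmp, "data.txt")
        with open(name, "w") as f:
            f.write("x")
        p = UncachedPath(name)
        snapshot = p.stat()  # caching is explicit: the caller holds it
        answers = (snapshot.is_file(), snapshot.is_dir())
        os.remove(name)
        # The snapshot is knowingly old; the path itself asks the OS again.
        return answers + (snapshot.is_file(), p.is_file())
```

The point of the design is that any staleness is visible in the calling
code: whoever holds on to the stat() result chose to do so, and the Path
object itself never answers from old data.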

I write directly to this list instead of raising it with Antoine Pitrou in
private just because I don't want to make extra work for him to first
receive my feedback and then re-raise it on this list. If this is wrong
or disrespectful, I apologize.

-- 
Pieter Nagel
