[Python-Dev] Remaining decisions on PEP 471 -- os.scandir()

Ben Hoyt benhoyt at gmail.com
Mon Jul 21 18:48:50 CEST 2014


Thanks for an initial look into this, Victor.

> IMO the current os.scandir() API does not fit importlib requirements.
> importlib usually wants fresh data, whereas the DirEntry cache cannot
> be invalidated. It's probably possible to cache some os.stat() results
> in importlib, but it looks like that requires a non-trivial refactoring
> of the code. I don't know importlib well enough to suggest how to
> change it.

Yes, with importlib already doing its own caching (somewhat
complicated, as the open and closed issues show), I get the feeling it
wouldn't be a good fit. Note that I'm not saying we wouldn't use it if
we were implementing importlib from scratch.

> By the way, DirEntry constructor is not documented in the PEP. Should
> we document it? It might be a way to "invalidate the cache":

I would prefer not to, just to keep things simple. It's similar to
creating os.stat_result() objects: you can kind of do it (see
scandir.py), but it's not recommended or even documented. The entire
purpose of DirEntry objects is for scandir to produce them; they're
not meant for general use.

> entry = DirEntry(os.path.dirname(entry.path), entry.name)
>
> Maybe it is an abuse of the API. A clear_cache() method would be less
> ugly :-) But maybe Ben Hoyt does not want to promote keeping DirEntry
> for a long time?
>
> Another question: should we expose DirEntry type directly in the os
> namespace? (os.DirEntry)

Again, I'd rather not expose this. It's quite system-specific (see the
different system versions in scandir.py), and trying to combine these,
make them consistent, and document them would be a bit of a pain. It
could also prevent future modifications, because parts of the
implementation would then be set in stone.

I'm not really opposed to a clear_cache() method -- basically it'd set
_lstat and _stat and _d_type to None internally. However, I'd prefer
to keep it as is, and as the PEP says:

    If developers want "refresh" behaviour (for example, for watching a
    file's size change), they can simply use pathlib.Path objects, or call
    the regular os.stat() or os.path.getsize() functions which get fresh
    data from the operating system every call.
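
To make the difference concrete, here is a minimal sketch of cached
versus fresh stat data. It is not from the PEP: the function name, the
file name and the contents are made up for illustration, and it assumes
an os.scandir() that behaves as the PEP describes and supports the
"with" statement (Python 3.6 or later).

```python
import os
import tempfile


def cached_vs_fresh_size():
    """Return (cached_size, fresh_size) for a file that grows after a scan.

    Hypothetical helper for illustration only.
    """
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "example.txt")  # made-up file name
        with open(path, "w") as f:
            f.write("hello")  # file is 5 bytes at scan time

        # The directory holds exactly one entry, so take the first one.
        with os.scandir(tmp) as entries:
            entry = next(entries)

        # The first stat() call fills the DirEntry cache on systems where
        # the directory listing alone does not already provide the data.
        entry.stat()

        with open(path, "a") as f:
            f.write(" world")  # file grows to 11 bytes

        cached = entry.stat().st_size        # served from the DirEntry cache
        fresh = os.stat(entry.path).st_size  # asks the operating system again
        return cached, fresh
```

Calling cached_vs_fresh_size() should give a cached size that still
reflects the file at scan time and a fresh size that includes the
appended data. Scanning the directory again would also yield a new
DirEntry with up-to-date values.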

-Ben

