
On Tue, Jan 5, 2016 at 12:27 PM, Brendan Moloney <moloney@ohsu.edu> wrote:
> The main issue is the lack of stat caching. That is why I wrote my own module around scandir, which includes the DirEntry objects for each path so that the consumer can also do things with the cached stat info (like checking whether it is a file or a directory). Often we won't need to call stat on the path at all, and if we do, it will only be once.
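Brendan's module isn't shown here. The following is a minimal sketch of the same idea, assuming Python 3.6+ (for `os.scandir` as a context manager). `walk_entries` is a hypothetical name. The walker hands back the `os.DirEntry` objects themselves, so the consumer can call `entry.is_dir()` / `entry.is_file()` on information scandir already fetched instead of issuing a fresh `stat()` per path.

```python
import os
from typing import Iterator


def walk_entries(top: str) -> Iterator[os.DirEntry]:
    """Hypothetical helper: recursively yield os.DirEntry objects under *top*.

    Consumers receive the DirEntry itself, so type checks like entry.is_dir()
    or entry.is_file() can reuse what scandir() already fetched rather than
    calling os.stat() on every path.
    """
    stack = [top]
    while stack:
        current = stack.pop()
        with os.scandir(current) as it:
            for entry in it:
                yield entry
                # follow_symlinks=False: don't descend into symlinked
                # directories (avoids cycles).
                if entry.is_dir(follow_symlinks=False):
                    stack.append(entry.path)


if __name__ == "__main__":
    # Example: list regular files below the current directory.
    for e in walk_entries("."):
        if e.is_file(follow_symlinks=False):
            print(e.path)
```

Per the `os.DirEntry` documentation, `is_dir()` and `is_file()` cache their results and often need no extra system call, while `entry.stat()` costs at most one call per entry on POSIX, which matches the "only once" behaviour described above.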
I wonder if stat() caching shouldn't be made an orthogonal, optional feature of Path objects somehow; it keeps coming back as useful in various cases, even though we don't want to enable it by default.

One problem with stat() caching is that Path objects are considered immutable, and two Path objects referring to the same path are completely interchangeable. For example, {pathlib.Path('/a'), pathlib.Path('/a')} is a set of length 1: {PosixPath('/a')}. But if we had e.g. Path('/a', cache_stat=True), two instances of that object might behave observably differently (if they were instantiated at times when the contents of the filesystem were different). So maybe stat-caching Path instances should be considered unequal, or perhaps unhashable. Or perhaps they should only be considered equal if their stat() values are actually equal (i.e. if the file's stat() info didn't change).

So this is a thorny issue that requires some real thought before we commit to an API. We might also want to create Path instances directly from DirEntry objects. (Interestingly, the DirEntry API seems to be a subset of the Path API, except for the .path attribute, which is equivalent to the str() of a Path object.)

Maybe some of this can be done first as a third-party module forked from the original third-party pathlib? https://bitbucket.org/pitrou/pathlib/ seems reasonably up to date.

-- 
--Guido van Rossum (python.org/~guido)
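The "only equal if their stat() values are actually equal" option above can be sketched as a standalone wrapper. This is a hypothetical illustration, not a pathlib API: `StatCachingPath` and `from_direntry` are made-up names, it assumes Python 3.6+ (`__fspath__`, f-strings), and it wraps a `Path` instead of subclassing it to keep the sketch short. It takes the stat() snapshot once, at construction, so each instance stays immutable, and it compares mode, size and `st_mtime_ns` rather than the whole `stat_result` (an arbitrary choice for the sketch).

```python
import os
import pathlib
from stat import S_ISDIR, S_ISREG


class StatCachingPath:
    """Hypothetical path wrapper holding one stat() snapshot.

    The snapshot is taken at construction and never refreshed, so the object
    itself never changes. Two instances are equal (and hash equal) only when
    they name the same path AND their snapshots agree on mode, size, mtime.
    """

    __slots__ = ("_path", "_stat")

    def __init__(self, path, stat_result=None):
        self._path = pathlib.Path(path)
        # Eager snapshot: at most one stat() call here, none afterwards.
        self._stat = self._path.stat() if stat_result is None else stat_result

    @classmethod
    def from_direntry(cls, entry):
        # DirEntry.stat() caches its result on the entry (and on Windows it
        # usually needs no extra system call), so reuse it.
        return cls(entry.path, entry.stat())

    @property
    def path(self):
        return self._path

    def stat(self):
        return self._stat  # the cached snapshot, no system call

    def is_dir(self):
        return S_ISDIR(self._stat.st_mode)

    def is_file(self):
        return S_ISREG(self._stat.st_mode)

    def _key(self):
        st = self._stat
        return (self._path, st.st_mode, st.st_size, st.st_mtime_ns)

    def __eq__(self, other):
        if not isinstance(other, StatCachingPath):
            return NotImplemented
        return self._key() == other._key()

    def __hash__(self):
        return hash(self._key())

    def __fspath__(self):
        return os.fspath(self._path)

    def __repr__(self):
        return f"StatCachingPath({str(self._path)!r})"
```

With this design, two instances for the same file collapse to one element in a set, but an instance created before the file is modified and one created afterwards compare unequal. The other options mentioned above are small changes: set `__hash__ = None` to make instances unhashable, or drop `__eq__`/`__hash__` entirely to get identity-based equality. Separately, since Python 3.6 `os.DirEntry` implements `os.PathLike`, so `pathlib.Path(entry)` also works directly.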