<p dir="ltr"><br>

On 25 Nov 2013 09:07, "Ben Hoyt" <<a href="mailto:benhoyt@gmail.com">benhoyt@gmail.com</a>> wrote:<br>

><br>

> > Right now, pathlib doesn't cache. Guido decided it was safer to start<br>

> > off like that, and perhaps later we can add some optional caching.<br>

> ><br>

> > One reason caching didn't go in is that it's not clear which API is<br>

> > best. Working on plugging scandir() into pathlib would actually help<br>

> > choosing a stat-caching API.<br>

> ><br>

> > (or, rather, lstat-caching...)<br>

> ><br>

> >> The other related thing is that DirEntry only provides .lstat(),<br>

> >> because it's providing stat-like info without following links.<br>

> ><br>

> > Path.is_dir() and friends use stat(), i.e. they inform you about<br>

> > whether a symlink's target is a directory (not the symlink itself).  Of<br>

> > course, if the DirEntry says the path is a symlink, Path.is_dir() could<br>

> > then run stat() to find out about the target.<br>

> ><br>

> > Do you plan to propose scandir() for inclusion in the stdlib?<br>

><br>

> Yes, I was hoping to propose adding "os.scandir() -> yields DirEntry<br>

> objects" for inclusion into the stdlib, and also speed up os.walk() as<br>

> a result.<br>

><br>

> However, pathlib's API with .is_dir(), .lstat(), etc. is so close to<br>

> DirEntry, I'd be much keener to roll up the scandir functionality into<br>

> pathlib's iterdir(), as that's already going in the standard library,<br>

> and iterdir() already returns Path objects.<br>

><br>

> I'm just not sure it's possible or useful without stat caching.<br>

><br>

> We could do Path.lstat(cached=True), but we'd also really want<br>

> is_dir(cached=True), so that API kinda sucks. Alternatively you could<br>

> have iterdir(cached=True) return PathWithCachedStat style objects --<br>

> probably better, but kinda messy.<br>

><br>

> For these reasons, I would much prefer stat caching on by default in<br>

> Path -- in my experience, the cached behaviour is desired much, much<br>

> more often than the non-cached. I've written directory walkers more<br>

> often than I can count, whereas I've maybe only once written a<br>

> long-running process that needs to re-stat, and if it's clearly<br>

> documented as cached, then it's super easy to call restat(), or create<br>

> a new Path instance to get new stat info.<br>

><br>

> This would allow iterdir() to take advantage of the huge performance<br>

> improvements you can get when walking directories.<br>

><br>

> Guido, are you at all open to reconsidering the uncached-by-default in<br>

> light of this?</p>
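<p dir="ltr">The symlink point in the quoted exchange can be made concrete. The sketch below is a minimal illustration that assumes the <code>os.scandir()</code>/<code>os.DirEntry</code> API as it later shipped in Python 3.5, not the version under discussion in this thread. The helper name <code>entry_is_dir</code> is hypothetical. It answers from the lstat-style type information the directory listing already provides, and only makes an extra <code>stat()</code> call when the entry turns out to be a symlink.</p>

```python
import os
import stat


def entry_is_dir(entry: os.DirEntry) -> bool:
    """Hypothetical helper: a Path.is_dir()-style answer (follows symlinks) for a DirEntry.

    Uses the non-following type information from the directory listing first and
    only calls os.stat() on the link target when the entry is a symlink.
    """
    if not entry.is_symlink():
        # Not a link: the non-following answer is already the final answer.
        return entry.is_dir(follow_symlinks=False)
    try:
        # A symlink: follow it with a real stat() call, as Path.is_dir() does.
        return stat.S_ISDIR(os.stat(entry.path).st_mode)
    except OSError:
        # Broken link or unreadable target: report "not a directory".
        return False


def list_dirs(path: str) -> list[str]:
    """Sorted names of entries in `path` that are directories or symlinks to directories."""
    with os.scandir(path) as it:
        return sorted(entry.name for entry in it if entry_is_dir(entry))
```

<p dir="ltr">Only symlinked entries pay for the extra <code>stat()</code>; whether the other checks need a system call at all depends on the platform and filesystem.</p>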

<p dir="ltr">No, caching on the object is dangerously unintuitive - it means two Path objects can compare equal, but give different answers for stat-dependent queries.</p>
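<p dir="ltr">To illustrate the hazard, here is a minimal sketch of a per-object cache. The <code>CachingPath</code> class is hypothetical, invented for this example, and is not part of pathlib. Two instances compare equal because they name the same file. After the file changes, however, they report different sizes, depending only on when each one first called <code>lstat()</code>.</p>

```python
import os


class CachingPath:
    """Hypothetical path type that caches its first lstat() result on the instance."""

    def __init__(self, path):
        self.path = os.fspath(path)
        self._lstat = None  # filled in lazily on first use, never refreshed

    def __eq__(self, other):
        # Equality is by path string only, like pathlib.Path.
        return isinstance(other, CachingPath) and self.path == other.path

    def __hash__(self):
        return hash(self.path)

    def lstat(self) -> os.stat_result:
        if self._lstat is None:
            self._lstat = os.lstat(self.path)
        return self._lstat

    def size(self) -> int:
        return self.lstat().st_size


def demo(filename: str) -> tuple[bool, int, int]:
    """Return (objects equal?, size seen by the early object, size seen by the late object)."""
    with open(filename, "w") as f:
        f.write("a")
    early = CachingPath(filename)
    early.size()  # caches the 1-byte size on this instance
    with open(filename, "w") as f:
        f.write("abc")
    late = CachingPath(filename)
    return early == late, early.size(), late.size()
```

<p dir="ltr">Here <code>demo()</code> reports the objects as equal while one says the file is 1 byte long and the other says 3.</p>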

<p dir="ltr">A global string (or Path) keyed cache (rather than a per-object cache) would actually be a safer option, since it would ensure distinct path objects always gave the same answer. That's the approach I will likely pursue at some point in walkdir.</p>
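<p dir="ltr">A minimal sketch of the global alternative follows. It assumes a simple process-wide dictionary keyed by absolute path string. The names <code>cached_lstat</code> and <code>invalidate</code> are hypothetical, and this is not walkdir's actual code. Because the cache lives outside the path objects, a <code>str</code> and an equal <code>pathlib.Path</code> always get the same answer, and there is a single place to invalidate it.</p>

```python
import os

# Hypothetical process-wide cache keyed by absolute path string. It is not stored
# on any path object, so every object naming the same path sees the same entry.
_LSTAT_CACHE: dict[str, os.stat_result] = {}


def _key(path) -> str:
    # Accept str or any os.PathLike (e.g. pathlib.Path) and normalise to one key.
    return os.path.abspath(os.fspath(path))


def cached_lstat(path) -> os.stat_result:
    """Return the cached lstat() result for `path`, calling os.lstat() on a miss."""
    key = _key(path)
    result = _LSTAT_CACHE.get(key)
    if result is None:
        result = os.lstat(key)
        _LSTAT_CACHE[key] = result
    return result


def invalidate(path=None) -> None:
    """Drop one path's entry, or clear the whole cache when no path is given."""
    if path is None:
        _LSTAT_CACHE.clear()
    else:
        _LSTAT_CACHE.pop(_key(path), None)
```

<p dir="ltr">The answers can still be stale, but they are consistently stale: every caller sees the same snapshot until something explicitly invalidates it.</p>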


<p dir="ltr">It's also quite likely the "rich stat object" API will be pursued for 3.5, which is a much safer approach to stat result caching than trying to embed it directly in pathlib.Path objects.</p>

<p dir="ltr">That's why we decided to punt on the caching question until 3.5 - it's better to provide a predictable building block that doesn't provide caching, and then work out how to provide a sensible caching layer on top of that, rather than trying to rush a potentially flawed caching design that leads to inconsistent behaviour.</p>
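<p dir="ltr">As a rough illustration of that layering, the sketch below keeps <code>pathlib.Path</code> itself uncached and adds caching one level up. A hypothetical walker pairs each <code>Path</code> from <code>iterdir()</code> with a single <code>lstat()</code> result in an explicit, immutable snapshot. <code>Snapshot</code> and <code>walk_snapshots</code> are invented names, not a proposed stdlib API. The point is only that the cached data stays visibly separate from the path it describes.</p>

```python
import os
import stat
from pathlib import Path
from typing import Iterator, NamedTuple


class Snapshot(NamedTuple):
    """Hypothetical pairing of an uncached Path with one lstat() taken at a known moment."""

    path: Path
    lstat: os.stat_result

    def is_dir(self) -> bool:
        # Answered from the snapshot; no further system calls.
        return stat.S_ISDIR(self.lstat.st_mode)


def walk_snapshots(root) -> Iterator[Snapshot]:
    """Walk a tree depth-first, taking exactly one lstat() per entry; Paths stay uncached."""
    stack = [Path(root)]
    while stack:
        directory = stack.pop()
        for child in directory.iterdir():
            snap = Snapshot(child, child.lstat())
            yield snap
            if snap.is_dir():  # lstat-based, so symlinked directories are not followed
                stack.append(child)
```

<p dir="ltr">A caller that needs fresh information simply calls <code>snap.path.lstat()</code> again, since the <code>Path</code> carries no hidden state.</p>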


<p dir="ltr">Cheers,<br>

Nick.</p>

<p dir="ltr">><br>

> -Ben<br>

> _______________________________________________<br>

> Python-Dev mailing list<br>

> <a href="mailto:Python-Dev@python.org">Python-Dev@python.org</a><br>

> <a href="https://mail.python.org/mailman/listinfo/python-dev">https://mail.python.org/mailman/listinfo/python-dev</a><br>


</p>