On Mon, Jan 11, 2016 at 10:57 AM, Gregory P. Smith firstname.lastname@example.org wrote:
On Wed, Jan 6, 2016 at 3:05 PM Brendan Moloney email@example.com wrote:
It's important to keep in mind that the main benefit of scandir is that in many cases you don't have to make ANY stat calls, because the directory listing already provides some subset of this information. On Linux you can at least tell whether a path is a file or a directory. On Windows the directory listing provides much more information. Avoiding repeated stat calls is also nice, but not nearly as important, thanks to OS-level caching.
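A minimal sketch of the point above, assuming Python 3.6+ (where `os.scandir` supports the context-manager protocol). The helper name `split_files_and_dirs` is hypothetical. `DirEntry.is_dir()` answers from the data returned with the directory listing when the OS provides it (d_type on most Linux filesystems, the find-data on Windows), so the common case needs no separate `stat()` call; on filesystems that don't report an entry type, it falls back to a system call.

```python
import os

def split_files_and_dirs(path="."):
    """Hypothetical helper: partition one directory's entries into files and dirs."""
    files, dirs = [], []
    with os.scandir(path) as it:
        for entry in it:
            # Uses the type info cached on the DirEntry from the directory
            # listing where available, instead of calling os.stat() per entry.
            if entry.is_dir(follow_symlinks=False):
                dirs.append(entry.name)
            else:
                files.append(entry.name)
    return sorted(files), sorted(dirs)
```

Compare this with the `os.listdir()` approach, which returns bare names and forces an `os.path.isdir()` (that is, a `stat()`) on every entry to recover the same information.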
+1 - this was one of the two primary motivations behind scandir. Anything that tries to reimplement a filesystem tree walker without using scandir is going to have substandard performance.
If we ever offer anything with "find-like functionality" related to pathlib, it *needs* to be based on scandir. Anything else would just repeat the convenient but untrue, limiting assumptions of os.listdir: that the contents of a directory can be loaded into memory, and that we don't mind re-querying the OS for stat information that it already gave us but that we threw away while reading the directory.
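A rough sketch of what a scandir-based, find-like walker could look like. The `find` function and its `predicate` parameter are hypothetical illustrations, not a pathlib or os API. It avoids both assumptions criticized above: entries are consumed lazily from the scandir iterator rather than collected into a list first, and the recursion decision reuses each `DirEntry`'s cached type information instead of re-stat'ing the path.

```python
import os
from typing import Callable, Iterator

def find(top: str,
         predicate: Callable[[os.DirEntry], bool] = lambda e: True) -> Iterator[str]:
    """Hypothetical find-like walker: yield paths under `top` matching `predicate`."""
    stack = [top]  # directories still to visit (iterative, so no recursion limit)
    while stack:
        current = stack.pop()
        try:
            it = os.scandir(current)
        except OSError:
            continue  # unreadable directory: skip it and keep walking
        with it:  # only one scandir handle is open at a time
            for entry in it:
                if predicate(entry):
                    yield entry.path
                # Descend using the type info from the listing; don't follow
                # symlinks, which also avoids infinite loops on link cycles.
                if entry.is_dir(follow_symlinks=False):
                    stack.append(entry.path)
```

For example, `find(".", lambda e: e.is_file() and e.name.endswith(".py"))` yields Python source files one at a time. The predicate receives the `DirEntry` itself, so filters that only need the name or entry type never trigger an extra `stat()`.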
And we already have this in the form of pathlib's [r]glob() methods. There's a patch to the glob module in http://bugs.python.org/issue25596, and as soon as that's committed I hope that its author(s) will work on a similar patch for pathlib's [r]glob (tracking this in http://bugs.python.org/issue26032).