Re: [Python-Dev] PEP 471 -- os.scandir() function -- a better and faster directory iterator

28 Jun 2014

On Fri, Jun 27, 2014 at 2:58 PM, Nick Coghlan wrote:
...
* -1 on including Windows specific globbing support in the API
* -0 on including cross platform globbing support in the initial iteration
of the API (that could be done later as a separate RFE instead)
Agreed.  Globbing or filtering support should not hold this up.  If that
part isn't settled, just don't include it and work out what it should be as
a future enhancement.
...
* +1 on a new section in the PEP covering rejected design options (calling
it iterdir, returning a 2-tuple instead of a dedicated DirEntry type)
+1.  IMNSHO, this is one of the most important parts of a PEP: capturing the
entire decision process to document the "why nots".
...
* regarding "why not a 2-tuple", we know from experience that operating
systems evolve and we end up wanting to add additional info to this kind of
API. A dedicated DirEntry type lets us adjust the information returned over
time, without breaking backwards compatibility and without resorting to
ugly hacks like those in some of the time and stat APIs (or even our own
codec info APIs)
* it would be nice to see some relative performance numbers for NFS and
CIFS network shares - the additional network round trips can make excessive
stat calls absolutely brutal from a speed perspective when using a network
drive (that's why the stat caching added to the import system in 3.3
dramatically sped up the case of having network drives on sys.path, and why
I thought AJ had a point when he was complaining about the fact we didn't
expose the dirent data from os.listdir)
fwiw, I wouldn't wait for benchmark numbers.

A needless stat call when you already have the information from an earlier
API call is already brutal. It is easy to estimate from existing ballpark
figures: remote file server / cloud access: ~100ms; local spinning disk
seek+read: ~10ms; fetch of stat info cached in memory on a file server on
the local network: ~500us.  You can go down further to local system call
overhead, which can vary wildly but should likely be assumed to be at least
10us.
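
To make the arithmetic concrete, here is a rough sketch that multiplies those
ballpark figures by a directory size. The entry count of 10,000 and the names
BALLPARK_SECONDS and wasted_seconds are made up for illustration; only the
per-call latencies come from the estimates above.

```python
# Ballpark latencies quoted above, in seconds per needless blocking call.
BALLPARK_SECONDS = {
    "remote file server / cloud access": 100e-3,
    "local spinning disk seek+read": 10e-3,
    "stat cached in memory on LAN file server": 500e-6,
    "local system call overhead (lower bound)": 10e-6,
}


def wasted_seconds(entries, seconds_per_call):
    """Total time spent if every directory entry costs one extra call."""
    return entries * seconds_per_call


if __name__ == "__main__":
    entries = 10_000  # hypothetical tree size, not a number from this thread
    for label, cost in BALLPARK_SECONDS.items():
        print(f"{label}: {wasted_seconds(entries, cost):.1f} s")
```

Under that assumption, the 500us case alone adds up to roughly 5 seconds of
blocking, and the 100ms case to roughly 1,000 seconds.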

You don't need a benchmark to tell you that adding needless >= 500us-100ms
blocking operations to your program is bad. :)
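
For illustration only, this is roughly what the difference looks like in
code. It is a sketch that assumes a Python where os.scandir() is available
(3.5 or later); the two helper names are hypothetical. The listdir version
pays for one stat call per entry, while DirEntry.is_dir() can usually answer
from data the directory listing already returned.

```python
import os
import stat


def subdirs_via_listdir(path):
    """Old pattern: os.listdir() plus one os.stat() call per entry."""
    result = []
    for name in os.listdir(path):
        st = os.stat(os.path.join(path, name))  # extra blocking call
        if stat.S_ISDIR(st.st_mode):
            result.append(name)
    return sorted(result)


def subdirs_via_scandir(path):
    """scandir pattern: is_dir() normally needs no extra system call."""
    return sorted(entry.name for entry in os.scandir(path) if entry.is_dir())
```

Both helpers return the same names for an ordinary directory; the second one
just avoids asking the file system the same question twice.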

-gps

Gregory P. Smith