
On 28 June 2014 16:17, Gregory P. Smith <greg@krypto.org> wrote:
On Fri, Jun 27, 2014 at 2:58 PM, Nick Coghlan <ncoghlan@gmail.com> wrote:
* it would be nice to see some relative performance numbers for NFS and CIFS network shares - the additional network round trips can make excessive stat calls absolutely brutal from a speed perspective when using a network drive (that's why the stat caching added to the import system in 3.3 dramatically sped up the case of having network drives on sys.path, and why I thought AJ had a point when he complained that we didn't expose the dirent data from os.listdir)
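As a rough illustration of the dirent point, here is a minimal sketch, assuming an os.scandir() API along the lines proposed in PEP 471: DirEntry.is_dir() can usually be answered from the d_type data the OS already returned with the directory listing, while the os.listdir() version pays one extra stat call per entry:

    import os

    def count_files_listdir(path):
        # os.listdir() returns bare names, so classifying each entry
        # costs a separate stat() call - one extra round trip per
        # entry on a network share.
        n = 0
        for name in os.listdir(path):
            if not os.path.isdir(os.path.join(path, name)):  # stat() here
                n += 1
        return n

    def count_files_scandir(path):
        # entry.is_dir() can usually be answered from the dirent data
        # fetched along with the listing itself, so no extra stat()
        # round trip is needed for most entries on most platforms.
        return sum(1 for entry in os.scandir(path) if not entry.is_dir())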
fwiw, I wouldn't wait for benchmark numbers.
A needless stat call when you've already got the information from an earlier API call is brutal on its own. It is easy to reason about this from existing latency ballparks: remote file server / cloud access: ~100ms; local spinning disk seek+read: ~10ms; fetch of stat info cached in memory on a file server on the local network: ~500us. You can go down further to local system call overhead, which can vary wildly but should likely be assumed to be at least 10us.
You don't need a benchmark to tell you that adding needless >= 500us-100ms blocking operations to your program is bad. :)
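Plugging those ballparks into a quick back-of-envelope calculation shows the scale of the damage; the 10,000-entry tree below is purely an illustrative assumption:

    # Rough total cost of one needless blocking stat() per directory
    # entry, using the latency ballparks above (tree size is a
    # hypothetical 10,000 entries).
    entries = 10000
    for label, seconds in [
        ("remote file server / cloud, ~100ms", 100e-3),
        ("local spinning disk seek+read, ~10ms", 10e-3),
        ("stat cached on local-network file server, ~500us", 500e-6),
        ("local system call overhead, ~10us", 10e-6),
    ]:
        print("%s: %.1fs of blocking" % (label, entries * seconds))

That comes out to roughly 1000s, 100s, 5s and 0.1s respectively, for just one needless stat per entry.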
Agreed, but walking even a moderately large tree over the network can really hammer home the point that this offers a significant performance enhancement as the latency of access increases. I've found that kind of comparison can be eye-opening for folks who are used to operating only on local disks (even spinning disks, let alone SSDs) and/or relatively small trees (distro build trees aren't *that* big, but they're big enough for this kind of difference in access overhead to start getting annoying).

Cheers,
Nick.

--
Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia