
On 28 June 2014 16:17, Gregory P. Smith <greg@krypto.org> wrote:
On Fri, Jun 27, 2014 at 2:58 PM, Nick Coghlan <ncoghlan@gmail.com> wrote:
* it would be nice to see some relative performance numbers for NFS and CIFS network shares - the additional network round trips can make excessive stat calls absolutely brutal from a speed perspective when using a network drive (that's why the stat caching added to the import system in 3.3 dramatically sped up the case of having network drives on sys.path, and why I thought AJ had a point when he complained that we didn't expose the dirent data from os.listdir)
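As a rough illustration of the dirent point, here is a minimal sketch, assuming an os.scandir() API along the lines proposed in PEP 471: DirEntry.is_dir() can usually be answered from the d_type data the OS already returned with the directory listing, while the os.listdir() version pays one extra stat call per entry:

    import os

    def count_files_listdir(path):
        # os.listdir() returns bare names, so classifying each entry
        # costs a separate stat() call - one extra round trip per
        # entry on a network share.
        n = 0
        for name in os.listdir(path):
            if not os.path.isdir(os.path.join(path, name)):  # stat() here
                n += 1
        return n

    def count_files_scandir(path):
        # entry.is_dir() can usually be answered from the dirent data
        # fetched along with the listing itself, so no extra stat()
        # round trip is needed for most entries on most platforms.
        return sum(1 for entry in os.scandir(path) if not entry.is_dir())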
fwiw, I wouldn't wait for benchmark numbers.
A needless stat call when you've already got the information from an earlier API call is brutal on its own. It is easy to reason about this from existing latency ballparks: remote file server / cloud access: ~100ms; local spinning disk seek+read: ~10ms; fetch of stat info cached in memory on a file server on the local network: ~500us. You can go down further to local system call overhead, which can vary wildly but should likely be assumed to be at least 10us.
You don't need a benchmark to tell you that adding needless >= 500us-100ms blocking operations to your program is bad. :)
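Plugging those ballparks into a quick back-of-envelope calculation shows the scale of the damage; the 10,000-entry tree below is purely an illustrative assumption:

    # Rough total cost of one needless blocking stat() per directory
    # entry, using the latency ballparks above (tree size is a
    # hypothetical 10,000 entries).
    entries = 10000
    for label, seconds in [
        ("remote file server / cloud, ~100ms", 100e-3),
        ("local spinning disk seek+read, ~10ms", 10e-3),
        ("stat cached on local-network file server, ~500us", 500e-6),
        ("local system call overhead, ~10us", 10e-6),
    ]:
        print("%s: %.1fs of blocking" % (label, entries * seconds))

That comes out to roughly 1000s, 100s, 5s and 0.1s respectively, for just one needless stat per entry.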
Agreed, but walking even a moderately large tree over the network can really hammer home the point that this offers a significant performance enhancement as the latency of access increases. I've found that kind of comparison can be eye-opening for folks who are used to operating only on local disks (even spinning disks, let alone SSDs) and/or relatively small trees (distro build trees aren't *that* big, but they're big enough for this kind of difference in access overhead to start getting annoying).

Cheers,
Nick.

--
Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia