[Python-Dev] os.walk() is going to be *fast* with scandir
robertc at robertcollins.net
Sun Aug 10 07:40:47 CEST 2014
A small tip from my bzr days: cd into the directory before scanning
it, especially if you'll end up statting more than a fraction of the
files or are recursing. Otherwise the VFS does a full path traversal for
each path you stat or recurse into directly. This can become a dominating
factor in some workloads (I shaved several hundred milliseconds off
bzr stat on kernel trees by doing this).
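
A minimal sketch of the idea in Python, not the code bzr used. The
function name total_size_with_chdir is made up for illustration, and it
only looks at the entries directly inside one directory. The point is
that after os.chdir() every os.lstat() call is given a bare file name,
so the kernel resolves one path component per call instead of the whole
path.

```python
import os
import stat
import tempfile


def total_size_with_chdir(directory):
    """Sum the sizes of regular files directly inside `directory`.

    Hypothetical helper: it changes into the directory first, so each
    os.lstat() call below is given a single-component relative name.
    """
    previous = os.getcwd()
    os.chdir(directory)
    try:
        total = 0
        for name in os.listdir("."):
            info = os.lstat(name)  # relative name, no long path to resolve
            if stat.S_ISREG(info.st_mode):
                total += info.st_size
        return total
    finally:
        # Always restore the caller's working directory.
        os.chdir(previous)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        with open(os.path.join(tmp, "example.bin"), "wb") as handle:
            handle.write(b"x" * 10)
        print(total_size_with_chdir(tmp))
```

One caveat: the working directory is process-wide state, so this
pattern is not safe if other threads rely on relative paths at the same
time.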
On 10 August 2014 15:57, Nick Coghlan <ncoghlan at gmail.com> wrote:
> On 10 August 2014 13:20, Antoine Pitrou <antoine at python.org> wrote:
>> On 09/08/2014 12:43, Ben Hoyt wrote:
>>> Just thought I'd share some of my excitement about how fast the all-C
>>> version of os.scandir() is turning out to be.
>>> Below are the results of my scandir / walk benchmark run with three
>>> different versions. I'm using an SSD, which seems to make it
>>> especially fast compared to listdir / walk. Note that benchmark results can
>>> vary a lot, depending on operating system, file system, hard drive
>>> type, and the OS's caching state.
>>> Anyway, os.walk() can be FIFTY times as fast using os.scandir().
>> Very nice results, thank you :-)
> This may actually motivate me to start working on a redesign of
> walkdir at some point, with scandir and DirEntry objects as the basis.
> My original approach was just too slow to be useful in practice (at
> least when working with trees on the scale of a full Fedora or RHEL
> build hosted on an NFS share).
> Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia
Robert Collins <rbtcollins at hp.com>
HP Converged Cloud