[Python-ideas] Speed up os.walk() 5x to 9x by using file attributes from FindFirst/NextFile() and readdir()
Andrew Barnert
abarnert at yahoo.com
Fri Nov 16 11:32:36 CET 2012
From: Mike Meyer <mwm at mired.org>
Sent: Thu, November 15, 2012 2:29:44 AM
>If the goal is to make os.walk fast, then it might be better (on Posix systems,
>anyway) to see if it can be built on top of ftw instead of low-level directory
>scanning routines.
After a bit of experimentation, I'm not sure there actually is any significant
improvement to be had on most POSIX systems working that way.
The FreeBSD, OS X, and glibc sources all call stat (or a stat-family call) on
each file unless you ask for no stat info. A quick test on OS X shows that
calling fts via ctypes is about 5% faster than os.walk, and about 5% slower than
find -ls or find -mtime (both of which stat every file).
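A rough harness for this kind of comparison might look like the following. It is
a sketch, not the script behind the numbers above: make_tree, count_entries and
best_time are hypothetical helpers, the tree shape is arbitrary, and it only
times os.walk on a warm cache. Any other walker that yields
(dirpath, dirnames, filenames) tuples could be passed in the same way.

```python
import os
import tempfile
import time


def make_tree(root, depth=3, fanout=4, files_per_dir=10):
    """Create a synthetic directory tree under root (hypothetical test data)."""
    for i in range(files_per_dir):
        with open(os.path.join(root, f"file{i}.txt"), "w") as f:
            f.write("x")
    if depth > 0:
        for d in range(fanout):
            sub = os.path.join(root, f"dir{d}")
            os.mkdir(sub)
            make_tree(sub, depth - 1, fanout, files_per_dir)


def count_entries(walker, top):
    """Walk the tree with walker and return (subdirectory count, file count)."""
    ndirs = nfiles = 0
    for _dirpath, dirnames, filenames in walker(top):
        ndirs += len(dirnames)
        nfiles += len(filenames)
    return ndirs, nfiles


def best_time(walker, top, repeat=5):
    """Return the best wall-clock time over several runs (cache stays warm)."""
    best = float("inf")
    for _ in range(repeat):
        start = time.perf_counter()
        count_entries(walker, top)
        best = min(best, time.perf_counter() - start)
    return best


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as top:
        make_tree(top)
        print("os.walk: %.4f s" % best_time(os.walk, top))
```

Taking the best of several runs mostly measures system-call and interpreter
overhead rather than disk latency, which is the cost a stat-per-file walk adds.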
Passing FTS_NOSTAT to fts makes it about 3x faster than the stat-ing runs above,
but that is only 8% faster than os.walk with the stat calls hacked out, and 40%
slower than find.
So, a "nostat" option is a potential performance improvement, but switching to
ftw/nftw/fts, with or without the nostat flag, doesn't seem to be worth it.
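As an illustration of what a "nostat" walk could look like, here is a sketch that
classifies entries from readdir's d_type instead of calling stat on each one. It
assumes a later Python than this message: os.scandir was added in Python 3.5
(PEP 471) and supports the with statement from 3.6. walk_nostat is a
hypothetical name, and unlike os.walk it reports symlinks to directories as
files rather than listing them in dirnames.

```python
import os


def walk_nostat(top):
    """Top-down walk yielding (dirpath, dirnames, filenames) like os.walk.

    DirEntry.is_dir(follow_symlinks=False) uses the d_type field from
    readdir() when the filesystem provides it, so most entries need no
    stat() call; it falls back to lstat() only when d_type is unknown.
    Symlinks (including ones pointing at directories) count as files.
    """
    stack = [top]
    while stack:
        dirpath = stack.pop()
        dirnames, filenames = [], []
        try:
            with os.scandir(dirpath) as it:
                for entry in it:
                    # Classify the entry itself; never stat a symlink's target.
                    if entry.is_dir(follow_symlinks=False):
                        dirnames.append(entry.name)
                    else:
                        filenames.append(entry.name)
        except OSError:
            continue  # unreadable directory: skip it, as os.walk does by default
        yield dirpath, dirnames, filenames
        # Read dirnames after the yield so callers can prune it in place,
        # and push in reverse so subdirectories are visited in listed order.
        for name in reversed(dirnames):
            stack.append(os.path.join(dirpath, name))
```

Because the subdirectory list is consumed only after the caller has seen it,
removing names from dirnames prunes the walk just as it does with os.walk.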