[Python-ideas] Speed up os.walk() 5x to 9x by using file attributes from FindFirst/NextFile() and readdir()

Andrew Barnert abarnert at yahoo.com
Fri Nov 16 11:32:36 CET 2012


From: Mike Meyer <mwm at mired.org>
Sent: Thu, November 15, 2012 2:29:44 AM

>If the goal is to make os.walk fast, then it might be better (on Posix systems,
>anyway) to see if it can be built on top of ftw instead of low-level directory
>scanning routines.

After a bit of experimentation, I'm not sure there actually is any significant 
improvement to be had on most POSIX systems working that way.

The FreeBSD, OS X, and glibc implementations all call stat (or a stat-family
call) on each file unless you ask for no stat info. A quick test on OS X shows
that calling fts via ctypes is about 5% faster than os.walk, and 5% slower than
find -ls or find -mtime (both of which have to stat every file).

Passing FTS_NOSTAT to fts is about 3x faster, but only 8% faster than os.walk 
with the stat calls hacked out, and 40% slower than find.

So, a "nostat" option is a potential performance improvement, but switching to 
ftw/nftw/fts, with or without the nostat flag, doesn't seem to be worth it.
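For comparison, here is a minimal sketch of what a "nostat"-style walk can look like using os.scandir(), which postdates this message (it arrived in Python 3.5 via PEP 471, and os.walk was later rebuilt on it; the `with` form below needs Python 3.6+). DirEntry.is_dir() can usually answer from the type information readdir() or FindFirstFile/FindNextFile already returned, falling back to a stat only when the filesystem doesn't report a type or the entry is a symlink. The name walk_nostat is hypothetical, and error handling is simplified to "skip unreadable directories".

```python
import os


def walk_nostat(top):
    """Top-down walk yielding (dirpath, dirnames, filenames) like os.walk,
    classifying entries from os.scandir() instead of stat()ing each one.
    Callers may prune dirnames in place, as with os.walk."""
    stack = [top]
    while stack:
        dirpath = stack.pop()
        dirnames, filenames = [], []
        links = set()  # symlinked dirs: listed, but not descended into
        try:
            with os.scandir(dirpath) as it:
                for entry in it:
                    # Usually answered from readdir()'s d_type (POSIX) or the
                    # FindNextFile attributes (Windows) without a stat() call.
                    if entry.is_dir():
                        dirnames.append(entry.name)
                        if entry.is_symlink():
                            links.add(entry.name)
                    else:
                        filenames.append(entry.name)
        except OSError:
            # Unreadable directory: skip it, as os.walk does by default.
            continue
        yield dirpath, dirnames, filenames
        # Push in reverse so subdirectories are visited in listing order,
        # matching os.walk's depth-first, top-down traversal.
        for name in reversed(dirnames):
            if name not in links:
                stack.append(os.path.join(dirpath, name))
```

Like os.walk with followlinks=False, symlinks to directories show up in dirnames but are not followed.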
