[Python-ideas] Speed up os.walk() 5x to 9x by using file attributes from FindFirst/NextFile() and readdir()

Andrew Barnert abarnert at yahoo.com
Fri Nov 16 13:09:49 CET 2012


> From: Mike Meyer <mwm at mired.org>
> Sent: Fri, November 16, 2012 2:45:15 AM
> 
> On Fri, Nov 16, 2012 at 4:32 AM, Andrew Barnert <abarnert at yahoo.com> wrote:
> > From: Mike Meyer <mwm at mired.org>
> > Sent: Thu, November 15, 2012 2:29:44 AM
> >
> > Passing FTS_NOSTAT to fts is about 3x faster, but only 8% faster than
> > os.walk with the stat calls hacked out, and 40% slower than find.
> 
> That's actually a good thing to know. With FTS_NOSTAT, fts winds up
> using the d_type field to figure out what's a directory (assuming you
> were on a file system that has those). That's the proposed change for
> os.walk, so we now have an estimate of how fast we can expect it to
> be.

I'm not sure I'd put too much confidence in the 3x difference as generally
applicable to POSIX. Apple uses FreeBSD's fts unmodified, even though in a quick
browse I saw at least one case where a trivial change would have made a
difference (the link count check that's only used with ufs/nfs/nfs4/ext2fs would
also work on HFS+). Also, OS X with HFS+ drives does some bizarre stuff with
disk caching, especially with an SSD (which in itself probably changes the
performance characteristics).

But I'd guess it's somewhere in the right ballpark, and if anything the
improvement will probably be even bigger on FreeBSD and Linux than on OS X.
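
For illustration only, here is a minimal sketch of the idea under discussion: classifying entries from the directory listing itself instead of making a stat call per entry. It assumes a Python that has os.scandir (3.5 or later, which postdates this thread); walk_no_stat is a hypothetical name, not the proposed os.walk change or the py-fts code.

```python
import os

def walk_no_stat(top):
    """Yield (dirpath, dirnames, filenames) top-down, roughly like os.walk.

    Hypothetical helper. DirEntry.is_dir() can answer from the d_type
    value returned by readdir() where the file system provides it, and
    falls back to a stat call where it does not.
    """
    dirnames, filenames = [], []
    with os.scandir(top) as entries:
        for entry in entries:
            # follow_symlinks=False: a symlink to a directory is listed
            # with the non-directories and is never descended into.
            if entry.is_dir(follow_symlinks=False):
                dirnames.append(entry.name)
            else:
                filenames.append(entry.name)
    yield top, dirnames, filenames
    for name in dirnames:
        yield from walk_no_stat(os.path.join(top, name))
```

Unlike os.walk, this sketch does no error handling and puts symlinks to directories in filenames, so it is only meant to show where the per-entry stat calls go away.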

> I'm surprised that it's slower than find. The FreeBSD version of find
> uses fts_open/fts_read. Could it be that you used FTS_NOCHDIR to emulate
> os.walk, whereas find doesn't?


No, just FTS_PHYSICAL (with or without FTS_NOSTAT).

It looks like more than half of the difference is due to
print(ent.fts_path.decode('utf8')) in Python vs. puts(entry->fts_path) in find
(based on removing the print entirely). I don't think it's worth the effort to
investigate further; let's get the 3x speedup before we worry about the last 40%.
But if you want to, the source I used is at https://github.com/abarnert/py-fts
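
A rough way to see how much of the loop is output cost is to time the same iteration with and without the decode-and-print step. The sketch below is an assumption-laden stand-in: time_emit is a hypothetical helper, the paths are made up rather than taken from a real walk, and output goes to an in-memory buffer rather than stdout, so the absolute numbers will not match the measurements above.

```python
import io
import time

def time_emit(paths, emit):
    """Return the seconds spent calling emit(path) for every path."""
    start = time.perf_counter()
    for path in paths:
        emit(path)
    return time.perf_counter() - start

# Made-up byte paths; a real measurement would use the paths from the walk.
paths = [b"/tmp/example/file%d" % i for i in range(100000)]

sink = io.StringIO()  # in-memory stand-in for stdout
with_print = time_emit(paths, lambda p: print(p.decode("utf8"), file=sink))
without_print = time_emit(paths, lambda p: None)
print("decode+print: %.3fs, no output: %.3fs" % (with_print, without_print))
```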


