[Python-ideas] Speed up os.walk() 5x to 9x by using file attributes from FindFirst/NextFile() and readdir()
Andrew Barnert
abarnert at yahoo.com
Fri Nov 16 13:09:49 CET 2012
> From: Mike Meyer <mwm at mired.org>
> Sent: Fri, November 16, 2012 2:45:15 AM
>
> On Fri, Nov 16, 2012 at 4:32 AM, Andrew Barnert <abarnert at yahoo.com> wrote:
> > From: Mike Meyer <mwm at mired.org>
> > Sent: Thu, November 15, 2012 2:29:44 AM
> >
> > Passing FTS_NOSTAT to fts is about 3x faster, but only 8% faster than os.walk
> > with the stat calls hacked out, and 40% slower than find.
>
> That's actually a good thing to know. With FTS_NOSTAT, fts winds up
> using the d_type field to figure out what's a directory (assuming you
> were on a file system that has those). That's the proposed change for
> os.walk, so we now have an estimate of how fast we can expect it to
> be.
I'm not sure I'd put too much confidence in the 3x difference as generally
applicable to POSIX. Apple uses FreeBSD's fts unmodified, even though in a quick
browse I saw at least one case where a trivial change would have made a
difference (the link count check that's only used with ufs/nfs/nfs4/ext2fs would
also work on hfs+). Also, OS X with HFS+ drives does some bizarre stuff with
disk caching, especially with an SSD (which in itself probably changes the
performance characteristics).
But I'd guess it's somewhere in the right ballpark, and if anything the
improvement will probably be even bigger on FreeBSD and Linux than on OS X.
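For reference, the d_type-based approach discussed above can be sketched roughly as follows. This is only a minimal illustration, not the proposed patch: it assumes a Python that provides os.scandir (3.6 or later for the context-manager form, which postdates this thread), and walk_nostat is a hypothetical name. DirEntry.is_dir(follow_symlinks=False) can usually answer from the d_type value that readdir() returned, and needs a stat call only when the file system reports an unknown type.

```python
import os

def walk_nostat(top):
    """Yield (dirpath, dirnames, filenames) top-down, similar to os.walk.

    Hypothetical helper for illustration. Unlike os.walk, symlinks that
    point at directories are reported as files here, because is_dir() is
    asked not to follow symlinks.
    """
    dirs, files = [], []
    try:
        with os.scandir(top) as it:
            for entry in it:
                # Usually answered from d_type, without an extra stat call.
                if entry.is_dir(follow_symlinks=False):
                    dirs.append(entry.name)
                else:
                    files.append(entry.name)
    except OSError:
        return  # unreadable directory: skip it silently
    yield top, dirs, files
    for name in dirs:
        yield from walk_nostat(os.path.join(top, name))
```

Something like `for dirpath, dirnames, filenames in walk_nostat('.')` would then be used the same way as os.walk, which makes it easy to time the two against each other on the same tree.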
> I'm surprised that it's slower than find. The FreeBSD version of find
> uses fts_open/fts_read. Could it be that you used FTS_NOCHDIR to emulate
> os.walk, whereas find doesn't?
No, just FTS_PHYSICAL (with or without FTS_NOSTAT).
It looks like more than half of the difference is due to
print(ent.fts_path.decode('utf8')) in Python vs. puts(entry->fts_path) in find
(based on removing the print entirely). I don't think it's worth the effort to
investigate further; let's get the 3x speedup before we worry about the last 40%…
But if you want to, the source I used is at https://github.com/abarnert/py-fts
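If anyone wants to isolate the output cost on its own, a rough sketch along these lines might do. It is not taken from the py-fts source; emit_decoded, emit_raw, and compare are hypothetical names, and it writes to in-memory streams, so it only approximates what happens with a real stdout. It contrasts decoding and printing each path with writing the raw bytes, which is closer to what puts() does.

```python
import io
import timeit

def emit_decoded(paths, out):
    """Decode each bytes path and print it to a text stream."""
    for p in paths:
        print(p.decode('utf8'), file=out)

def emit_raw(paths, out):
    """Write each bytes path plus a newline to a binary stream."""
    for p in paths:
        out.write(p + b'\n')

def compare(paths, repeat=20):
    """Return (decoded_seconds, raw_seconds) for `repeat` passes over paths."""
    t_dec = timeit.timeit(lambda: emit_decoded(paths, io.StringIO()),
                          number=repeat)
    t_raw = timeit.timeit(lambda: emit_raw(paths, io.BytesIO()),
                          number=repeat)
    return t_dec, t_raw
```

Feeding compare() the list of paths collected from a real traversal would show how much of the gap comes from the decode-and-print step alone.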