[Python-ideas] Speed up os.walk() 5x to 9x by using file attributes from FindFirst/NextFile() and readdir()

Mike Meyer mwm at mired.org
Fri Nov 16 11:45:13 CET 2012


On Fri, Nov 16, 2012 at 4:32 AM, Andrew Barnert <abarnert at yahoo.com> wrote:
> From: Mike Meyer <mwm at mired.org>
> Sent: Thu, November 15, 2012 2:29:44 AM
>
>>If the goal is to make os.walk fast, then it might be better (on Posix systems,
>>anyway) to see if it can be built on top of ftw instead of low-level directory
>>scanning routines.
> After a bit of experimentation, I'm not sure there actually is any significant
> improvement to be had on most POSIX systems working that way.

I agree with that, so long as you have to stay with the os.walk
interface.

> Looking at the source from FreeBSD, OS X, and glibc, they all call stat (or a
> stat family call) on each file, unless you ask for no stat info.

Right. They either give you *all* the stat information, or they
don't give you *any* of it. So there's no way to use them to create the
directory/other split in os.walk without doing the stat calls.
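To make that cost concrete, here is a minimal sketch of the directory/other split done the stat-based way: list the names, then stat each one. The helper name `split_with_stat` is hypothetical, and this is an illustration of the pattern, not the actual os.walk source.

```python
import os
import stat


def split_with_stat(top):
    """Split the entries of `top` into (dirs, nondirs), one stat() per entry.

    Hypothetical helper illustrating the listdir-then-stat pattern: the only
    way to tell a directory from anything else here is a stat call.
    """
    dirs, nondirs = [], []
    for name in os.listdir(top):
        # One stat() system call per entry; this is the cost under discussion.
        # os.stat follows symlinks, so a link to a directory counts as a dir.
        try:
            mode = os.stat(os.path.join(top, name)).st_mode
        except OSError:
            # Entry vanished or is a dangling symlink: treat it as a non-directory.
            nondirs.append(name)
            continue
        if stat.S_ISDIR(mode):
            dirs.append(name)
        else:
            nondirs.append(name)
    return dirs, nondirs
```

For example, `dirs, nondirs = split_with_stat(".")` issues one stat call for every entry in the current directory, whether or not the caller ever needs more than "is it a directory?".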

> Passing FTS_NOSTAT to fts is about 3x faster, but only 8% faster than os.walk
> with the stat calls hacked out, and 40% slower than find.

That's actually a good thing to know. With FTS_NOSTAT, fts winds up
using the d_type field to figure out what's a directory (assuming you
were on a file system that has those). That's the proposed change for
os.walk, so we now have an estimate of how fast we can expect it to
be.
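For comparison, the d_type approach later became available in Python itself as `os.scandir()` (Python 3.5, PEP 471), which exposes the type information that readdir() (or FindFirstFile/FindNextFile on Windows) already returns. The sketch below assumes Python 3.6+ (for the `with` form) and uses a hypothetical helper name, `split_with_d_type`.

```python
import os


def split_with_d_type(top):
    """Split the entries of `top` into (dirs, nondirs) without a stat per entry.

    Hypothetical helper. DirEntry.is_dir() can usually answer from the type
    field returned by the directory scan; it only needs a system call when
    the type is unknown or the entry is a symlink that must be followed.
    """
    dirs, nondirs = [], []
    with os.scandir(top) as entries:
        for entry in entries:
            # is_dir() follows symlinks by default, matching a stat-based split:
            # a symlink to a directory counts as a directory (that case costs a stat).
            if entry.is_dir():
                dirs.append(entry.name)
            else:
                nondirs.append(entry.name)
    return dirs, nondirs
```

Passing `follow_symlinks=False` to `is_dir()` would avoid the remaining stat calls for symlinks, at the cost of classifying links to directories as non-directories.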

I'm surprised that it's slower than find. The FreeBSD version of find
uses fts_open/fts_read. Could it be that you used FTS_NOCHDIR to emulate
os.walk, whereas find doesn't?

    <mike
