[Python-ideas] Speed up os.walk() 5x to 9x by using file attributes from FindFirst/NextFile() and readdir()

Ben Hoyt benhoyt at gmail.com
Wed Nov 14 08:22:40 CET 2012


>> Yes, it's slightly odd, but not as odd as you'd think. This is
>> especially true for Windows users, because we're used to st_mode only
>> being a subset of the information -- the permission bits are basically
>> meaningless on Windows.
>
> That's one more reason for returning a new tuple/struct with a type field:
> the full st_mode is not useful on Windows, and on Unix readdir doesn't
> return a full st_mode in the first place.

Hmmm, I'm not sure I agree: st_mode from the new iterdir_stat() will
be as useful as that currently returned by os.stat(), and it is very
useful (mainly for checking whether an entry is a directory or not).
You're right that it won't return a full st_mode on Linux/BSD, but I
think it's better for folks to use the existing "if
stat.S_ISDIR(st.st_mode): ..." idiom than introduce a new thing.
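
For illustration, here is a minimal sketch of that idiom using today's
os.listdir() and os.stat(). The helper name list_subdirs is made up for
this example and is not part of the proposal; iterdir_stat() does not
appear because it does not exist yet.

```python
import os
import stat

def list_subdirs(path):
    """Return the sorted names of entries in `path` that are directories.

    Hypothetical helper for illustration only.
    """
    subdirs = []
    for name in os.listdir(path):
        try:
            # One extra stat() system call per entry; this is the cost
            # the proposed API is meant to avoid.
            st = os.stat(os.path.join(path, name))
        except OSError:
            # Entry vanished or is a broken symlink; skip it.
            continue
        if stat.S_ISDIR(st.st_mode):
            subdirs.append(name)
    return sorted(subdirs)
```

The S_ISDIR() check only looks at the file-type bits of st_mode, so it
works the same whether or not the permission bits are meaningful.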

> How did you measure the 5x speedup you saw with your modified os.walk?

Just by os.walk()ing through a large directory tree with basically
nothing in the inner loop, and comparing that to the same thing with
my version.
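
Roughly, the measurement looks like the sketch below. The function name
time_walk and the entry counting are assumptions for illustration, not the
exact script I ran; any os.walk()-compatible function can be passed in.

```python
import os
import time

def time_walk(walk_func, top):
    """Time one full traversal of `top` with an os.walk()-style function.

    Returns (elapsed_seconds, entry_count). Hypothetical helper.
    """
    start = time.perf_counter()
    count = 0
    for root, dirs, files in walk_func(top):
        # Almost nothing in the inner loop, so the traversal itself
        # dominates the measured time.
        count += len(dirs) + len(files)
    elapsed = time.perf_counter() - start
    return elapsed, count
```

Calling time_walk(os.walk, some_tree) and then the same with the modified
walk on the same tree gives the two numbers to compare. It is worth running
each several times, since the OS file cache affects the first pass.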

> It would be interesting to see if Unix platforms have a similar speedup, because
> if they don't the new API could just return the results of stat (or lstat ...).

Yeah, true. I'll run that benchmark and post the results in the next
few days. I'm *hoping* for a similar speedup there too, given the
reduction in system calls, but you never know till you benchmark ...
maybe system calls are much faster on Linux, or stat() is cached
better or whatever.

-Ben


