[Python-ideas] Speed up os.walk() 5x to 9x by using file attributes from FindFirst/NextFile() and readdir()

Ben Hoyt benhoyt at gmail.com
Sun Nov 18 22:00:56 CET 2012


>>> 1) wrap the return partial stat info in a proxy object
>>> 2) Make iterdir_stat an os.walk internal tool, and don't export it.
>>> 3) Add some kind of "we have a full stat" indicator,
>>> 4) document one of the stat values as a "we have a full stat"
>>> indicator,
>>> 5) Add a keyword argument to ... always do the full stat.
>>> 6) Deprecate os.walk, and provide os.itertree

I don't love most of these solutions as they seem to complicate
things. I'd be happy with 2) if it came to it -- but I think it's a
very useful tool, especially on Windows, because it means Windows
users would have "pure Python access" to FindFirst/FindNext and a very
good speed improvement for os.walk.

>> 7) Provide an iterdir() with a way of specifying exactly
>> which stat fields you're interested in. Then it can perform
>> stat calls if and only if needed, and the user doesn't have
>> to tediously test for presence of things in the result.
>>
>
> +1 for following that seventh path. It offers the additional benefit for the
> library code, that constraints of the backend functionality used are
> clearer to handle: If requested and available although expensive, "yield
> nevertheless the attribute values" is then a valid strategy.

Ah, that's an interesting idea. It's nice and explicit. I'd go for
either the status quo I've proposed or this one. The only thing I'd
want to push for is a clean API. Any suggestions? Mine would be
something like:

for filename, st in iterdir_stat(path, stat_fields=['st_mode', 'st_size']):
    ...

However, you might also need a 'd_type' field option, because for
os.walk(), for example, you don't actually need all of st_mode, just
the type info. It's a bit odd that most of the fields are named "st_*"
but one is "d_type", but that's not terrible.

-Ben


