[Python-ideas] Speed up os.walk() 5x to 9x by using file attributes from FindFirst/NextFile() and readdir()

Ben Hoyt benhoyt at gmail.com
Thu Nov 15 08:40:57 CET 2012


> That's one way of looking at it. The problem is that you tell if a
> value has been filled or not by having a None value.  But st_mode is
> itself multi-valued, and you don't always get all available
> values. Maybe d_type should be its own attribute? If readdir returns
> it, we use it as is. If not, then the caller either does the None/stat
> dance or we make it a property that gets filled from the stat
> structure.

I'm inclined to KISS and just let the caller handle it. Many other
things in the "os" module are system dependent, including os.stat(),
so if the stat_result objects returned by iterdir_stat() are system
dependent, that's just par for the course. I'm thinking of a docstring
something like:

"""Yield tuples of (filename, stat_result) for each filename in
directory given by "path". Like listdir(), '.' and '..' are skipped.
The values are yielded in system-dependent order.

Each stat_result is an object like you'd get by calling os.stat() on
that file, but not all information is present on all systems, and st_*
fields that are not available will be None.

In practice, stat_result is a full os.stat() on Windows, but only the
"is type" bits of the st_mode field are available on Linux/OS X/BSD.
"""

So in real life, if you're using more than stat.S_ISDIR() of st_mode,
you'll need to call stat separately. But 1) it's quite rare to need, e.g.,
the permission bits in this context, and 2) if you're expecting
st_mode to have that extra stuff, your code's already system-dependent,
as permission bits don't mean much on Windows.
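
On the caller's side, the "None/stat dance" could look roughly like the
sketch below. The helper names is_dir() and permission_bits() are
hypothetical, and the st_mode argument is assumed to be whatever partial
mode the directory listing supplied (or None when the OS gave nothing).

```python
import os
import stat


def is_dir(dirpath, name, st_mode):
    """Return True if the entry is a directory.

    st_mode is the possibly partial mode from the directory listing,
    or None if the OS did not supply a file type.
    """
    if st_mode is None:
        # Fall back to a real stat call; lstat() does not follow symlinks.
        st_mode = os.lstat(os.path.join(dirpath, name)).st_mode
    return stat.S_ISDIR(st_mode)


def permission_bits(dirpath, name):
    """Permission bits always need a full stat() call in this sketch."""
    return stat.S_IMODE(os.stat(os.path.join(dirpath, name)).st_mode)
```

The common case (is it a directory?) costs no extra system call when the
type is already known, and only the rarer permission lookup pays for a
full stat().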

But the main point is that what the OS gives you for free is easily available.

-Ben


