[Python-ideas] Speed up os.walk() 5x to 9x by using file attributes from FindFirst/NextFile() and readdir()

Jim Jewett jimjjewett at gmail.com
Fri Nov 16 02:11:25 CET 2012


On 11/15/12, Mike Meyer <mwm at mired.org> wrote:
> On Nov 15, 2012 2:06 PM, "Ben Hoyt" <benhoyt at gmail.com> wrote:
>>
>> >> """Yield tuples of (filename, stat_result) for each filename in
>> >> directory given by "path". Like listdir(), '.' and '..' are skipped.
>> >> The values are yielded in system-dependent order.
>> >>
>> >> Each stat_result is an object like you'd get by calling os.stat() on
>> >> that file, but not all information is present on all systems, and st_*
>> >> fields that are not available will be None.
>> >>
>> >> In practice, stat_result is a full os.stat() on Windows, but only the
>> >> "is type" bits of the st_mode field are available on Linux/OS X/BSD.
>> >> """

> Better would be 'on Posix systems, if st_mode is not None only the type
> bits are valid.' Assuming that the underlying code translates DT_UNKNOWN to
> binding st_mode to None.

The specification allows other fields as well; is it really the case
that *no* filesystem supports them?

Perhaps:

"""Yield tuples of (filename, stat_result) for each file in the "path"
directory, excluding the '.' and '..' entries.

The order of results is arbitrary, and the effect of modifying a
directory after generator creation is filesystem-dependent.

Each stat_result is similar to the result of os.stat(filename), except
that only the directory entry itself is examined; any attribute that
would require an additional system call (including a stat call) is set
to None.

In practice, Windows will typically fill in all attributes; other
systems are most likely to fill in only the file-type ("is type") bits
of st_mode, or even nothing at all.
"""
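
To make the None convention concrete, here is a rough sketch of how a
caller might consume such a generator. It is not the proposed
implementation: iterdir_stat, PartialStat and is_directory are
hypothetical names, only three st_* fields are modeled, and the
generator is approximated with os.scandir(), assuming a Python where
that function is available (3.6 or later for the with-statement form).
Note that os.scandir() may itself issue a stat call when the filesystem
does not report a type, so the "no extra system call" promise is only
approximated here.

```python
import os
import stat
from collections import namedtuple

# Hypothetical stand-in for the proposed stat_result: any field may be
# None when the directory entry alone cannot supply it.
PartialStat = namedtuple("PartialStat", ["st_mode", "st_size", "st_mtime"])


def iterdir_stat(path):
    """Yield (filename, PartialStat) pairs; unknown fields are None."""
    with os.scandir(path) as entries:
        for entry in entries:
            # Only the file-type bits are filled in, mirroring what a
            # readdir()-based implementation could report cheaply.
            if entry.is_symlink():
                mode = stat.S_IFLNK
            elif entry.is_dir(follow_symlinks=False):
                mode = stat.S_IFDIR
            elif entry.is_file(follow_symlinks=False):
                mode = stat.S_IFREG
            else:
                mode = None  # fifo, socket, device: type not modeled here
            yield entry.name, PartialStat(mode, None, None)


def is_directory(dirpath, name, st):
    """Use the cheap type bits if present, else pay for a real lstat()."""
    mode = st.st_mode
    if mode is None:
        mode = os.lstat(os.path.join(dirpath, name)).st_mode
    return stat.S_ISDIR(mode)
```

The point of the fallback in is_directory() is that callers such as
os.walk() stay correct on filesystems that report nothing, and only
lose the speedup there.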

-jJ


