[Python-ideas] Speed up os.walk() 5x to 9x by using file attributes from FindFirst/NextFile() and readdir()

Mike Meyer mwm at mired.org
Thu Nov 15 11:29:24 CET 2012


On Nov 15, 2012 1:40 AM, "Ben Hoyt" <benhoyt at gmail.com> wrote:
>
> > That's one way of looking at it. The problem is that you tell if a
> > value has been filled or not by having a None value.  But st_mode is
> > itself multi-valued, and you don't always get all available
> > values. Maybe d_type should be its own attribute? If readdir returns
> > it, we use it as is. If not, then the caller either does the None/stat
> > dance or we make it a property that gets filled from the stat
> > structure.
>
> I'm inclined to KISS and just let the caller handle it. Many other
> things in the "os" module are system dependent, including os.stat(),
> so if the stat_result objects returned by iterdir_stat() are system
> dependent, that's just par for the course. I'm thinking of a docstring
> something like:
>
> """Yield tuples of (filename, stat_result) for each filename in
> directory given by "path". Like listdir(), '.' and '..' are skipped.
> The values are yielded in system-dependent order.
>
> Each stat_result is an object like you'd get by calling os.stat() on
> that file, but not all information is present on all systems, and st_*
> fields that are not available will be None.
>
> In practice, stat_result is a full os.stat() on Windows, but only the
> "is type" bits of the st_mode field are available on Linux/OS X/BSD.
> """

There's a code smell here, in that the doc for Unix variants is incomplete
and wrong. Whether or not you get the d_type values depends on the OS
having that extension. Further, there's a d_type value (DT_UNKNOWN) that
isn't a valid value for the S_IFMT bits in st_mode (at least on BSD).
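
To make the "None/stat dance" concrete, here is a minimal sketch of what a
caller would have to do. It assumes the proposed scan hands back an st_mode
that is None whenever the OS did not report a type (no d_type extension, or
DT_UNKNOWN). The helper name is_dir_entry is hypothetical and not part of any
proposal in this thread.

```python
import os
import stat


def is_dir_entry(dirpath, name, st_mode=None):
    """Return True if dirpath/name is a directory.

    st_mode is whatever the (hypothetical) directory scan returned for
    this entry. None is an assumed stand-in for "type not reported",
    e.g. no d_type support on this OS, or a DT_UNKNOWN value.
    """
    if st_mode is None:
        # The scan gave us nothing for free, so pay for a real stat call.
        # lstat is used because d_type describes the entry itself and
        # does not follow symlinks.
        st_mode = os.lstat(os.path.join(dirpath, name)).st_mode
    # Only the file-type bits are consulted here.
    return stat.S_ISDIR(st_mode)
```

A caller that trusts the scan passes the st_mode it was given; one that got
None pays for one extra system call per entry, which is exactly the cost the
proposal is trying to avoid in the common case.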

> So in real life, if you're using more than stat.S_ISDIR() of st_mode,
> you'll need to call stat separately. But 1) it's quite rare to need e.g.
> the permissions bits in this context, and 2) if you're expecting
> st_mode to have that extra stuff your code's already system-dependent,
> as permission bits don't mean much on Windows.
>
> But the main point is that what the OS gives you for free is easily
> available.

If the goal is to make os.walk fast, then it might be better (on Posix
systems, anyway) to see if it can be built on top of ftw instead of
low-level directory scanning routines.

    <mike