[Python-ideas] Speed up os.walk() 5x to 9x by using file attributes from FindFirst/NextFile() and readdir()

Mike Meyer mwm at mired.org
Thu Nov 15 06:03:21 CET 2012


On Nov 14, 2012 10:19 PM, "Jim Jewett" <jimjjewett at gmail.com> wrote:
>
> On 11/14/12, Mike Meyer <mwm at mired.org> wrote:
> > On Wed, Nov 14, 2012 at 5:51 PM, Jim Jewett <jimjjewett at gmail.com> wrote:
> >> On 11/12/12, Ben Hoyt <benhoyt at gmail.com> wrote:
>
> >> (c)  Attributes will default to None, supporting the "if x is None:
> >> x=stat()" pattern for the users who do care about attributes that were
> >> not available quickly.  ...
>
> > Two questions:
>
> > 1) Is there some way to distinguish that your st_mode field is only
> > partially there (i.e. - you get the Linux/BSD d_type value, but not
> > the rest of st_mode)?
>
> os.iterdir did not call stat; you have partial information.

Note that you're eliding the proposal these questions were about: that
os.iterdir return some kind of object whose attributes carry the stat
values, or None where they aren't available.
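
For concreteness, here is a minimal sketch of the client-side "if x is None:
x=stat()" pattern from point (c). The helper name size_of and the
SimpleNamespace stand-in for the entry object are assumptions made for
illustration only; nothing here is an actual os API.

```python
import os
from types import SimpleNamespace


def size_of(entry, path):
    """Return the size of an entry, calling os.stat() only if needed.

    `entry` is a hypothetical object whose st_size is None when the
    directory listing did not supply the value cheaply.
    """
    size = entry.st_size
    if size is None:
        # The listing had no size, so pay for the extra system call.
        size = os.stat(path).st_size
    return size


# Stand-in for an entry whose size came for free with the listing.
cheap_entry = SimpleNamespace(st_size=10)
```

Every caller that cares about st_size has to repeat the None check and the
fallback, which is the cost being discussed further down.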

> Or are you saying that you want to distinguish between "This
> filesystem doesn't track that information", "This process couldn't get
> that information right now", and "That particular piece of information
> requires a second call that hasn't been made yet"?

I want to distinguish between two cases. In the first, st_mode is filled
from the BSD/Unix d_type directory entry: there is information, so st_mode
is not None, but it is incomplete, and a second system call is required to
fetch the rest. In the second, it's filled via the Windows calls, which
provide all the information that is available for st_mode, so no second
system call is needed.
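
A rough sketch of that distinction, assuming a hypothetical EntryInfo class
with an explicit mode_is_complete flag (both names are invented for
illustration and are not part of any proposal in this thread). A mode built
only from d_type carries the file-type bits and nothing else, so a "not None"
check alone cannot tell the two cases apart:

```python
import stat


class EntryInfo:
    """Hypothetical directory-entry result; not an actual os API."""

    def __init__(self, name, st_mode=None, mode_is_complete=False):
        self.name = name
        self.st_mode = st_mode
        # True only when st_mode holds everything a stat call would give.
        self.mode_is_complete = mode_is_complete

    def is_dir(self):
        # The file-type bits are enough to answer this in both cases.
        return self.st_mode is not None and stat.S_ISDIR(self.st_mode)

    def permissions(self):
        # Permission bits are only meaningful when the mode is complete.
        if self.st_mode is None or not self.mode_is_complete:
            return None
        return stat.S_IMODE(self.st_mode)


# d_type-style entry: file type known, permission bits absent.
partial = EntryInfo("src", st_mode=stat.S_IFDIR, mode_is_complete=False)
# Entry whose listing call supplied the whole mode.
full = EntryInfo("src", st_mode=stat.S_IFDIR | 0o755, mode_is_complete=True)
```

Both entries answer is_dir() without another system call, but only the
second can report permissions.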

> > 2) How about making these attributes properties, so that touching one
> > that isn't there causes them all to be populated.
>
> Part of the motivation was to minimize extra system calls; that
> suggests making another one should be a function call instead of a
> property.

Except that I don't see that there's anything to do once you've found a
None-valued attribute *except* make that extra call. If there's a use case
where you find that one of the attributes is None and then don't fetch the
value from the system, I agree with you. If there isn't, then you might as
well roll that one use case into the object rather than force every client
to make the stat call and extract the information from it.
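
A minimal sketch of what rolling it into the object could look like,
assuming a hypothetical LazyEntry class (the class name, the stat_func
parameter, and the stat_calls counter are invented for illustration).
Touching any attribute that is still None triggers one os.stat() call that
populates all of them:

```python
import os


class LazyEntry:
    """Hypothetical entry object; names are illustrative, not an os API."""

    def __init__(self, path, st_mode=None, st_size=None, st_mtime=None,
                 stat_func=os.stat):
        self.path = path
        # Values the directory listing supplied; None means "not yet known".
        self._fields = {"st_mode": st_mode, "st_size": st_size,
                        "st_mtime": st_mtime}
        self._stat_func = stat_func
        self.stat_calls = 0  # counts how many extra system calls were made

    def _get(self, field):
        if self._fields[field] is None:
            # One extra call fills in every field, not just the one asked for.
            result = self._stat_func(self.path)
            self.stat_calls += 1
            for key in self._fields:
                self._fields[key] = getattr(result, key)
        return self._fields[field]

    @property
    def st_mode(self):
        return self._get("st_mode")

    @property
    def st_size(self):
        return self._get("st_size")

    @property
    def st_mtime(self):
        return self._get("st_mtime")
```

A client then just reads entry.st_size. If the listing already supplied it,
no system call happens; if not, exactly one does, and later reads of the
other attributes are free.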