[Python-ideas] BetterWalk, a better and faster os.walk() for Python

John Mulligan phlogistonjohn at asynchrono.us
Sun Nov 25 13:54:34 CET 2012


On Saturday, November 24, 2012 03:27:09 PM Andrew Barnert wrote:
> First, another thought on the whole thing:
> 
> Wrapping readdir is useful. Replacing os.walk is also useful. But they don't
> necessarily have to be tied together at all.
> 
> In particular, instead of trying to write an iterdir_stat that can properly
> support os.walk on all platforms, why not just implement os.walk differently
> on platforms where iterdir_stat can't support it? (In fact, I think an
> os.walk replacement based on the fts API, which never uses iterdir_stat,
> would be the best answer, but let me put more thought into that idea...)


Agreed, keeping things separate might be a better approach. I wanted to point 
out the usefulness of an enhanced listdir/iterdir as its own beast in addition 
to improving os.walk. 

There is one advantage to creating an ideal enhanced os.walk: people would 
only have to change the module that walk is imported from. No other changes 
would be needed, even if the code uses features like the ability to modify 
dirnames (when topdown=True). I am not sure whether fts or another 
platform-specific API could be wrangled into an exact drop-in replacement.
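
To illustrate the drop-in idea, here is a minimal sketch. It assumes that
betterwalk exposes a walk() with the same signature as os.walk and falls back
to the stdlib when the module is not installed. The helper names
walk_skipping and demo, and the directory names, are hypothetical.

```python
import os
import tempfile

try:
    # Assumption: betterwalk provides a walk() compatible with os.walk.
    from betterwalk import walk
except ImportError:
    from os import walk


def walk_skipping(top, skip):
    """Yield file paths under top, pruning directories named in skip."""
    for dirpath, dirnames, filenames in walk(top, topdown=True):
        # Modifying dirnames in place tells walk() not to descend into
        # the removed entries; this only works when topdown=True.
        dirnames[:] = [d for d in dirnames if d not in skip]
        for name in filenames:
            yield os.path.join(dirpath, name)


def demo():
    """Build a small tree and list its files while skipping '.git'."""
    with tempfile.TemporaryDirectory() as top:
        for sub in ("src", ".git"):
            os.mkdir(os.path.join(top, sub))
            with open(os.path.join(top, sub, "file.txt"), "w") as f:
                f.write("x")
        found = walk_skipping(top, {".git"})
        return sorted(os.path.relpath(p, top) for p in found)
```

Only the import at the top changes between implementations; the pruning
logic relies on nothing beyond the documented os.walk behaviour.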

> 
> Anyway, comments:
> 
> From: John Mulligan <phlogistonjohn at asynchrono.us>
> Sent: Fri, November 23, 2012 8:13:22 AM
> 
> > I like returning the d_type directly because in the unix style APIs the
> > dirent structure doesn't provide the same stuff as the stat result and I
> > don't want to trick myself into thinking I have all the information
> > available from the readdir call. I also like to have my Python functions
> > map pretty closely to the C calls.
> 
> Of course that means that implementing the same interface on Windows means
> faking d_type from the stat result, and making the functions map less
> closely to the C calls…

I agree. I don't know whether it would be better to simply have 
platform-dependent fields/values in the struct or to abstract things in this 
case. Anyway, the betterwalk code is already converting constants from the 
Windows API to mode values. Something similar might be possible for d_type 
values as well.

See: https://github.com/benhoyt/betterwalk/blob/master/betterwalk.py#L62
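
A rough sketch of what such a conversion could look like follows. This is not
the betterwalk code; the function names are hypothetical, the constants are
redefined locally with the values from <dirent.h> and the Win32 headers, and
treating every reparse point as a symlink is a deliberate simplification.

```python
import stat

# d_type values as defined in <dirent.h> on Linux and the BSDs.
DT_UNKNOWN = 0
DT_DIR = 4
DT_REG = 8
DT_LNK = 10

# Windows file attribute flags (values from the Win32 headers).
FILE_ATTRIBUTE_DIRECTORY = 0x10
FILE_ATTRIBUTE_REPARSE_POINT = 0x400


def d_type_from_win_attrs(attrs):
    """Approximate a d_type value from Windows file attributes."""
    if attrs & FILE_ATTRIBUTE_REPARSE_POINT:
        # Simplification: not every reparse point is a symlink.
        return DT_LNK
    if attrs & FILE_ATTRIBUTE_DIRECTORY:
        return DT_DIR
    return DT_REG


def mode_from_d_type(d_type):
    """Return the file-type bits of st_mode for a d_type value.

    This mirrors the C DTTOIF macro. DT_UNKNOWN maps to 0, meaning the
    caller still needs a real stat call to learn the type.
    """
    return d_type << 12


# The stat module's predicates work on the converted values.
is_dir = stat.S_ISDIR(mode_from_d_type(DT_DIR))
```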


> 
> > In addition I have a fditerdir call that supports a directory file
> > descriptor as the first argument. This is handy because I also have a
> > wrapper for fstatat (this was all created for Python 2 and before 3.3
> > was released).
> 
> This can only be implemented on platforms that support the *at functions. I
> believe that means just linux and OpenBSD right now, other *BSD (including
> OS X) at some unspecified point in the future. Putting something like that
> in the stdlib would probably require also adding another function like
> os_supports_at (similar to supports_fd, supports_dirfd, etc.), but that's
> not a big deal.

I agree that this requires supporting platforms. (I've run this on FreeBSD as 
well.) I didn't mean to imply that this should be required for a better walk 
function. I wanted to provide some color about the value of exposing alternate 
listdir-type functions themselves and not just as a stepping stone on the way 
to enhancing walk.
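
For comparison, the same directory-fd pattern can be sketched with what
Python 3.3 ships in the os module. This is not my fditerdir wrapper; the
function names list_with_types and demo are hypothetical, and the sketch
assumes a platform where os.listdir accepts a file descriptor and os.stat
accepts dir_fd.

```python
import os
import stat
import tempfile


def list_with_types(path):
    """Return {name: is_dir} for entries in path, using a directory fd."""
    if os.listdir not in os.supports_fd or os.stat not in os.supports_dir_fd:
        raise NotImplementedError("platform lacks fd-based listdir/stat")
    fd = os.open(path, os.O_RDONLY)
    try:
        result = {}
        for name in os.listdir(fd):
            # Stat relative to the directory fd without following
            # symlinks; on POSIX this ends up as an fstatat call.
            st = os.stat(name, dir_fd=fd, follow_symlinks=False)
            result[name] = stat.S_ISDIR(st.st_mode)
        return result
    finally:
        os.close(fd)


def demo():
    """List a temporary directory holding one file and one subdirectory."""
    with tempfile.TemporaryDirectory() as top:
        os.mkdir(os.path.join(top, "sub"))
        with open(os.path.join(top, "a.txt"), "w") as f:
            f.write("x")
        return list_with_types(top)
```

The supports_fd and supports_dir_fd checks play the role of the
os_supports_at idea mentioned above.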



