[Python-3000] Removal of os.path.walk

Mike Meyer mwm at mired.org
Wed Apr 30 17:10:59 CEST 2008


On Wed, 30 Apr 2008 08:02:28 -0700 "Guido van Rossum" <guido at python.org> wrote:

> On Wed, Apr 30, 2008 at 7:48 AM, Aahz <aahz at pythoncraft.com> wrote:
> >
> > On Tue, Apr 29, 2008, Guido van Rossum wrote:
> >  > On Tue, Apr 29, 2008 at 8:10 PM, Tim Heaney <theaney at gmail.com> wrote:
> >  >>
> >  >> Speaking of this, is it too late to lobby for an iterator version of
> >  >>  os.listdir? (Perhaps listdir would not be the best name. :)
> >  >>
> >  >>  There is one at
> >  >>
> >  >>   http://wxidle.sourceforge.net/projects/xlistdir/
> >  >>
> >  >>  but I think it ought to be in the standard library. Moreover, if we
> >  >>  had such a thing, shouldn't os.walk use it instead of lists?
> >  >
> >  > I'm not sure I see the advantage of having it as an iterator; I doubt
> >  > that there is ever not enough memory to hold the contents of a single
> >  > directory. Do you have a compelling use case?
> >
> >  There's a big difference between "not enough memory" and "directory
> >  consumes lots of memory".  My company has some directories with several
> >  hundred thousand entries, so using an iterator would be appreciated
> >  (although by the time we upgrade to Python 3.x, we probably will have
> >  fixed that architecture).
> >
> >  But even then, we're talking tens of megabytes at worst, so it's not a
> >  killer -- just painful.
> 
> Wow. And the filesystem isn't impossibly slow when accessing the last
> file in such a directory?

Modern file systems hash directory entries, so access time by name is
essentially O(1) out to well beyond 50K files in a directory.

> Anyway, I'd be fine with a separate os.opendir() call that returns an
> iterator. The iterator object should also have an optional close()
> method which explicitly frees the underlying file descriptor (or
> whatever is used on Windows).

I think the real win here will be on file systems that return the
files in some well-defined order. If you have to process them all, you
can save on memory, but if you can use the order to skip looking at
some of them completely, that saves disk I/O too.  Since this is
file-system dependent, it would be nice if os.opendir() was required
to preserve the ordering semantics (if any) of the underlying system.
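
To make the idea concrete, here is a rough sketch of the kind of lazy
directory iterator being discussed. It is not the proposed
os.opendir(), which does not exist; it assumes os.scandir(), which
later Python 3 releases provide and which returns an iterator that can
be closed. The helper names iter_dir and first_n are made up for
illustration. Entries come back in whatever order the underlying
system yields them, and stopping early avoids reading the rest of the
directory.

```python
import os
from contextlib import closing
from itertools import islice


def iter_dir(path):
    """Lazily yield entry names in the order the OS returns them."""
    # os.scandir() yields entries one at a time; leaving the with-block
    # closes the underlying directory handle.
    with os.scandir(path) as entries:
        for entry in entries:
            yield entry.name


def first_n(path, n):
    """Return at most n names without reading the whole directory."""
    # closing() calls the generator's close(), which unwinds the
    # with-block above and releases the directory handle promptly.
    with closing(iter_dir(path)) as names:
        return list(islice(names, n))
```

Whether the first n names are useful depends entirely on the ordering
the file system provides, which is why preserving that ordering
matters.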


	  <mike
-- 
Mike Meyer <mwm at mired.org>		http://www.mired.org/consulting.html
Independent Network/Unix/Perforce consultant, email for more information.

