[Python-3000] Removal of os.path.walk

"Martin v. Löwis" martin at v.loewis.de
Thu May 1 00:50:43 CEST 2008


> There's a big difference between "not enough memory" and "directory
> consumes lots of memory".  My company has some directories with several
> hundred thousand entries, so using an iterator would be appreciated
> (although by the time we upgrade to Python 3.x, we probably will have
> fixed that architecture).
> 
> But even then, we're talking tens of megabytes at worst, so it's not a
> killer -- just painful.

But what kind of operation do you want to perform on that directory?

I would expect that usually, you either

a) refer to a single file, which you are either going to create, or
   want to process. In that case, you know the name in advance, so
   you open/stat/mkdir/unlink/rmdir the file, without caring how
   many files exist in the directory,
or

b) need to process all files, to count/sum/backup/remove them;
   in this case, you will need the entire list in the process,
   and reading them one-by-one is likely going to slow down
   the entire operation, instead of speeding it up.

So in neither case do you actually need to read the entries incrementally.
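To make case b) concrete, here is a minimal sketch (the function name and the
size-summing task are my own illustration, not anything from the os module):
reading the whole listing with os.listdir and then touching every entry, which
is the pattern where an incremental iterator buys you little.

```python
import os
import tempfile

def total_size(directory):
    """Sum the sizes of all regular files directly inside `directory`.

    os.listdir returns the complete list of names in one call; since
    case b) processes every entry anyway, the full list is needed
    regardless of how it is delivered.
    """
    total = 0
    for name in os.listdir(directory):
        path = os.path.join(directory, name)
        if os.path.isfile(path):
            total += os.path.getsize(path)
    return total

# Tiny demonstration on a throwaway directory.
with tempfile.TemporaryDirectory() as d:
    for i in range(3):
        with open(os.path.join(d, "f%d.txt" % i), "w") as f:
            f.write("x" * 10)
    print(total_size(d))  # 30
```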

That the C APIs provide chunk-wise processing is only because
dynamic memory management is so painful to write in C: the
caller is asked to pass a limited-size output buffer, which then
gets refilled in subsequent read calls. Originally, the APIs would
return a single entry at a time from the file system, which was
super-slow. Today, the all-singing, all-dancing SysV getdents call
returns multiple entries at a time, for performance reasons.
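(For readers of the archive: Python 3.5 did eventually grow an incremental
directory API, os.scandir (PEP 471), which exposes exactly this getdents-style
batching as an iterator. A minimal sketch, with the helper name being my own:)

```python
import os
import tempfile

def iter_file_names(directory):
    """Yield names of regular files in `directory`, one at a time.

    os.scandir returns an iterator of DirEntry objects; the underlying
    implementation reads the directory in batches (getdents on Linux),
    so memory stays bounded even for very large directories.
    """
    with os.scandir(directory) as it:
        for entry in it:
            if entry.is_file():  # uses cached stat info where possible
                yield entry.name

with tempfile.TemporaryDirectory() as d:
    for name in ("a.txt", "b.txt"):
        open(os.path.join(d, name), "w").close()
    print(sorted(iter_file_names(d)))  # ['a.txt', 'b.txt']
```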

Regards,
Martin
