[Python-3000] Removal of os.path.walk
"Martin v. Löwis"
martin at v.loewis.de
Thu May 1 00:50:43 CEST 2008
> There's a big difference between "not enough memory" and "directory
> consumes lots of memory". My company has some directories with several
> hundred thousand entries, so using an iterator would be appreciated
> (although by the time we upgrade to Python 3.x, we probably will have
> fixed that architecture).
>
> But even then, we're talking tens of megabytes at worst, so it's not a
> killer -- just painful.
But what kind of operation do you want to perform on that directory?
I would expect that usually, you either
a) refer to a single file, which you are either going to create or
   want to process. In that case, you know the name in advance, so
   you open/stat/mkdir/unlink/rmdir it without caring how many
   other files exist in the directory,
or
b) need to process all files, to count/sum/backup/remove them;
in this case, you will need the entire list in the process,
and reading them one-by-one is likely going to slow down
the entire operation, instead of speeding it up.
So in no case do you actually need to read the entries incrementally.
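A minimal sketch of the two cases, using a throwaway temporary directory (the file names are purely illustrative):

```python
import os
import tempfile

tmp = tempfile.mkdtemp()

# Case (a): the name is known in advance, so the size of the
# directory is irrelevant -- open/stat the one file directly,
# never listing its siblings.
known = os.path.join(tmp, "config.txt")
with open(known, "wb") as f:
    f.write(b"x = 1\n")
size_a = os.stat(known).st_size

# Case (b): process *all* entries, e.g. to sum their sizes.
# The whole listing is needed anyway, so fetching it in one
# call (os.listdir) is the natural fit.
for name in ("a.dat", "b.dat", "c.dat"):
    with open(os.path.join(tmp, name), "wb") as f:
        f.write(b"data")
total = sum(os.stat(os.path.join(tmp, n)).st_size
            for n in os.listdir(tmp))
```

In neither pattern does the caller benefit from receiving the entries one at a time.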
That the C APIs provide chunk-wise processing is mainly because
dynamic memory management is so painful to write in C that the
caller is simply asked to pass a limited-size output buffer, which
then gets refilled in subsequent read calls. Originally, the APIs
would return a single entry at a time from the file system, which
was super-slow. Today, the all-singing, all-dancing SysV getdents
returns multiple entries at a time, for performance reasons.
Regards,
Martin