[Python-ideas] Speed up os.walk() 5x to 9x by using file attributes from FindFirst/NextFile() and readdir()

Nick Coghlan ncoghlan at gmail.com
Fri Nov 9 13:23:38 CET 2012


On Fri, Nov 9, 2012 at 8:29 PM, Ben Hoyt <benhoyt at gmail.com> wrote:

> Anyway, cutting a long story short -- do folks think 1) is a good idea?
> What about some of the thoughts in 2)? In either case, what would be the
> best way to go further on this?
>

It's even worse when you add NFS (and other network filesystems) into the
mix, so yes, +1 on devising a more efficient API design for bulk stat
retrieval than the current listdir+explicit-stat approach that can lead to
an excessive number of network round trips.
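
For concreteness, here is a minimal sketch of the listdir+explicit-stat
pattern referred to above. The helper name list_with_stat and the call
counter are illustrative assumptions, not part of any existing module; the
point is only that every directory entry costs one additional os.stat()
call, which on a network filesystem can mean one additional round trip.

```python
import os
import stat
import tempfile


def list_with_stat(top):
    """Return ([(name, is_dir, size), ...], stat_calls) for directory `top`.

    Hypothetical helper: it mirrors the listdir+explicit-stat approach,
    issuing one os.stat() per entry on top of the single os.listdir().
    """
    results = []
    stat_calls = 0
    for name in os.listdir(top):
        # One extra system call (possibly a network round trip) per entry.
        st = os.stat(os.path.join(top, name))
        stat_calls += 1
        results.append((name, stat.S_ISDIR(st.st_mode), st.st_size))
    return results, stat_calls
```

For a directory with N entries this makes N stat calls, even when the
underlying directory-reading call already had some of that information
available.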

It's a complex enough idea that it definitely needs some iteration outside
the stdlib before it could be added, though.

You could either start exploring this as a new project, or, if you wanted
to fork my walkdir project on BitBucket, I'd be interested in reviewing any
pull requests you made along those lines - redundant stat calls are
currently one of the issues with using walkdir for more complex tasks.
(Whichever way you decide to proceed, you'll need to set things up to build
an extension module - walkdir is pure Python at this point.)

Another alternative you may want to explore is whether or not Antoine
Pitrou would be interested in adding such a capability to his pathlib
module. pathlib already includes stat result caching in Path objects, and
thus may be able to support a clean API for returning path details with the
stat results precached.
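
To illustrate the general idea of precached stat results, here is a rough
sketch. The class CachingPath and its stat_result argument are hypothetical
names made up for this example and are not pathlib's actual API; the sketch
only assumes that a bulk directory read could hand each path object the
stat data it already obtained.

```python
import os


class CachingPath:
    """Hypothetical path wrapper that remembers its stat result."""

    def __init__(self, path, stat_result=None):
        self.path = path
        # May be pre-filled by whatever produced this object (for example
        # a bulk directory read), so no further stat call is needed.
        self._stat = stat_result

    def stat(self):
        # Fall back to a real os.stat() only if nothing was precached.
        if self._stat is None:
            self._stat = os.stat(self.path)
        return self._stat
```

With something along these lines, code that walks a tree could call
stat() on each returned object freely, and only objects created without
precached data would trigger a real os.stat().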

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia