[Python-ideas] BetterWalk, a better and faster os.walk() for Python

Ben Hoyt benhoyt at gmail.com
Thu Nov 22 12:39:42 CET 2012


In the recent thread I started called "Speed up os.walk()..." [1] I
was encouraged to create a module to flesh out the idea, so I present
you with BetterWalk:

https://github.com/benhoyt/betterwalk#readme

It's basically all there, and works on Windows, Linux, and Mac OS X.
It probably works on FreeBSD too, but I haven't tested that. I also
haven't written thorough unit tests yet, but intend to after some
further feedback.

In terms of the API for iterdir_stat(), I settled on the more explicit
"pass in what stat fields you want" approach (the 'fields' parameter).
I also added a 'pattern' parameter so you can make use of the wildcard
matching that FindFirst/FindNext provide (it's useful for globbing on
POSIX too, but it isn't a performance improvement there).
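
To make the shape of that API a bit more concrete, here's a minimal
pure-Python sketch of the semantics only. The name iterdir_stat_sketch
and its signature are assumptions for illustration, not BetterWalk's
actual code. It calls os.lstat() on every entry, so it shows none of
the speedup, and it accepts 'fields' without acting on it.

```python
import fnmatch
import os


def iterdir_stat_sketch(path='.', pattern='*', fields=None):
    """Yield (name, stat_result) for each entry in path matching pattern.

    Hypothetical stand-in: 'fields' is accepted only to mirror the idea
    of asking for specific stat fields; this sketch always does a full
    os.lstat() and ignores it.
    """
    for name in os.listdir(path):
        if fnmatch.fnmatch(name, pattern):
            # lstat so that a broken symlink does not raise an error
            yield name, os.lstat(os.path.join(path, name))


if __name__ == '__main__':
    for name, st in iterdir_stat_sketch('.', pattern='*.py',
                                        fields=['st_size']):
        print(name, st.st_size)
```

The point of a real implementation would be to fill in the stat result
from what the directory listing already returned, wherever the platform
allows it, instead of making the extra call per entry.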

As for benchmarks, it's about what I saw earlier on Windows (2-6x as
fast on recent versions, depending on the case). My initial tests on
Mac OS X show it's 5-10x as fast on that platform! I haven't
double-checked those results yet, though.

The results on Linux were somewhat disappointing -- only a 10% speed
improvement on large directories, and it's actually slower on small
directories. It's still doing half the number of system calls, so I
believe this is because a cached os.stat() is super fast on Linux, and
the slowdown from using ctypes / pure Python outweighs the gain from
skipping the system call. That said, I've only tested Linux in a
VirtualBox setup, so maybe that's affecting it too.
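
One rough way to check that theory is to time os.stat() on a path
that's already in the cache. This is only a sketch: stat_cost_us is a
made-up helper name, not part of BetterWalk, and the figure it returns
includes Python call overhead, so treat it as an upper bound on the
cost of the system call itself.

```python
import os
import timeit


def stat_cost_us(path, number=100000):
    """Return the average time of one os.stat(path) call, in microseconds."""
    os.stat(path)  # touch the path once so we time the cached case
    total = timeit.timeit(lambda: os.stat(path), number=number)
    return total / number * 1e6


if __name__ == '__main__':
    print('os.stat: %.2f us per call' % stat_cost_us(os.curdir))
```

If the per-call figure is small compared with the per-entry overhead of
the ctypes code, halving the number of system calls won't buy much.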

Still, if it's a significant win for Windows and OS X users, it's a good thing.

In any case, I'd love it if folks could run the benchmark on their
system (with and without -s) and comment further on the idea and API.

Thanks,
Ben.

[1] http://mail.python.org/pipermail/python-ideas/2012-November/017770.html


