[Python-ideas] BetterWalk, a better and faster os.walk() for Python
Ben Hoyt
benhoyt at gmail.com
Thu Nov 22 12:39:42 CET 2012
In the recent thread I started called "Speed up os.walk()..." [1] I
was encouraged to create a module to flesh out the idea, so I present
you with BetterWalk:
https://github.com/benhoyt/betterwalk#readme
It's basically all there, and works on Windows, Linux, and Mac OS X.
It probably works on FreeBSD too, but I haven't tested that. I also
haven't written thorough unit tests yet, but intend to after some
further feedback.
In terms of the API for iterdir_stat(), I settled on the more explicit
"pass in what stat fields you want" (the 'fields' parameter). I also
added a 'pattern' parameter to allow you to make use of the wildcard
matching that FindFirst/FindNext provide (it's useful for globbing on
POSIX too, but not a performance improvement).
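
To make the shape of that API concrete, here is a minimal sketch of a portable fallback with the same kind of interface. It is not BetterWalk's actual code: the name iterdir_stat_sketch, the default arguments, and the (name, stat) tuple it yields are assumptions made for illustration. It still calls os.lstat() once per entry, so it shows the interface only and has none of the speed-up that comes from reading stat data out of the directory listing.

```python
import fnmatch
import os


def iterdir_stat_sketch(path=".", pattern="*", fields=None):
    """Yield (name, stat_result) for each entry in `path` matching `pattern`.

    Hypothetical fallback, not the BetterWalk implementation. `fields` is
    accepted only to mirror the "say which stat fields you want" idea; a
    full os.lstat() already supplies every field, so it is ignored here.
    """
    for name in os.listdir(path):
        # Wildcard filtering done in Python; a FindFirst/FindNext based
        # version could hand the pattern to the OS instead.
        if not fnmatch.fnmatch(name, pattern):
            continue
        # One system call per entry: exactly the cost a faster
        # implementation would try to avoid.
        yield name, os.lstat(os.path.join(path, name))
```

A caller would write something like `for name, st in iterdir_stat_sketch(".", pattern="*.py", fields=["st_size"])` and read `st.st_size` from each result.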
As for benchmarks, it's about what I saw earlier on Windows (2-6x on
recent versions, depending). My initial tests on Mac OS X show it's
5-10x as fast on that platform! I haven't double-checked those results
yet though.
The results on Linux were somewhat disappointing: only a 10% speed
improvement on large directories, and it's actually slower on small
directories. It's still doing half the number of system calls, so I
believe this is because a cached os.stat() is very fast on Linux, and
the overhead of using ctypes and pure Python outweighs the gain from
skipping the system call. That said, I've only tested Linux in a
VirtualBox setup, so that may be affecting the numbers too.
Still, if it's a significant win for Windows and OS X users, it's a good thing.
In any case, I'd love it if folks could run the benchmark on their
system (with and without -s) and comment further on the idea and API.
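
For anyone who wants a rough independent number as well, a timing harness along these lines may help. It is a sketch, not the benchmark script in the repository: count_entries and best_time are hypothetical helper names, and it assumes the walk function under test yields (dirpath, dirnames, filenames) tuples the way os.walk() does.

```python
import os
import time


def count_entries(walk_func, root):
    """Walk `root` with `walk_func` and return how many dirs and files it saw."""
    total = 0
    for _dirpath, dirnames, filenames in walk_func(root):
        total += len(dirnames) + len(filenames)
    return total


def best_time(walk_func, root, repeats=3):
    """Return the fastest wall-clock time, in seconds, of `repeats` full walks."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        count_entries(walk_func, root)
        timings.append(time.perf_counter() - start)
    # Taking the minimum reduces noise from other processes. After the
    # first pass the tree is usually in the OS cache, so this mostly
    # measures the cached case.
    return min(timings)
```

Calling `best_time(os.walk, "/some/large/tree")` and then the same with a replacement walk function gives two numbers to compare. It is also worth checking that count_entries() returns the same total for both, so that the faster function is not simply skipping entries.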
Thanks,
Ben.
[1] http://mail.python.org/pipermail/python-ideas/2012-November/017770.html