[Python-ideas] Speed up os.walk() 5x to 9x by using file attributes from FindFirst/NextFile() and readdir()

Vinay Sajip vinay_sajip at yahoo.co.uk
Fri Nov 9 21:30:10 CET 2012


Ben Hoyt <benhoyt at ...> writes:

> I've written a proof-of-concept (see [1] below) using ctypes and
> FindFirst/FindNext on Windows, showing that for sizeable directory trees it
> gives a 4x to 6x speedup -- so this is not a micro-optimization!
> 
> I started trying the same thing with opendir/readdir on Linux, but don't have
> as much experience there, and wanted to get some feedback on the concept
> first. I assume it'd be a similar speedup by using d_type & DT_DIR from
> readdir().
> 
> The problem is even worse when you're calling os.walk() and then doing your
> own stat() on each file, for example, to get the total size of all files in a
> tree -- see [2]. It means it's calling stat() twice on every file, and I see
> about a 9x speedup in this scenario using the info FindFirst/Next provide.

Sounds good. I recently answered a Stack Overflow question [1] in which Python
was an order of magnitude slower than Ruby at walking a network folder. Ruby's
Dir implementation is written in C and is less flexible than os.walk, but
there's clearly room for improvement, as you've shown.
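
For illustration, here is a minimal sketch of the "total size of a tree"
pattern described in the quoted text, next to a variant that reuses the
information the directory listing already provides. The function names are
hypothetical. The second function assumes os.scandir(), which did not exist
when this thread was written (it was added later, in Python 3.5), so treat it
as a sketch of the idea rather than the proof-of-concept referred to above.

```python
import os


def tree_size_walk(top):
    """Sum file sizes with os.walk() plus one explicit stat call per file.

    This is the pattern described in the quoted text: the walk has already
    had to classify each entry, and the caller then stats each file again.
    """
    total = 0
    for dirpath, dirnames, filenames in os.walk(top):
        for name in filenames:
            # lstat: do not follow symlinks when reading the size.
            total += os.lstat(os.path.join(dirpath, name)).st_size
    return total


def tree_size_scandir(top):
    """Sum file sizes using the entries returned by os.scandir().

    Assumption: Python 3.5 or later. The file type usually comes from the
    directory listing itself; whether entry.stat() needs a further system
    call depends on the platform.
    """
    total = 0
    with os.scandir(top) as entries:
        for entry in entries:
            if entry.is_dir(follow_symlinks=False):
                total += tree_size_scandir(entry.path)
            else:
                total += entry.stat(follow_symlinks=False).st_size
    return total
```

Both functions should return the same total for a tree without symlinks;
symlinks to directories are handled differently by the two, and the sketch
ignores that edge case. Any speed difference will depend on the platform and
the filesystem, so it is worth timing both on your own data.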

Regards,

Vinay Sajip


[1]
http://stackoverflow.com/questions/13138160/benchmarks-does-python-have-a-faster-way-of-walking-a-network-folder



