On 6/26/2014 6:59 PM, Ben Hoyt wrote:
Python's built-in ``os.walk()`` is significantly slower than it needs to be, because -- in addition to calling ``os.listdir()`` on each directory -- it executes the system call ``os.stat()`` or ``GetFileAttributes()`` on each file to determine whether the entry is a directory or not.
But the underlying system calls -- ``FindFirstFile`` / ``FindNextFile`` on Windows and ``readdir`` on Linux and OS X -- already tell you whether the files returned are directories or not, so no further system calls are needed. In short, you can reduce the number of system calls from approximately 2N to N, where N is the total number of files and directories in the tree. (And because directory trees are usually much wider than they are deep, it's often much better than this.)
One of the major reasons for this seems to be efficiently using information that is already available from the OS "for free". Unfortunately it seems that the current API and most of the leading alternate proposals hide from the user what information is actually there "free" and what is going to incur an extra cost.
I would prefer an API that simply gives whatever came for free from the OS and then let the user decide if the extra expense is worth the extra information. Maybe that stat information was only going to be used for an informational log that can be skipped if it's going to incur extra expense?