[Python-Dev] PEP 471 -- os.scandir() function -- a better and faster directory iterator

Janzert janzert at janzert.com
Tue Jul 1 18:06:58 CEST 2014


On 6/26/2014 6:59 PM, Ben Hoyt wrote:
> Rationale
> =========
>
> Python's built-in ``os.walk()`` is significantly slower than it needs
> to be, because -- in addition to calling ``os.listdir()`` on each
> directory -- it executes the system call ``os.stat()`` or
> ``GetFileAttributes()`` on each file to determine whether the entry is
> a directory or not.
>
> But the underlying system calls -- ``FindFirstFile`` /
> ``FindNextFile`` on Windows and ``readdir`` on Linux and OS X --
> already tell you whether the files returned are directories or not, so
> no further system calls are needed. In short, you can reduce the
> number of system calls from approximately 2N to N, where N is the
> total number of files and directories in the tree. (And because
> directory trees are usually much wider than they are deep, it's often
> much better than this.)
>
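
For concreteness, the pattern the rationale describes looks roughly like the sketch below. This is not the actual ``os.walk()`` source; ``split_entries`` and its ``stat_calls`` counter are names made up here purely to illustrate where the extra per-entry call comes from.

```python
import os

def split_entries(top):
    """Hypothetical helper: split a directory into subdirs and files
    using os.listdir() plus one stat-based check per entry."""
    dirs, files = [], []
    stat_calls = 0
    for name in os.listdir(top):      # one directory read for all names
        stat_calls += 1               # os.path.isdir() stats the entry
        if os.path.isdir(os.path.join(top, name)):
            dirs.append(name)
        else:
            files.append(name)
    return dirs, files, stat_calls
```

The counter grows by one for every entry, even though the directory read may already have reported the entry type.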

One of the major goals here seems to be making efficient use of
information that the OS already provides "for free".
Unfortunately, the current API and most of the leading
alternative proposals hide from the user which information actually
comes for free and which will incur an extra cost.

I would prefer an API that simply exposes whatever came for free from the
OS and then lets the user decide whether the extra expense is worth the
extra information. For example, the stat information might only be wanted
for an informational log line that could be skipped if obtaining it would
incur extra expense.
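
A minimal sketch of the kind of thing I mean, not a concrete proposal: ``FreeEntry``, ``describe`` and ``allow_extra_stat`` are hypothetical names, and the assumption is that the directory read fills in ``is_dir`` only when the OS reported it, leaving ``None`` otherwise.

```python
import os
from dataclasses import dataclass
from typing import Optional

@dataclass
class FreeEntry:
    """Hypothetical entry holding only what the directory read returned."""
    path: str
    name: str
    is_dir: Optional[bool]  # None means the OS did not report it for free

def describe(entry, allow_extra_stat=False):
    """Build a log line, paying for os.stat() only if the caller opts in."""
    kind = {True: "dir", False: "file", None: "unknown"}[entry.is_dir]
    if not allow_extra_stat:
        return f"{entry.name} ({kind})"
    size = os.stat(entry.path).st_size  # the explicit, visible extra call
    return f"{entry.name} ({kind}, {size} bytes)"
```

The point is that the extra system call only happens where the caller asked for it, so a purely informational log can leave ``allow_extra_stat`` off and stay cheap.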

Janzert


