[Python-ideas] os.listdir iteration support

Aahz aahz at pythoncraft.com
Sun Nov 25 01:29:17 CET 2007


On Fri, Nov 23, 2007, Giampaolo Rodola' wrote:
>
> Surely it's a rather specific use case, but it is one of the tasks
> which takes the longest amount of time on an FTP server. 20,000 is
> probably an exaggerated hypothetical situation, so I did a simple test
> with a more realistic scenario.
> On Windows, a very crowded directory is C:\windows\system32. Currently
> the C:\windows\system32 on my Windows XP workstation contains 2201
> files.
> I tried running the code below, which is how an FTP server should
> properly respond to a "LIST" command issued by a client.
> It took 1.70300006866 seconds to complete the first time and
> 0.266000032425 seconds the second time.

Your code calls os.stat() on each file.  I know from past experience
that os.stat() is *extremely* expensive.  Because os.listdir() runs at C
speed, it only gets slow when run against hundreds of thousands of
entries.
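
A rough way to check where the time goes is to time the two steps
separately.  The sketch below is only an illustration, not the code from
the original message: time_listing() is a hypothetical helper name, and
it assumes a current Python 3 with time.perf_counter().  The numbers
will vary a lot with the filesystem and with whether the OS has the
directory cached.

```python
import os
import tempfile
import time


def time_listing(path):
    """Return (names, listdir_seconds, stat_seconds) for one directory.

    Hypothetical helper: times os.listdir() on its own, then times one
    os.stat() call per returned name.
    """
    start = time.perf_counter()
    names = os.listdir(path)
    listdir_seconds = time.perf_counter() - start

    start = time.perf_counter()
    for name in names:
        # One extra system call per entry; this is the suspected cost.
        os.stat(os.path.join(path, name))
    stat_seconds = time.perf_counter() - start

    return names, listdir_seconds, stat_seconds


if __name__ == "__main__":
    # Build a scratch directory so the sketch does not depend on any
    # particular machine's layout.
    with tempfile.TemporaryDirectory() as tmp:
        for i in range(2000):
            open(os.path.join(tmp, "file%d" % i), "w").close()
        names, t_list, t_stat = time_listing(tmp)
        print("entries:     ", len(names))
        print("os.listdir():", t_list, "seconds")
        print("os.stat():   ", t_stat, "seconds")
```

Pointing time_listing() at a real crowded directory instead of the
scratch one gives a more honest comparison, since freshly created empty
files are likely to be cached already.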

(One directory on a work server has over 200K entries, and os.listdir()
takes about twenty seconds on it.  I believe that switching from ext3 to
a more appropriate filesystem would reduce that.)

> I don't know whether such a specific use case could justify adding
> generator support for listdir to the stdlib, but something like
> Greg Ewing's opendirs module could have saved a lot of time in this
> specific case.

Doubtful.
-- 
Aahz (aahz at pythoncraft.com)           <*>         http://www.pythoncraft.com/

"Typing is cheap.  Thinking is expensive."  --Roy Smith


