Python vs. Ruby (and os.path.walk)

Peter Hansen peter at
Fri Aug 9 22:04:45 EDT 2002

Steven Atkinson wrote:

> Then I took out all screen IO and really sped the code up. I put 
> the IO back in and now the code is still acceptable in terms of 
> speed (though slightly slower than Ruby).
> The original code was:
> -----------------------------

> I changed it to:
> ------------------------
> The second is faster. They are both decent. Thanks again.
> Signed perplexed and embarrassed.

A few items:

1. The timing is rather dependent on the mix of directories
and files, and matching files.  The os.path.walk routine
runs isdir() on every name it finds, so that can take a lot
of time, even if there are few directories to search, if
there are a lot of files.
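
As a rough illustration of point 1, here is a minimal sketch of a
walk that calls isdir() on every name it finds. It is written for
current Python 3 (where os.path.walk no longer exists) and is not the
library's actual source; simple_walk and isdir_calls are names made up
for this example.

```python
import os

def simple_walk(top, visit):
    """Call visit(dirname, names) for top and each directory below it.

    Returns how many isdir() calls were made, to show where time goes.
    """
    try:
        names = os.listdir(top)
    except OSError:
        return 0                    # unreadable directory: skip it
    visit(top, names)
    isdir_calls = 0
    for name in names:
        path = os.path.join(top, name)
        isdir_calls += 1            # one check per name, file or directory
        if os.path.isdir(path) and not os.path.islink(path):
            isdir_calls += simple_walk(path, visit)
    return isdir_calls
```

The count grows with the total number of names, not with the number
of directories, which is why a tree with many files and few
directories can still be slow to walk.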

2. The second one is actually significantly faster than
the first one, on my particular system.  (I changed the
search pattern of course, and the directory, and I'm 
running under either Linux or Win98.)  In both cases
my search actually finds only a few files out of the 
many thousands that are there.  The difference in speed
is roughly 2x (because most of lister() is skipped 
most of the time in my case, so it's all from the walk()
routine.  YMWV...your mileage _will_ vary :-) )

3. Matt's comment about using the profiler is of course
the only right way to go about optimizing.  It's simpler
than you might think, if you haven't used it already:

  import profile; profile.run('import walk2')

Normally you should try running the code a second time
to allow you to measure and factor in (or out) any extra
effects like hard drive caching.  In that case:

  import walk2
  import profile; profile.run('reload(walk2)')

works better.  It's also easier to just repeat the second
line over and over while you tweak the code in
a text editor, for example.
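
For anyone trying this on a current Python 3, where reload() has
moved to importlib, a sketch along the following lines profiles a
single call and returns the report as text. count_matches is only a
hypothetical stand-in for the real per-directory work, and
profile_call is a helper name invented here.

```python
import io
import profile
import pstats

def count_matches(names, suffix):
    # Hypothetical stand-in for whatever the real visitor does.
    return sum(1 for n in names if n.endswith(suffix))

def profile_call(func, *args):
    """Run func(*args) under the profiler; return (result, report)."""
    prof = profile.Profile()
    result = prof.runcall(func, *args)
    out = io.StringIO()
    stats = pstats.Stats(prof, stream=out)
    stats.sort_stats("cumulative").print_stats(10)  # top 10 entries
    return result, out.getvalue()
```

Calling profile_call(count_matches, names, ".py") twice in a row and
comparing the two reports is one way to see how much of the first run
was down to caching.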

4. You probably want to insert a backslash in front of
the dots "." in the file extensions in the regular 
expression.  Otherwise you're matching on any character,
not on a period...
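
A small sketch of the difference, using a made-up pattern and made-up
file names rather than the ones from the original code:

```python
import re

# Hypothetical patterns for illustration only.
unescaped = re.compile(r".*.(py|txt)$")   # "." matches any character
escaped = re.compile(r".*\.(py|txt)$")    # "\." matches only a period

for name in ("notes.txt", "happy"):
    print(name, bool(unescaped.match(name)), bool(escaped.match(name)))
```

The unescaped pattern accepts "happy", which has no extension at all,
because the bare dot is satisfied by the first "p". The escaped
pattern rejects it and still accepts "notes.txt".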
