Python vs. Ruby (and os.path.walk)
peter at engcorp.com
Fri Aug 9 22:04:45 EDT 2002
Steven Atkinson wrote:
> Then I took out all screen IO and really sped the code up. I put
> the IO back in and now the code is still acceptable in terms of
> speed (though slightly slower than Ruby).
> The original code was:
> I changed it to:
> The second is faster. They are both decent. Thanks again.
> Signed perplexed and embarrassed.
A few items:
1. The timing depends heavily on the mix of directories,
files, and matching files. The os.path.walk routine
runs isdir() on every name it finds, so a tree with a
lot of files can take a lot of time to walk, even if
there are few directories to search.
2. The second one is actually significantly faster than
the first one on my particular system. (I changed the
search pattern of course, and the directory, and I'm
running under either Linux or Win98.) In both cases
my search actually finds only a few files out of the
many thousands that are there. The difference in speed
is roughly 2x, because most of lister() is skipped
most of the time in my case, so the time that remains
is all from the walk() routine. YMWV...your mileage
_will_ vary :-)
3. Matt's suggestion to use the profiler is of course
the only right way to go about optimizing. It's simpler
than you might think, if you haven't used it already:
Normally you should try running the code a second time
to allow you to measure and factor in (or out) any extra
effects like hard drive caching. In that case:
..is better. It's also easier to just repeat the second
line over and over while you tweak the code in walk2.py
in a text editor, for example.
4. You probably want to insert a backslash in front of
the dots "." in the file extensions in the regular
expression. Otherwise you're matching on any character,
not on a period...
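
To illustrate item 1, here is a minimal sketch of a walk that checks isdir() on every name, which is the behaviour the post attributes to os.path.walk. It is not the library's source; walk_counting_isdir and make_demo_tree are hypothetical names used only for this illustration. The point is that the number of isdir() calls follows the number of names, not the number of directories.

```python
import os
import tempfile

def walk_counting_isdir(top):
    """Visit every directory under top; return (dirs_visited, isdir_calls)."""
    dirs_visited = 0
    isdir_calls = 0
    stack = [top]
    while stack:
        current = stack.pop()
        dirs_visited += 1
        for name in os.listdir(current):
            path = os.path.join(current, name)
            isdir_calls += 1  # one check per name, whether file or directory
            if os.path.isdir(path) and not os.path.islink(path):
                stack.append(path)
    return dirs_visited, isdir_calls

def make_demo_tree(n_files=50):
    """Build a throwaway tree: one empty subdirectory plus n_files files."""
    root = tempfile.mkdtemp()
    os.mkdir(os.path.join(root, "sub"))
    for i in range(n_files):
        with open(os.path.join(root, "file%d.txt" % i), "w") as f:
            f.write("x")
    return root
```

With the demo tree, only two directories are visited, yet every one of the 51 names in the top directory gets its own isdir() check.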
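
For item 3, the profiler commands from the post are not preserved here, so the following is only a sketch of the idea under stated assumptions. It uses cProfile (same interface as the profile module, lower overhead) and os.walk, because os.path.walk no longer exists in Python 3. The contents of walk2.py are not shown, so find_matches and PATTERN are hypothetical stand-ins. The first, unprofiled call is the "run it a second time" advice: it warms the disk cache so the measured run is not dominated by it.

```python
import cProfile
import io
import os
import pstats
import re

# Hypothetical pattern; the post's actual search pattern is not shown.
PATTERN = re.compile(r".*\.(txt|py)$")

def find_matches(top):
    """Return paths under top whose file name matches PATTERN."""
    matches = []
    for dirpath, dirnames, filenames in os.walk(top):
        for name in filenames:
            if PATTERN.match(name):
                matches.append(os.path.join(dirpath, name))
    return matches

def profile_twice(top):
    """Run once to warm caches, then profile the second run."""
    find_matches(top)  # warm-up run, result discarded
    profiler = cProfile.Profile()
    result = profiler.runcall(find_matches, top)
    out = io.StringIO()
    stats = pstats.Stats(profiler, stream=out)
    stats.sort_stats("cumulative").print_stats(10)  # ten costliest entries
    return result, out.getvalue()
```

Calling profile_twice on a directory returns the matches together with the profiler report as text, so the report can be printed or compared between code tweaks.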
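
The effect described in item 4 can be seen in a small example. The ".txt" extension and the file names are made up for illustration, since the post's actual pattern is not shown.

```python
import re

unescaped = re.compile(r".*.txt$")   # "." matches any single character
escaped = re.compile(r".*\.txt$")    # "\." matches only a literal period

names = ["notes.txt", "notes_txt", "mytxt", "readme.md"]

loose = [n for n in names if unescaped.match(n)]
strict = [n for n in names if escaped.match(n)]

print(loose)   # ['notes.txt', 'notes_txt', 'mytxt']
print(strict)  # ['notes.txt']
```

The unescaped pattern accepts any name that ends in "txt" preceded by at least one character, which is rarely what a file-extension search intends.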