Python vs. Ruby (and os.path.walk)
sja at san.rr.com
Sat Aug 10 17:03:40 CEST 2002
There were other processes, but they were not doing much (really: a 2 GHz
Pentium 4 running Win2k SP2 with 512 MB RAM; 2 IE browsers on static pages, 2
MSDEV, 1 Outlook, 1 idle DOS box, 2 Bash shells where I ran the scripts, bug
tracker software. No significant services). Basically, a few weeks ago I
started playing with Ruby and wrote the script as a toy to clean midl output
(I already had a script to do this). However, it ran well and was easy to
call so I just kept using it. Then I wanted a script to find out my "dirty"
source files - files that I changed without checking out. I noticed that
Ruby did not have a portable way (or any way) of checking the return value
of stat calls. Python did. So I dusted off the Python that I had played with
6 months ago and wrote a similar script on my second development box. I got
as far as needing to put in a call to remove the files and needed to look up
the library call. So I just ran it to test what I had. I noticed the huge
speed difference. I ran it again with the same results. I copied the Ruby
script to the same machine while the second run was still going. I ran the
Ruby script. It finished before the Python script - subsequently removing
all the files!!! So I modified the Ruby script to not remove the files.
Generated the files and ran them both again in parallel. Ruby kicked butt.
I ran them separately, Ruby first, then Python. Same results. Noted what else
was on the machine. Searched the Internet for references to slow Python
os.path.walk and got a few hits, but not as many as one would think if it
was fundamental. Went home. Posted here. Went to work the next day and
changed the Python script to pass the regex as a parameter to the function (Ruby did
not have a way to do this and I had just copied its structure to get the
Python code). Ran the scripts again in parallel using bash's time. Python
2:45. Ruby :15. Made the second tweak to do the regex against the file name.
Ran again. Same results. Took out all IO from the Python script and ran it
alone :15. Wow! Took the IO out of Ruby: :13. Hmm. Put the IO back in both
and ran in parallel. Python :18, Ruby :14. Other runs would have different
times, but Ruby would generally finish a tad faster. Tried to put the code
back as originally designed and could never get back to 2-3 minutes.
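
For anyone who wants to try the same shape of script, here is a minimal sketch of the second version described above: the compiled regex is passed in as a parameter and matched against the bare file name rather than the full path. It is not the original script. It assumes a current Python, where os.path.walk no longer exists, so os.walk is swapped in. The names find_matching and MIDL_OUTPUT and the pattern itself are hypothetical.

```python
import os
import re


def find_matching(root, pattern):
    """Return sorted paths under root whose file name matches pattern.

    pattern is a compiled regex passed in by the caller, so it is
    compiled once rather than on every directory visit.
    """
    matches = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            # Match against the bare file name, not the joined path.
            if pattern.search(name):
                matches.append(os.path.join(dirpath, name))
    return sorted(matches)


# Hypothetical pattern for generated files; the real one is not in the post.
MIDL_OUTPUT = re.compile(r"_(i|p)\.c$")
```

Calling find_matching(".", MIDL_OUTPUT) under bash's time gives the walk-plus-match cost with no file removal, which is roughly the "no IO" measurement mentioned above.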
Only thing that might have changed is I probably stopped and started Outlook
overnight. I may have shut the tracker software at some point. Since I did not
profile, I can't say for sure what happened. My only "guess" is that Python
may have been competing to get some sort of mutex/critical section protected
resource/section of code that another program was also using. I'll probably
continue to try to find the culprit since unsolved mysteries drive me crazy.
(And in the end it'll probably turn out to be something really stupid that I
I'm still leaning toward Python since Ruby seems just a tad awkward to me
(especially since script writing is not an everyday activity for me).
(Probably a LOT more detail than anyone cared about:)
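
Since no profile was taken at the time, here is a rough sketch of how one could be collected if the slowdown shows up again. It uses the standard cProfile and pstats modules from a current Python. walk_tree and profile_walk are hypothetical names, and walk_tree only stands in for whatever the real script does. A profile like this shows which calls the time went into; it would not by itself identify another process competing for a resource.

```python
import cProfile
import io
import os
import pstats


def walk_tree(root):
    """Count files under root; a stand-in for the real script's work."""
    count = 0
    for _dirpath, _dirnames, filenames in os.walk(root):
        count += len(filenames)
    return count


def profile_walk(root, top=10):
    """Run walk_tree under cProfile; return (file count, report text)."""
    profiler = cProfile.Profile()
    count = profiler.runcall(walk_tree, root)
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    # Sort by cumulative time so the slowest call chains come first.
    stats.sort_stats("cumulative").print_stats(top)
    return count, buffer.getvalue()
```

Printing the second element of profile_walk(".") lists the most expensive calls by cumulative time.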
> Are you sure nothing else was running in the background during these
> Much of the time spent running the code is actually I/O related, so maybe
> another process was doing a lot of disk access at the time and you didn't
> > In any case, though that mystery is not solved, at least I know that
> > about the same speed wise as Ruby.
> ^^^^---------note identity function------'
> That much, at least, is true. <wink>
More information about the Python-list