peter at engcorp.com
Sat Oct 30 17:10:02 CEST 2004
Reinhold Birkenfeld wrote:
> Well, commenting out the regex substitutions in both versions leads to
> equal execution times, as I mentioned earlier, so it has to be my regexes.
I'm sorry, I managed to miss the significance of that
statement in your original post.
I wonder what the impact is of the fact that the re.sub
operations are function calls in Python. The overhead
of function calls is relatively high in Python, so
perhaps an experiment would be revealing. Can you try
with a dummy re.sub() call (create a dummy "re" object
that is global to your module, with a dummy .sub()
method that just does "return") and compare the
speed of that with the version without the re.sub
calls at all? Probably a waste of time, but perhaps
the actual re operations are not so slow after all,
but the calls themselves are.
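The experiment I have in mind could be sketched roughly as below. The input list, the patterns and the function names are all made up for illustration, since I haven't seen your actual code; the only point is that the dummy "re" object does no work, so any time difference between the two loops is pure call overhead.

```python
import timeit


class DummyRe:
    """Stand-in for the real re module: sub() does no work at all."""

    def sub(self, pattern, repl, string):
        return


# Module-level global that deliberately shadows the real "re" module.
re = DummyRe()

# Hypothetical input; substitute the real data here.
LINES = ["some sample text 123"] * 1000


def loop_with_dummy_calls():
    # Three calls per line, mirroring the three substitutions in the loop.
    for line in LINES:
        re.sub("a", "b", line)
        re.sub("c", "d", line)
        re.sub("e", "f", line)


def loop_without_calls():
    # Same loop with the calls removed entirely.
    for line in LINES:
        pass


if __name__ == "__main__":
    print("dummy calls:", timeit.timeit(loop_with_dummy_calls, number=100))
    print("no calls:   ", timeit.timeit(loop_without_calls, number=100))
```

If the two timings differ by an amount comparable to the slowdown you are seeing, then the calls, rather than the regex work, account for a good part of the cost.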
If that's true, you would at least get a tiny improvement
by aliasing re.sub to a local name before the loop. That
avoids the global lookup for "re" and then the attribute
lookup for "sub" on each of the three calls, each time
through the loop.
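A minimal sketch of the aliasing, assuming a loop with three substitutions per line; the function name and the patterns here are placeholders, not your real regexes:

```python
import re


def clean(lines):
    # Look up the global "re" and its "sub" attribute once, outside the loop.
    sub = re.sub
    out = []
    for line in lines:
        # Each call now needs only a fast local-variable lookup.
        line = sub(r"\s+", " ", line)    # collapse runs of whitespace
        line = sub(r"\d+", "N", line)    # replace runs of digits
        line = sub(r"[^\w ]", "", line)  # drop punctuation
        out.append(line)
    return out
```

This only saves the name lookups; the function call itself still happens, so don't expect more than a small gain.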
If you can show the format of the input data, I would
be happy to try a little profiling, if you haven't already
done that to prove that the bulk of the time is actually
in the re.sub operation itself.
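For the profiling, something along these lines would do. It uses cProfile and pstats from the current standard library; process() and its patterns are hypothetical stand-ins for your loop, and the sample input is invented.

```python
import cProfile
import io
import pstats
import re


def process(lines):
    # Hypothetical stand-in for the real loop and its three substitutions.
    out = []
    for line in lines:
        line = re.sub(r"\s+", " ", line)
        line = re.sub(r"\d+", "N", line)
        line = re.sub(r"[^\w ]", "", line)
        out.append(line)
    return out


def profile_report(lines, top=10):
    # Run process() under the profiler and return the report as a string.
    profiler = cProfile.Profile()
    profiler.runcall(process, lines)
    buf = io.StringIO()
    stats = pstats.Stats(profiler, stream=buf)
    stats.sort_stats("cumulative").print_stats(top)
    return buf.getvalue()


if __name__ == "__main__":
    print(profile_report(["some  sample text 123!"] * 10000))
```

The cumulative-time column for the sub entry, compared with that of process, shows how much of the run is really spent inside the substitutions.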