Regex speed

Reinhold Birkenfeld reinhold-birkenfeld-nospam at wolke7.net
Sat Oct 30 17:14:24 CEST 2004


Peter Hansen wrote:
> Reinhold Birkenfeld wrote:
>> Well, commenting out the regex substitutions in both versions leads to
>> equal execution times, as I mentioned earlier, so it has to be my regexes.
> 
> I'm sorry, I managed to miss the significance of that
> statement in your original post.

;)

> I wonder what the impact is of the fact that the re.sub
> operations are function calls in Python.  The overhead
> of function calls is relatively high in Python, so
> perhaps an experiment would be revealing.  Can you try
> with a dummy re.sub() call (create a dummy "re" object
> that is global to your module, with a dummy .sub()
> method that just does "return") and compare the
> speed of that with the version without the re.sub
> calls at all?  Probably a waste of time, but perhaps
> the actual re operations are not so slow after all,
> but the calls themselves are.
> 
> If that's true, you would at least get a tiny improvement
> by aliasing re.sub to a local name before the loop, to
> avoid the global lookup for "re" and then the attribute
> lookup for "sub" on each of the three calls, each time
> through the loop.
> 
> If you can show the format of the input data, I would
> be happy to try a little profiling, if you haven't already
> done that to prove that the bulk of the time is actually
> in the re.sub operation itself.

Well, I did alias the sub methods in that way:

re1sub = re.compile("whatever").sub

There was a performance gain, but it accounted for only about 1/100th
of the speed difference between the two versions.
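
For anyone who wants to repeat the comparison, here is a minimal sketch of
the kind of measurement discussed above. The pattern, the sample input and
all function names are made up for illustration; they are not the regexes
from my program. It times three variants: re.sub looked up through the
module on every call, a compiled pattern's sub method bound to a local
name, and a dummy function that does no regex work at all.

```python
import re
import timeit

# Hypothetical pattern and input; the real regexes are not shown in this thread.
PATTERN = r"\s+"
LINES = ["some   sample \t text  to   normalise"] * 1000


def with_module_lookup():
    # Looks up the global "re" and its attribute "sub" on every iteration.
    return [re.sub(PATTERN, " ", line) for line in LINES]


def with_alias():
    # Bind the compiled pattern's sub method to a local name once.
    re1sub = re.compile(PATTERN).sub
    return [re1sub(" ", line) for line in LINES]


def dummy_sub(repl, string):
    # Stand-in that does no regex work, to estimate bare call overhead.
    return string


def with_dummy():
    return [dummy_sub(" ", line) for line in LINES]


def measure(number=20):
    # Seconds per variant; absolute values depend on machine and Python version.
    variants = (with_module_lookup, with_alias, with_dummy)
    return {f.__name__: timeit.timeit(f, number=number) for f in variants}


if __name__ == "__main__":
    for name, seconds in measure().items():
        print(f"{name}: {seconds:.4f}s")
```

If the dummy variant is much faster than the other two, the time is going
into the regex matching itself rather than into call or lookup overhead.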

Reinhold

-- 
[Windows is like] the railway: you don't have to take care of anything,
you pay a surcharge for every little thing, the service is lousy,
strangers can get on at any time, it is inflexible and incompatible with
all other means of transport.       -- Florian Diesch in dcoulm


