Python speed and `pcre'

Darrell news at dorb.com
Tue Aug 31 13:25:09 EDT 1999


From re.py:
_MAXCACHE = 20

Definitely profile before assuming where the expense is. A common mistake
I've found is splitting and combining strings. Use string.join(); don't do
str = str + str1 in a loop. Strings are immutable, so you end up allocating
memory and copying every time. Very expensive.
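
To make the difference concrete, here is a minimal sketch of the two ways of
building a string. It uses the string-method spelling "".join(...), which later
Python versions provide as the equivalent of string.join(). The function names
`build_slow` and `build_fast` are invented for the example.

```python
def build_slow(pieces):
    # Repeated concatenation: each step may allocate a new string
    # and copy everything accumulated so far.
    result = ""
    for piece in pieces:
        result = result + piece
    return result

def build_fast(pieces):
    # Hand all the pieces to join, which builds the result in one go.
    return "".join(pieces)

pieces = ["line %d\n" % i for i in range(1000)]
# Both approaches give the same string; only the cost differs.
assert build_slow(pieces) == build_fast(pieces)
```

When the pieces are produced inside a loop, appending them to a list and
joining once at the end follows the same idea.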

--
--Darrell
François Pinard <pinard at iro.umontreal.ca> wrote in message
news:oqzoz89hrb.fsf at titan.progiciels-bpi.ca...
> Hi, people.  Still learning and experimenting with Python. :-)
>
> After having translated some code (not big, but not small) from Perl to
> Python, I discover it runs ten times slower.  I did not learn to profile
> yet, and I am not sure it would help, after having read that profiling
> gives call counts, but no elapsed time.  Is that right?
>
> My intuition tells me that the amount of regular expression matching might
> provide an explanation.  Here is my set of _hypotheses_ (I'm not sure):
>
> * Perl keeps compiled all /REGEXP/ not using string interpolation,
> * Python cache for compiled REGEXP is less virtuous than I imagined.
>
> I would be tempted to try using, in Python, a trick like `gettext' does
> to avoid retranslating an already translated message.  That is, for _each_
> textual `gettext' call, the compiler #define expands to some code statically
> allocating a local variable used to cache the result of the call,
> and uses this cached result afterwards.
>
> My problem, and I hope you will have some good advice to give for it:-),
> is that I do not see how to do this legibly.  I could regroup a lot of
> `re.compile' calls and use the corresponding variable afterwards, but I
> would much rather keep the REGEXP expressions textual near the place they
> are needed, and have some:
>
>         global gensymed-local
>         if gensymed-local is None:
>             gensymed-local = re.compile(REGEXP)
>         gensymed-local.sub(...)
>
> This would have the advantage of keeping the REGEXP where it is meaningful
> in the code.  Yet, creating and maintaining all those `gensymed-local'
> variables by hand might be fairly tedious, not to say some code to ensure
> they are all None.  (But maybe `global' already guarantees that?  If yes,
> may I count on it, that is, does Python promise it is unlikely to
> change?)
>
> Of course, seeing the above clutter, the idea of a macro-generator is surely
> tempting, but this would be fairly heavy resorting to external mechanics
> for something which should be straightforward.  If Python included a
> macro-generator, it would be a nice thing...
>
> Is there something fundamental I am missing, and some just simpler code
> that would have the desired effect?  Or else, is there another approach
> that could give me more speed without giving too much on legibility?
> I'm surely ready to accept some slowdown going from Perl to Python, but
> a slowdown factor of 10 is a bit much, in my opinion.
>
> --
> François Pinard   http://www.iro.umontreal.ca/~pinard
>
>
>





