Speeding up application startup

Fahd Khan fahdkhan at bayou.uh.edu
Mon May 14 21:53:50 EDT 2001


> Fahd Khan wrote:
> > This is a guess, and I haven't bothered to look up if it is even possible,
> > but perhaps you could pickle the compiled regex objects into a cache file
> > of some sort?
>
> you can pickle regexps, but that stores the original textual representation,
> and recompiles them on the way in (the internal data format is not designed
> to be portable).
>
> possible workarounds include:
>
> - if you're using 1.6 or newer, you can use "from pre import re" instead
> of "re" -- the new engine is usually faster when it comes to running the
> compiled expression, but pre's compiler is written in C, not Python.
>
> - rethink your design.  do you really need 15000 regular expressions?  do
> you really need to compile all of them on the way in?  do you really use
> all of them during a typical program run?
>
> - try storing them as text strings instead, and use re.match(p, s) instead
> of p.match(s).  you can bump the SRE cache like this:
>
>     import sre
>     sre._MAXCACHE = 15000
>
> adding a "print len(sre._cache)" to the end of your program will give you
> an idea of how many expressions you're using during a typical run.
>
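
For anyone who wants to try the last suggestion, here is a rough sketch of
what it amounts to in a current Python, where the module is plain "re"
rather than "sre". The PATTERNS table and first_match() are made-up names
for illustration, and _MAXCACHE is a private detail of the re module that
may be missing or behave differently from one version to the next, so the
sketch only touches it if it exists.

```python
import re

# Hypothetical pattern table: plain strings, nothing compiled at import time.
PATTERNS = {
    "date": r"\d{4}-\d{2}-\d{2}",
    "word": r"[A-Za-z]+",
    "hex": r"0[xX][0-9a-fA-F]+",
}

# _MAXCACHE is a private implementation detail (the quoted post calls it
# sre._MAXCACHE). Raise it only if this Python version still has it.
if hasattr(re, "_MAXCACHE"):
    re._MAXCACHE = max(re._MAXCACHE, len(PATTERNS))


def first_match(text):
    """Return the name of the first pattern that matches at the start of text."""
    for name, pattern in PATTERNS.items():
        # re.match() compiles the string on first use and can reuse the
        # module's cached compiled object on later calls.
        if re.match(pattern, text):
            return name
    return None
```

With this layout only the patterns that actually get tried are ever
compiled, which is the point of the "do you really use all of them"
question above.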

I tried that in a couple of my own re-heavy scripts and it helped a tiny
bit. =) Whether or not he can change his design depends on the application,
and I suppose we don't have enough context to speculate. However, based on
what he did say, let me guess a little.

A 15s startup time is only bad if the script runs often. If it does run
often, you'd want to compile the regexes once and keep them in some sort of
server process. Since a pickled regex just gets recompiled on load, shipping
the regexes over to the client gains nothing, so instead we'd be forced to
move the text over to the server. That means overhead, even on a single
machine. I'm guessing that if you are running a web server, say mod_python
on Apache, you could create something akin to a Java servlet to avoid
starting the script anew each time around... If you're using ASP, I suspect
you could do some ActiveX/COM executable server magic. If this were C, I
would probably do some kind of mess with a shared memory segment...
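
The "compile once, keep them in a long-lived process" idea might look
roughly like the sketch below. It leaves out the transport entirely
(mod_python handler, socket, COM, whatever) and only shows the shape: the
compile cost is paid once when the process starts, and each request just
hands over text. RULES, RegexService and handle_request are hypothetical
names, not anything from the original poster's program.

```python
import re

# Hypothetical rule table standing in for the real 15000 expressions.
RULES = [
    ("greeting", r"hello\b"),
    ("number", r"\d+"),
]


class RegexService:
    """Compile every pattern once and reuse the results for each request."""

    def __init__(self, rules):
        # The expensive step: done a single time per process.
        self.compiled = [(name, re.compile(pattern)) for name, pattern in rules]
        self.requests_served = 0

    def handle(self, text):
        """Return the names of all rules that match somewhere in text."""
        self.requests_served += 1
        return [name for name, rx in self.compiled if rx.search(text)]


# Created at process startup; a web handler would call handle_request()
# for each incoming request instead of re-running the whole script.
SERVICE = RegexService(RULES)


def handle_request(text):
    return SERVICE.handle(text)
```

The trade-off is the one mentioned above: the text has to travel to
wherever the compiled objects live, since the compiled objects themselves
can't usefully travel.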

Anyway, it's an interesting problem.

Fahd




