Python and regexp efficiency.. again.. :)

Sat Dec 11 06:12:01 EST 1999

Patrick Tufts <zippy at cs.brandeis.edu> writes:
> In article <al8n1rj6tgv.fsf at sirppi.helsinki.fi>, Markus Stenberg
> <mstenber at cc.Helsinki.FI> wrote:
> > One order of magnitude optimization gain was received by writing
> > a specialized regexp optimization tool - as the regexps are mostly
> > of type
> >                 ^commonstringuniquetail
> >                 ^commonstringanothertail
> Depending on how many different extensions there are to commonstring,
> you might do better with the regexp:

Regrettably, there are N different extensions.

>    ^commonstring(.*)
> 
> and then matching the saved pattern (.*) against a dictionary of
> possible extensions.

Basically, the common start is usually the date, and the non-common parts
depend on the log type: for syslog, for example, the service name and the
message string. Generally, the service+message combination is the
interesting part, but to prevent false matches, its location on the line
must be verified to be just after the date at the beginning of the line.
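
To make the idea concrete, here is a minimal sketch of the "match the
common start once, then look the rest up in a dictionary" approach. It
assumes the classic syslog layout "Dec 11 06:12:01 host service[pid]:
message"; the RULES table, its patterns and labels, and the classify()
helper are all hypothetical names made up for illustration, not the
actual tool.

```python
import re

# Assumed classic syslog layout: "Dec 11 06:12:01 host service[pid]: message".
# The date is anchored at the start of the line, so a service name that only
# appears later in the message text cannot produce a false match.
LINE_RE = re.compile(
    r"^(?P<date>[A-Z][a-z]{2} [ \d]\d \d\d:\d\d:\d\d) "
    r"(?P<host>\S+) "
    r"(?P<service>[^\s\[:]+)(?:\[\d+\])?: "
    r"(?P<message>.*)$"
)

# Hypothetical rule table: service name -> list of (tail regexp, label).
RULES = {
    "sshd": [(re.compile(r"Failed password for (\S+)"), "ssh-failed-login")],
    "su": [(re.compile(r"BAD SU (\S+) to (\S+)"), "bad-su")],
}


def classify(line):
    """Return (label, match) for the first matching rule, or None."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    # Only the rules registered for this service are tried on the message.
    for tail_re, label in RULES.get(m.group("service"), ()):
        tail = tail_re.match(m.group("message"))
        if tail is not None:
            return label, tail
    return None
```

The dictionary lookup replaces trying every full-line regexp in turn:
each line pays for one anchored match plus the handful of tail patterns
registered for its service.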

Admittedly, I _think_ it might be somewhat faster (but not much) to do the
date-part checking in C and then just use regexps to parse the tail, but I
doubt I could gain an order of magnitude in speed from that.
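
For illustration only, the same split can be sketched in Python rather
than C: a fixed-position check of the date, followed by a regexp applied
only to the tail. The 15-character date width, the helper names and the
tail pattern are assumptions based on the classic syslog format, and the
sketch makes no claim about being faster than a single anchored regexp.

```python
import re

DATE_LEN = 15  # length of "Dec 11 06:12:01" (assumed fixed-width syslog date)
MONTHS = frozenset("Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec".split())

# Hypothetical tail pattern: "host service[pid]: message".
TAIL_RE = re.compile(r"(\S+) ([^\s\[:]+)(?:\[\d+\])?: (.*)")


def has_date_prefix(line):
    """Check the date by position only, without the regexp engine."""
    if len(line) <= DATE_LEN or line[DATE_LEN] != " ":
        return False
    return (
        line[:3] in MONTHS
        and line[3] == " "
        and line[4] in " 0123456789"
        and line[5].isdigit()
        and line[6] == " "
        and line[9] == ":"
        and line[12] == ":"
        and (line[7:9] + line[10:12] + line[13:15]).isdigit()
    )


def split_line(line):
    """Return (service, message) if the line starts with a date, else None."""
    if not has_date_prefix(line):
        return None
    # Start the regexp match just past the date and its trailing space.
    m = TAIL_RE.match(line, DATE_LEN + 1)
    if m is None:
        return None
    return m.group(2), m.group(3)
```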

> --Pat

-Markus

-- 
"He who fights with monsters should look to it that he himself does
not become a monster. And when you gaze long into an abyss the abyss
also gazes into you."
                    - Friedrich Nietzsche, _Beyond Good and Evil_