Regex Speed

garrickp at gmail.com garrickp at gmail.com
Tue Feb 20 16:29:41 EST 2007


While creating a log parser for fairly large logs, we have run into an
issue where the time to process was relatively unacceptable (upwards
of 5 minutes for 1-2 million lines of logs). In contrast, using the
Linux tool grep would complete the same search in a matter of seconds.

The search we used was a regex of 6 elements "or"ed together, with an
exclusionary set of ~3 elements. Due to the size of the files, we
decided to run these line by line, and due to the need of regex
expressions, we could not use more traditional string find methods.

We did pre-compile the regular expressions, and attempted tricks such
as map to remove as much overhead as possible.

With the known limitations of not being able to slurp the entire log
file into memory, and the need to use regular expressions, do you have
an ideas on how we might speed this up without resorting to system
calls (our current "solution")?




More information about the Python-list mailing list