[J.B. Langston <jblangston@datastax.com>]
Unfortunately, it does appear that my app took almost a 20% performance hit from using regex instead of re. Processing time for a test dataset with 700MB of logs went from 77 seconds with the standard library re to 92 seconds with regex. Profiling confirms that the time spent in the groupdict method went from 3.27% to 9.41% and the time spent in match went from 5.19% to 10.33% of the execution time.
I'm mildly surprised! Most times people report that regex is at least modestly faster than re. For peak performance at the expense of flexibility (e.g., no backreferences supported), perhaps you'd get a major speed gain from an entirely different regexp engine approach. Google built such an engine, which has worst-case linear matching runtime (but, depending on details, _may_ require space exponential in the regexp to compile a regexp object). A Python binding for that is here ("pip install pyre2"): https://pypi.org/project/pyre2/ I haven't used it - this isn't a particular area of interest for me. Note that in its "Performance" section, it shows examples where it blows re out of the water. But, in all those cases, regex was modestly to very significantly faster than re too. YMMV.
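Since results clearly vary by workload, a small timing harness run on your own patterns is more telling than anyone's general impression. What follows is only a rough sketch: the log line and the pattern are made up for illustration (they are not from the app under discussion), and it times just the two calls the profile singled out, match and groupdict. It uses re, plus regex if that happens to be installed.

```python
import importlib
import re
import timeit

# Hypothetical log line and pattern, invented for this sketch.
LINE = "2021-02-16 10:15:42,123 INFO  [main] Startup complete in 42 ms"
PATTERN = (
    r"(?P<date>\d{4}-\d{2}-\d{2}) (?P<time>[\d:,]+) (?P<level>\w+)\s+"
    r"\[(?P<thread>[^\]]+)\] (?P<msg>.*)"
)


def load_engines():
    """Return the regexp modules available here, keyed by name."""
    engines = {"re": re}
    try:
        # Third-party package; skipped silently if it is not installed.
        engines["regex"] = importlib.import_module("regex")
    except ImportError:
        pass
    return engines


def bench(module, number=20000):
    """Time `number` rounds of match() + groupdict() with a precompiled pattern."""
    compiled = module.compile(PATTERN)

    def work():
        return compiled.match(LINE).groupdict()

    return timeit.timeit(work, number=number)


if __name__ == "__main__":
    for name, module in load_engines().items():
        print(f"{name}: {bench(module):.3f} s")
```

Numbers from a toy line like this say little about 700MB of real logs; swap in your own patterns and a sample of real lines before drawing conclusions.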
So this means switching to regex is probably a no go.
The difference between 77 and 92 seconds doesn't, on the face of it, scream "disaster" to me - but suit yourself.
If hanging regexes become a common occurrence for my app I might decide it's worth the performance hit in the name of safety, but at this point I would rather not.
If they were destined to become a common occurrence, they already would have done so. You blamed your bad case on unexpected data, but the _actual_ cause was an unintended typo in one of your regexps. So it goes. You're using a tool with a hyper-concise notation, where a correct expression is pretty much indistinguishable from line noise, and a typo is rarely detectable as a syntax error. So you're learning the hard way that you have to be on crisis-level alert when writing regexps: they're extremely touchy and unforgiving. pyre2 would spare you from all match-time timing disasters, but "touchy and unforgiving" applies all the same. Instead of a typo causing exponential runtime, it may instead cause the regexp to match (or fail to match) in unintended ways. About which no clue of any kind will be left behind, unless you stare at the inputs and outputs and check them yourself. But, in that case, 92 seconds wouldn't even get you through 1000 bytes ;-)
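To make "a typo causing exponential runtime" concrete, here is a minimal sketch using only the standard library re. The two patterns are a textbook illustration, not the regexps from the app under discussion: the second differs from the first only by a stray group and an extra `+`, compiles without complaint, and backtracks exponentially on input that almost matches.

```python
import re
import time

# Hypothetical patterns, chosen only to illustrate the failure mode.
INTENDED = re.compile(r"a+$")    # what one might have meant
TYPO = re.compile(r"(a+)+$")     # nested quantifier: same matches, very different cost


def time_match(pattern, text):
    """Return (match result, elapsed seconds) for a single match() call."""
    start = time.perf_counter()
    result = pattern.match(text)
    return result, time.perf_counter() - start


if __name__ == "__main__":
    # The trailing "b" makes the match fail, so the backtracking engine
    # tries every way of splitting the run of "a"s before giving up.
    for n in (8, 12, 16, 20):
        text = "a" * n + "b"
        _, fast = time_match(INTENDED, text)
        _, slow = time_match(TYPO, text)
        print(f"n={n:2d}  intended={fast:.6f}s  typo={slow:.6f}s")
```

Both patterns accept and reject exactly the same strings, so no test that only checks results would flag the second one. Keep n small when experimenting: each additional character roughly doubles the time spent on the failing match.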