Re: [Python-Dev] re performance

29 Jan 2017


* Armin Rigo, 2017-01-28, 12:44:
...
> The theoretical kind of regexp is about giving a "yes/no" answer, whereas the
> concrete "re" or "regexp" modules give a match object, which lets you ask for
> the subgroups' location, for example. Strange as it may seem, I am not aware
> of a way to do that using the linear-time approach of the theory---if it
> answers "yes", then you have no way of knowing *where* the subgroups matched.
> Another issue is that the theoretical engine has no notion of
> greedy/non-greedy matching.

RE2 has linear execution time, and it supports both capture groups and
greedy/non-greedy matching.

The implementation is explained in this article:
https://swtch.com/~rsc/regexp/regexp3.html
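
To make that concrete, here is a minimal sketch of the kind of simulation the
article describes (a "Pike VM": all NFA threads advance in lockstep, and each
thread carries its own capture positions). It is not RE2's code. The
instruction names, pike_match() and build_prog() are my own illustration, and
the program for (a+)(a*) is compiled by hand as an assumption about how such a
compiler would lay it out.

```python
def pike_match(prog, text):
    """Run prog anchored at text[0]; return a tuple of capture slots or None.

    Slots 2k and 2k+1 hold the start and end of group k (group 0 = whole match).
    """
    nslots = 1 + max((ins[1] for ins in prog if ins[0] == "save"), default=-1)

    def add_thread(threads, seen, pc, pos, caps):
        # Follow non-consuming instructions. `seen` keeps at most one thread
        # per instruction, so a step never holds more than len(prog) threads.
        if pc in seen:
            return
        seen.add(pc)
        op = prog[pc]
        if op[0] == "jmp":
            add_thread(threads, seen, op[1], pos, caps)
        elif op[0] == "split":
            # The first target has priority; this encodes greedy vs non-greedy.
            add_thread(threads, seen, op[1], pos, caps)
            add_thread(threads, seen, op[2], pos, caps)
        elif op[0] == "save":
            new_caps = caps[:op[1]] + (pos,) + caps[op[1] + 1:]
            add_thread(threads, seen, pc + 1, pos, new_caps)
        else:  # "char" or "match": wait in the list for the main loop
            threads.append((pc, caps))

    clist = []
    add_thread(clist, set(), 0, 0, (None,) * nslots)
    matched = None
    for pos in range(len(text) + 1):
        nlist, seen = [], set()
        for pc, caps in clist:  # threads are visited in priority order
            op = prog[pc]
            if op[0] == "match":
                matched = caps
                break  # cut off all lower-priority threads
            if pos < len(text) and op[0] == "char" and text[pos] == op[1]:
                add_thread(nlist, seen, pc + 1, pos + 1, caps)
        clist = nlist
        if not clist:
            break
    return matched


def build_prog(greedy):
    """Hand-compiled program for (a+)(a*) or, if not greedy, (a+?)(a*)."""
    loop = ("split", 2, 4) if greedy else ("split", 4, 2)
    return [
        ("save", 0),      # 0: start of group 0
        ("save", 2),      # 1: start of group 1
        ("char", "a"),    # 2: a
        loop,             # 3: repeat (greedy) or leave (non-greedy) first
        ("save", 3),      # 4: end of group 1
        ("save", 4),      # 5: start of group 2
        ("split", 7, 9),  # 6: a* is greedy in both variants
        ("char", "a"),    # 7: a
        ("jmp", 6),       # 8: back to the a* loop
        ("save", 5),      # 9: end of group 2
        ("save", 1),      # 10: end of group 0
        ("match",),       # 11: done
    ]


print(pike_match(build_prog(greedy=True), "aaa"))   # (0, 3, 0, 3, 3, 3)
print(pike_match(build_prog(greedy=False), "aaa"))  # (0, 3, 0, 1, 1, 3)
```

The second result says group 1 spans (0, 1) and group 2 spans (1, 3), which is
what re.match(r"(a+?)(a*)", "aaa") reports too. Each input character is
examined once per live thread, and there are at most len(prog) threads, so the
running time is bounded by roughly len(prog) * len(text), plus the cost of
copying the capture tuples.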

-- 
Jakub Wilk