[Python-Dev] re performance
Serhiy Storchaka
storchaka at gmail.com
Sun Jan 29 16:08:20 EST 2017
On 29.01.17 12:18, Jakub Wilk wrote:
> * Armin Rigo <armin.rigo at gmail.com>, 2017-01-28, 12:44:
>> The theoretical kind of regexp is about giving a "yes/no" answer,
>> whereas the concrete "re" or "regexp" modules give a match object,
>> which lets you ask for the subgroups' location, for example. Strange
>> as it may seem, I am not aware of a way to do that using the
>> linear-time approach of the theory---if it answers "yes", then you
>> have no way of knowing *where* the subgroups matched.
>>
>> Another issue is that the theoretical engine has no notion of
>> greedy/non-greedy matching.
>
> RE2 has linear execution time, and it supports both capture groups and
> greedy/non-greedy matching.
>
> The implementation is explained in this article:
> https://swtch.com/~rsc/regexp/regexp3.html
Not all features of Python regular expressions can be implemented with
linear complexity. It is possible, though, to compile a subset of
regular expressions to an implementation with linear complexity.
Patches are welcome.
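
A minimal sketch of one such feature, assuming backreferences are a
representative case (the pattern and the helper name is_doubled are
made up for illustration): \1 must repeat whatever group 1 captured,
so the pattern below describes a non-regular language, and no
automaton-based linear-time engine can handle it. RE2, for one, does
not support backreferences.

```python
import re

# Group 1 captures a run of 'a'/'b'; \1 must then repeat that exact text.
# The language { w + "-" + w } is not regular, so a finite automaton
# cannot recognize it; the stdlib "re" engine handles it by backtracking.
DOUBLED = re.compile(r"([ab]+)-\1")


def is_doubled(text):
    """Return True if text has the form w-w for a non-empty w over {a, b}."""
    return DOUBLED.fullmatch(text) is not None


if __name__ == "__main__":
    print(is_doubled("abba-abba"))
    print(is_doubled("abba-abab"))
```

A pattern without such constructs (plain alternation, repetition and
groups) stays within what the linear-time approach can cover.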