[Python-Dev] re performance

MRAB python at mrabarnett.plus.com
Thu Feb 2 13:39:11 EST 2017


On 2017-02-02 04:37, Franklin? Lee wrote:
> On Thu, Jan 26, 2017 at 4:13 PM, Sven R. Kunze <srkunze at mail.de> wrote:
>> Hi folks,
>>
>> I recently refreshed regular expressions theoretical basics *indulging in
>> reminiscences* So, I read https://swtch.com/~rsc/regexp/regexp1.html
>>
>> However, reaching the chart in the lower third of the article, I saw Python
>> 2.4 measured against a naive Thompson matching implementation. And I was
>> surprised about how bad it performed compared to an unoptimized version of
>> an older than dirt algorithm.
>
>> From my perspective, I can say, that regular expressions might worth
>> optimizing especially for web applications (url matching usually uses
>> regexes) but also for other applications where I've seen many tight loops
>> using regexes as well. So, I am probing interest on this topic here.
>
> What I (think I) know:
> - Both re and regex use the same C backend, which is not based on NFA.
> - The re2 library, which the writer of that article made, allows
> capture groups (but only up to a limit) and bounded repetitions (up to
> a limit).
> - Perl has started to optimize some regex patterns.
>
[snip]

re and regex use different C backends. Both are based on NFA.

re2 is based on DFA, with a fallback to re.



More information about the Python-Dev mailing list