[Python-Dev] Where to report performance of re? [was: (no subject)]
Stephen J. Turnbull
turnbull.stephen.fw at u.tsukuba.ac.jp
Fri Dec 29 05:25:05 EST 2017
Franklin? Lee writes:
> On Tue, Dec 26, 2017 at 2:01 AM, Yogev Hendel <yogev at intsights.com> wrote:
> > I don't know if this is the right place to put this,
> > but I've found the following lines of code results in an incredibly long
> > processing time.
> (I think the correct place is python-list. python-dev is primarily for
> the developers of Python itself.
I read this as a report of a performance problem, perhaps a
regression. IMO YMMV, Python-list would be appropriate only if the
reporter was already aware that some regular expressions are expected
to be disastrously non-performant on some target strings, and wanted
help dealing with that. That doesn't seem to be the case (and your
response also presumes ignorance of the performance properties of re).
> Bug reports and feature requests belong in
> https://bugs.python.org/ (where your post could also have gone).)
If you really meant that, you should have avoided rewarding him with a
wordy explanation. ;-) N.B. I've bookmarked yours, it's excellent!
Thank you very much!
I think even in this case an experienced community member probably
should report on the tracker, though. If all they got was, "WONTFIX:
that's how the algorithm works" there, *then* they go to python-list.
> (assuming P != NP): Perl allows (introduced?) backreferences, which
I've been using them in Emacsen since 1987 or so.
> are NP-hard. Perl also added some other features which complicate
> things, but backreferences are enough.
> The user-level solution is to understand how regexes are executed, and
> to work around it.
This is the Pythonic approach<0.5 wink/>, in the sense that we haven't
gone to the trouble of trying to improve the algorithm yet.
> 2. In theory, we can use the textbook algorithm when possible, and the
> backtracking algorithm when necessary. However, the textbook version
> won't necessarily be faster, and may take more time to create, so
> there's a tradeoff here.
There may be a question of the difficulty of maintenance, as well.
The linear time algorithms are somewhat more delicate, and we're still
occasionally discussing what semantics a regular expression should
have (null matches), let alone whether they're implemented correctly.
More information about the Python-Dev