
On 2022-02-16 22:13, Tim Peters wrote:
[J.B. Langston <jblangston@datastax.com>]
Well, I certainly sparked a lot of interesting discussion, which I have quite enjoyed reading. But to bring this thread back around to its original topic, is there support among the Python maintainers for adding a timeout feature to the Python re library?
Buried in the fun discussion was my guess: no way. Python's re is effectively dead legacy code, with no current "owner". Its commit history shows very little activity for some years already. Most commits are due to generic "code cleanup" crusades that have nothing specific to do with the algorithms. None required non-trivial knowledge of the implementation.
Here's the most recent I found that actually changed behavior:
"""
commit 6cc8ac949907b9a1c0f73709c6978b7a43e634e3
Author: Zackery Spytz <zspytz@gmail.com>
Date:   Fri May 21 14:02:42 2021 -0700

    bpo-40736: Improve the error message for re.search() TypeError (GH-23312)

    Include the invalid type in the error message.
"""
A trivial change.
I will look at the third-party regex library that Jonathan suggested but I still believe a timeout option would be a valuable feature to have in the standard library.
Which is the problem: regex has _dozens_ of features that would be valuable to have in the standard library. regex is in fact one of the best regexp libraries on the planet. It already has timeouts, and other features (like possessive quantifiers) that are actually (unlike timeouts) frequently asked about by many programmers.
In fact regex started life intending to go into core Python, in 2008:
https://bugs.python.org/issue3825
That stretched on and on, and the endless bikeshedding eventually appeared to fizzle out in 2014:
https://bugs.python.org/issue2636
In 2021 a core dev eventually rejected it, as by then MRAB had long since released it as a successful extension module. I assume - but don't know - he got burned out by "the endless bikeshedding" on those issue reports.
I eventually decided against having it added to the standard library because that would tie fixes and additions to Python's release cycle, and there's that adage that Python has "batteries included", but not nuclear reactors. PyPI is a better place for it, for those who need more than what the standard re module provides.
In any case, no, no core dev I know of is going to devote their limited time to reproducing a tiny subset of regex's many improvements in Python's legacy engine. In fact, "install regex!" is such an obvious choice at this point that I wouldn't even give time to just reviewing a patch that added timeouts.
BTW, I didn't mention regex in your BPO report because I didn't know at the time it already implemented timeouts. I learned that in this thread.
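For anyone landing here later, a minimal sketch of what the timeout looks like in practice, assuming the third-party regex package is installed from PyPI. The helper name search_with_timeout is made up for illustration; the timeout keyword (in seconds, applied to the whole operation) and the TimeoutError it raises are as I understand them from the regex documentation, so check that against the version you install.

```python
import regex  # third-party package from PyPI (pip install regex), not the stdlib re


def search_with_timeout(pattern, text, seconds):
    """Return the first match, or None on no match or on timeout.

    Hypothetical helper for illustration only; it is not part of regex.
    """
    try:
        # timeout is given in seconds and covers the entire search
        return regex.search(pattern, text, timeout=seconds)
    except TimeoutError:
        # regex gave up after the time budget was exhausted
        return None


m = search_with_timeout(r"\d+", "issue 3825", 1.0)
print(m.group() if m else None)
```

Folding "no match" and "timed out" into the same None result keeps the sketch short; real code would probably want to tell the two apart, for example by letting the TimeoutError propagate.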