Regarding Regex timeout behavior to minimize CPU consumption
Peter J. Holzer
hjp-python at hjp.at
Sat Dec 5 18:34:13 EST 2020
On 2020-12-05 23:42:11 +0100, sjeik_appie at hotmail.com wrote:
> Timeout: no idea. But check out re.compile and re.iterfind as they might
> speed things up.
I doubt that compiling regular expressions helps the OP much. Compiled
regular expressions are cached, but more importantly, if a match takes
long enough that specifying a timeout is useful, the time is almost
certainly not spent compiling, but matching - most likely backtracking
from lots of promising but ultimately unsuccessful partial matches.
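
The caching is easy to see for yourself. A small sketch, assuming CPython (the cache is an implementation detail of the re module) and using a made-up pattern:

```python
import re

# Hypothetical pattern, for illustration only.
pattern = r'(\w+)@(\w+)\.com'

# In CPython, re.compile() and the module-level functions such as
# re.findall() go through the same internal cache, so asking for the
# same pattern a second time hands back the already compiled object.
first = re.compile(pattern)
second = re.compile(pattern)

# Same object both times: nothing was compiled twice.
same_object = first is second
print(same_object)
```

So calling re.findall(pattern, text) repeatedly pays the compile cost only once, and the matching itself costs the same either way.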
> regex = r'data-stid="section-room-list"[\s\S]*?>\s*([\s\S]*?)\s*' \
>         r'(?:class\s*=\s*"\s*sticky-book-now\s*"|</ul>\s*</section>|id\s*=\s*"Location")'
> rooms_blocks_to_be_replace = re.findall(regex, html_template)
This part:
    \s*([\s\S]*?)\s*
looks dangerous from a performance point of view. If that can be
rewritten with less potential for backtracking, it might help.
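
One way to reduce the backtracking is to keep a regex only for the end markers and do the rest with plain string methods. A rough sketch, assuming the captured group is simply the text between the first '>' after the start marker and the next end marker, with surrounding whitespace removed. The names find_room_blocks, START and END are made up for this example, and it has not been run against the OP's real HTML:

```python
import re

START = 'data-stid="section-room-list"'

# End markers copied from the alternation in the quoted pattern.
END = re.compile(
    r'class\s*=\s*"\s*sticky-book-now\s*"'
    r'|</ul>\s*</section>'
    r'|id\s*=\s*"Location"'
)


def find_room_blocks(html):
    """Collect the text between each start marker and the next end marker."""
    blocks = []
    pos = 0
    while True:
        start = html.find(START, pos)
        if start == -1:
            break
        # Skip to the first '>' after the marker (end of the opening tag).
        gt = html.find('>', start + len(START))
        if gt == -1:
            break
        # Scan forward once for the nearest end marker.
        end = END.search(html, gt + 1)
        if end is None:
            break
        # str.strip() takes over the job of the two \s* around the group.
        blocks.append(html[gt + 1:end.start()].strip())
        pos = end.end()
    return blocks
```

Each step scans forward only once, so a document without an end marker fails quickly instead of retrying every combination of '>' position and whitespace split.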
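Generally, it should be possible to implement a timeout for any
operation by either scheduling an alarm with signal.alarm or by
executing the operation in a separate process and terminating it if it
takes too long. (Python's threading module offers no way to kill a
thread from outside, so a worker thread alone is not enough.)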
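
A minimal sketch of the signal.alarm variant, assuming a Unix system and that the call happens in the main thread (signal handlers can only be installed there). RegexTimeout and findall_with_timeout are names invented for this example. It relies on CPython's regex engine checking for pending signals while it is matching:

```python
import re
import signal


class RegexTimeout(Exception):
    """Raised when a match runs longer than the allowed time."""


def _on_alarm(signum, frame):
    # Runs in the main thread when SIGALRM arrives; the exception
    # propagates out of the re call that is currently executing.
    raise RegexTimeout("regex match timed out")


def findall_with_timeout(pattern, text, seconds=1):
    """Run re.findall, giving up after `seconds` (whole seconds only)."""
    old_handler = signal.signal(signal.SIGALRM, _on_alarm)
    signal.alarm(seconds)              # schedule SIGALRM
    try:
        return re.findall(pattern, text)
    finally:
        signal.alarm(0)                # cancel a pending alarm, if any
        signal.signal(signal.SIGALRM, old_handler)
```

Usage would look something like this, with the pattern and input from the original question:

    try:
        rooms = findall_with_timeout(regex, html_template, seconds=2)
    except RegexTimeout:
        rooms = []

signal.alarm only takes whole seconds; signal.setitimer accepts fractions if a finer limit is needed.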
hp
--
_ | Peter J. Holzer | Story must make more sense than reality.
|_|_) | |
| | | hjp at hjp.at | -- Charles Stross, "Creative writing
__/ | http://www.hjp.at/ | challenge!"