[New-bugs-announce] [issue42885] Regex performance problem with ^ aka AT_BEGINNING
Arnim Rupp
report at bugs.python.org
Sun Jan 10 16:58:21 EST 2021
New submission from Arnim Rupp <erich at rupp.de>:
The re library needs about 7 seconds to check whether a string of a billion As starts with an x. For example, this statement takes that long:
re.search(r'^x', 'A' * 1000000000)
The run time grows with the length of the string. The string handling itself is not the problem: checking whether the same string starts with an A takes just 0.00014 seconds. See output and code below:
3.10.0a4+ (heads/master:d16f617, Jan 9 2021, 13:24:45)
[GCC 7.5.0]
testing string len: 100000
re_test_false: 0.0008246829966083169
testing string len: 1000000000
re_test_false: 7.317708015005337
testing string len: 1000000000
re_test_true: 0.00014710200048284605
import re, timeit, functools, sys

def re_test_true(string):
    print("testing string len: ", len(string))
    re.search(r'^A', string)

def re_test_false(string):
    print("testing string len: ", len(string))
    re.search(r'^x', string)

print(sys.version)

huge_string = 'A' * 100000
print('re_test_false: ', timeit.timeit(functools.partial(re_test_false, huge_string), number=1))

huge_string = 'A' * 1000000000
print('re_test_false: ', timeit.timeit(functools.partial(re_test_false, huge_string), number=1))
print('re_test_true: ', timeit.timeit(functools.partial(re_test_true, huge_string), number=1))
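
A possible workaround, sketched here under the assumption that the pattern only needs to be anchored at the very start of the string (no re.MULTILINE): re.match() attempts the pattern at position 0 only, and str.startswith() avoids the regex engine entirely. The function names below are hypothetical and are not part of the attached regex_timeit.py; the string length is deliberately smaller than in the report so the comparison finishes quickly.

```python
import re
import timeit

# Hypothetical helper names, for illustration only.

def starts_with_x_search(string):
    # Same call as in the report: re.search() with a leading ^.
    return re.search(r'^x', string)

def starts_with_x_match(string):
    # re.match() only tries to match at the beginning of the string,
    # so no explicit ^ is needed.
    return re.match(r'x', string)

def starts_with_x_plain(string):
    # For a fixed literal prefix, no regex is required at all.
    return string.startswith('x')

if __name__ == '__main__':
    # Assumed test size: 10 million characters instead of a billion.
    test_string = 'A' * 10_000_000
    for func in (starts_with_x_search, starts_with_x_match, starts_with_x_plain):
        elapsed = timeit.timeit(functools_free := (lambda: func(test_string)), number=1)
        print(func.__name__, elapsed)
```

All three functions agree on whether the string starts with an x; only the time spent on a non-matching long string should differ, and the actual numbers will depend on the interpreter version and machine.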
----------
components: Library (Lib)
files: regex_timeit.py
messages: 384782
nosy: another_try
priority: normal
severity: normal
status: open
title: Regex performance problem with ^ aka AT_BEGINNING
type: performance
versions: Python 3.10
Added file: https://bugs.python.org/file49733/regex_timeit.py
_______________________________________
Python tracker <report at bugs.python.org>
<https://bugs.python.org/issue42885>
_______________________________________