[pypy-issue] [issue1347] Perf problem with many regular expressions

mmaenpaa tracker at bugs.pypy.org
Sun Dec 1 23:27:24 CET 2013


mmaenpaa <mika.j.maenpaa at iki.fi> added the comment:

It seems that PyPy's JIT has real problems when dealing with a large number 
of regular expressions that are used often. CPython and PyPy without the JIT 
don't have this problem; their run time seems to scale linearly with the 
number of regular expressions.

Steps to reproduce the slowdown when creating and using many regular
expressions at the same time:

1. Create and compile n random regular expressions
2. Create a predetermined number of random strings
3. Measure the time it takes to test all generated strings against all 
   regular expressions with re.match
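
The three steps above can be sketched roughly as follows. This is not the
attached run-regexp-bug.py; it is a minimal illustration, and the helper
names (random_pattern, random_string, run_benchmark), the pattern shape, the
string length, the loop order and the use of the compiled pattern's match()
method are all assumptions made for the example.

```python
import random
import re
import string
import time


def random_pattern(rng, length=6):
    # Hypothetical pattern shape: lowercase letters, each optionally
    # followed by one quantifier. The real test program may differ.
    parts = []
    for _ in range(length):
        letter = rng.choice(string.ascii_lowercase)
        quantifier = rng.choice(["", "?", "*", "+"])
        parts.append(letter + quantifier)
    return "".join(parts)


def random_string(rng, length=20):
    # Hypothetical subject strings: random lowercase letters.
    return "".join(rng.choice(string.ascii_lowercase) for _ in range(length))


def run_benchmark(num_patterns, num_strings, seed=0):
    rng = random.Random(seed)
    # Step 1: create and compile n random regular expressions.
    patterns = [re.compile(random_pattern(rng)) for _ in range(num_patterns)]
    # Step 2: create a predetermined number of random strings.
    strings = [random_string(rng) for _ in range(num_strings)]
    # Step 3: time matching every string against every pattern.
    matches = 0
    start = time.time()
    for s in strings:
        for p in patterns:
            if p.match(s) is not None:
                matches += 1
    elapsed = time.time() - start
    return elapsed, matches


if __name__ == "__main__":
    for n in (1, 5, 10, 25, 50):
        elapsed, _ = run_benchmark(n, 20000)
        print("%-4d %.3f" % (n, elapsed))
```

Running the same file under each interpreter (for PyPy, with and without
"--jit off") gives one timing column per interpreter.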

When testing with 20000 random strings, PyPy starts to slow down when dealing 
with more than 10 regular expressions. With 25 or more regular expressions, 
PyPy is actually faster with the JIT disabled.

The attached test program was run with PyPy 2.2.1 and CPython 2.7.3 on 64-bit 
Debian Wheezy. The JIT was disabled with the "--jit off" command line 
parameter.

$ python run-regexp-bug.py 20000
random strings: 20000
n    pypy(s)    cpython(s) pypy-nojit(s)
1    0.006      0.010      0.020     
5    0.029      0.039      0.065     
10   0.054      0.078      0.125     
25   0.403      0.197      0.298     
50   0.789      0.391      0.616     
75   1.405      0.576      0.944     
100  2.049      0.763      1.193     
200  7.454      1.606      2.459     
300  18.311     2.313      3.816     
400  30.136     3.079      4.944     
500  44.634     3.849      6.282     
600  63.083     4.871      8.198     
700  79.475     5.488      8.837     
800  98.820     6.152      10.070    
900  122.998    7.306      11.248    
1000 143.714    8.037      12.766    
1100 171.839    8.479      13.758    
1200 200.731    9.274      15.087
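
One way to read the table above is to divide each time by n. The sketch below
does that for three rows copied from the table; the helper name per_regex_ms
is made up for the example. A roughly constant value per regular expression
indicates linear scaling, a growing value indicates worse than linear.

```python
# Rows copied from the table above: n -> (pypy, cpython, pypy-nojit) seconds.
timings = {
    10: (0.054, 0.078, 0.125),
    100: (2.049, 0.763, 1.193),
    1200: (200.731, 9.274, 15.087),
}


def per_regex_ms(n, seconds):
    # Milliseconds spent per regular expression for the 20000 strings.
    return 1000.0 * seconds / n


print("n     pypy(ms)  cpython(ms)  pypy-nojit(ms)")
for n in sorted(timings):
    pypy, cpython, nojit = timings[n]
    print("%-5d %-9.2f %-12.2f %-9.2f" % (
        n,
        per_regex_ms(n, pypy),
        per_regex_ms(n, cpython),
        per_regex_ms(n, nojit),
    ))
```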

----------
nosy: +mmaenpaa
status: unread -> chatting

________________________________________
PyPy bug tracker <tracker at bugs.pypy.org>
<https://bugs.pypy.org/issue1347>
________________________________________

