[pypy-issue] Issue #2777: re: incorrect behaviour for long patterns that are used repeatedly (possible JIT bug?) (pypy/pypy)

Andrew Stepanov issues-reply at bitbucket.org
Sat Mar 24 18:27:37 EDT 2018


New issue 2777: re: incorrect behaviour for long patterns that are used repeatedly (possible JIT bug?)
https://bitbucket.org/pypy/pypy/issues/2777/re-incorrect-behaviour-for-long-patterns

Andrew Stepanov:

I've observed that the `re` module gives incorrect results for very long patterns that are used repeatedly (possibly a JIT bug).

The following code raises an error on both pypy2 and pypy3 (latest version from Mercurial), although it passes on CPython 3.5:
```
import re
pattern = ".a" * 2500
text = "a" * 6000
match = re.compile(pattern).match
for idx in range(len(text) - len(pattern) + 1):
    substr = text[idx:idx+len(pattern)]
    if match(substr) is None:
        raise RuntimeError("This shouldn't have happened at {}".format(idx))
```

```
Traceback (most recent call last):
  File "pypy_re_bug.py", line 9, in <module>
    raise RuntimeError("This shouldn't have happened at {}".format(idx))
RuntimeError: This shouldn't have happened at 632
```
This also happens for other long patterns: I tried `pattern = "." * 5000`, `pattern = "a" * 5000`, and random strings over the alphabet `{".", "a"}` with lengths >= 5000.
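
For anyone who wants to try those variants without editing the script by hand, here is a rough sketch that runs the same loop over each pattern family. The helper names `first_failure` and `candidate_patterns`, and the fixed random seed, are my own choices for illustration and not part of the original script. It relies on every pattern character (`.` or `a`) consuming exactly one input character, so a window of `len(pattern)` characters should always match.

```python
import random
import re


def first_failure(pattern, text):
    """Return the first offset where match() unexpectedly fails, else None."""
    match = re.compile(pattern).match
    width = len(pattern)  # valid here: each pattern char consumes one text char
    for idx in range(len(text) - width + 1):
        if match(text[idx:idx + width]) is None:
            return idx
    return None


def candidate_patterns(length=5000, seed=0):
    """Build the pattern families mentioned in the report (hypothetical helper)."""
    rng = random.Random(seed)  # fixed seed so a failing run can be repeated
    return {
        "alternating": ".a" * (length // 2),
        "dots": "." * length,
        "letters": "a" * length,
        "random": "".join(rng.choice(".a") for _ in range(length)),
    }


if __name__ == "__main__":
    text = "a" * 6000
    for name, pattern in candidate_patterns().items():
        # None means every window matched, which is the expected result
        print(name, first_failure(pattern, text))
```

On an interpreter without the bug, every line of output should report `None`.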

The exact number of iterations before the error occurs varies slightly. For example, if I move `match = re.compile(pattern).match` inside the loop, I get the exception at iteration 668 on pypy3 and 643 on pypy2.
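
The in-loop variant looks roughly like the sketch below. Note that `re.compile` keeps an internal cache of compiled patterns, so calling it inside the loop will usually return the cached object and not recompile. The `purge` flag is my own addition (not something tried in the runs above): it calls `re.purge()` first to force a real recompilation, which might help tell apart a problem in the compiled pattern from one in the matching loop.

```python
import re


def first_failure_compile_in_loop(pattern, text, purge=False):
    """Same check as the original script, but compiling on every iteration."""
    width = len(pattern)  # each '.' or 'a' consumes exactly one character
    for idx in range(len(text) - width + 1):
        if purge:
            re.purge()  # clear re's pattern cache so compile() does real work
        match = re.compile(pattern).match
        if match(text[idx:idx + width]) is None:
            return idx
    return None


if __name__ == "__main__":
    pattern = ".a" * 2500
    text = "a" * 6000
    print(first_failure_compile_in_loop(pattern, text))
    print(first_failure_compile_in_loop(pattern, text, purge=True))
```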

The code works fine for shorter patterns.
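
I have not pinned down the exact length at which things start to go wrong. A simple scan like the following sketch could narrow it down. The helper names and the list of lengths are arbitrary picks of mine, and it scans linearly because I don't know whether failures are monotonic in the pattern length.

```python
import re


def fails(pattern_len, extra=1000):
    """True if any window fails to match a '.a' pattern of about pattern_len chars."""
    pattern = ".a" * (pattern_len // 2)
    width = len(pattern)
    text = "a" * (width + extra)  # same 1000-char slack as the original script
    match = re.compile(pattern).match
    return any(match(text[i:i + width]) is None for i in range(extra + 1))


def shortest_failing_length(lengths):
    """Return the smallest length in `lengths` that reproduces the bug, else None."""
    for n in sorted(lengths):
        if fails(n):
            return n
    return None


if __name__ == "__main__":
    # Arbitrary sample of lengths; refine around the first one that fails.
    print(shortest_failing_length([500, 1000, 2000, 3000, 4000, 5000]))
```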



