[pypy-issue] Issue #2052: Regex module memory leak/crash (pypy/pypy)

magxx issues-reply at bitbucket.org
Tue May 26 03:59:45 CEST 2015

New issue 2052: Regex module memory leak/crash


While using the [regex](https://pypi.python.org/pypi/regex/2015.05.10) module to parse a large amount of data (~1 GB of lines), I've noticed that at a certain point the process's memory usage balloons, growing at about 200 MB/s. I strongly suspect regex.finditer(), because the leak occurs whenever that line of code is present. When parsing a small 30 MB file, the issue never had a chance to arise.

The code looks like:

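```python
import regex

# special_end_regex and dash_regex: regex.compile() patterns not shown in the report
int_group_test = special_end_regex.search(string)
new_string = string  # fall back to the whole line so new_string is always defined
if int_group_test:
    new_string = string[:int_group_test.start()]
last_splitter = [m.end() for m in dash_regex.finditer(new_string)]
```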

I have tried both the 32-bit and 64-bit versions of PyPy with the same result. If the re module is used instead, the problem does not occur. I also ran the code under CPython with both regex and re, and both completed without any such memory leak.
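To isolate the re-versus-regex difference, a minimal reproduction sketch like the following could be run once per module, each in its own process (for example `pypy repro.py regex 5000000`, then `pypy repro.py re 5000000`). It assumes a Unix system, since it uses the `resource` module. The patterns, the synthetic lines, and the `repro.py` name are hypothetical stand-ins, because the report does not include the real patterns or input. `ru_maxrss` is a process-wide peak, which is why each module gets a separate run: a leak should show up as peak growth roughly proportional to the line count, while a non-leaking run should level off.

```python
import importlib
import resource  # Unix-only standard library module
import sys

# Hypothetical stand-ins for the reporter's special_end_regex and dash_regex;
# the report does not show the real patterns.
SPECIAL_END_PATTERN = r"\d+$"
DASH_PATTERN = r"-"


def make_lines(n):
    """Yield n synthetic lines such as 'abc-def-42' (made-up test data)."""
    for i in range(n):
        yield "abc-def-%d" % i


def count_splitters(mod, lines):
    """Run the reported search + finditer loop using `mod` (re or regex)."""
    special_end_regex = mod.compile(SPECIAL_END_PATTERN)
    dash_regex = mod.compile(DASH_PATTERN)
    total = 0
    for string in lines:
        # Fall back to the whole line so new_string is always defined.
        new_string = string
        int_group_test = special_end_regex.search(string)
        if int_group_test:
            new_string = string[:int_group_test.start()]
        last_splitter = [m.end() for m in dash_regex.finditer(new_string)]
        # Keep only a count so the script itself retains no per-line data.
        total += len(last_splitter)
    return total


def peak_rss_kib():
    """Peak resident set size of this process so far, in KiB."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in bytes on macOS and in kilobytes on Linux.
    return peak // 1024 if sys.platform == "darwin" else peak


if __name__ == "__main__":
    # Usage: pypy repro.py [re|regex] [line_count]
    mod_name = sys.argv[1] if len(sys.argv) > 1 else "re"
    n = int(sys.argv[2]) if len(sys.argv) > 2 else 200_000
    mod = importlib.import_module(mod_name)
    before = peak_rss_kib()
    total = count_splitters(mod, make_lines(n))
    after = peak_rss_kib()
    print("%s: %d matches, peak RSS grew by %d KiB" % (mod_name, total, after - before))
```

Counting matches instead of storing them keeps the script's own memory flat, so any steady growth under one module but not the other points at that module rather than at the harness.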
