[pypy-issue] Issue #2052: Regex module memory leak/crash (pypy/pypy)
issues-reply at bitbucket.org
Tue May 26 03:59:45 CEST 2015
New issue 2052: Regex module memory leak/crash
While using the [regex](https://pypi.python.org/pypi/regex/2015.05.10) module to parse a lot of data (I have ~1GB of lines), I've noticed that at a certain point memory usage balloons at a rate of about 200MB/s. I strongly suspect it has something to do with regex.finditer(), since the leak occurs whenever that line of code is present. When parsing a small 30MB file, the issue did not get a chance to arise.
The code looks like:
int_group_test = special_end_regex.search(string)
new_string = string[:int_group_test.start()]
last_splitter = [m.end() for m in dash_regex.finditer(new_string)]
I have tried both the 32-bit and 64-bit versions of PyPy, with the same result. If the re module is used instead, the problem does not occur. I also ran the code on CPython with both regex and re, and both succeed without any such memory leak.
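For anyone trying to reproduce this, the snippet above could be wrapped in a self-contained script along the following lines. This is only a sketch: the two patterns, the synthetic input lines, and the helper names `split_points` and `run` are assumptions, since the report shows neither the real regexes nor the data. It falls back to `re` when `regex` is not installed, in which case it will not exercise the reported leak.

```python
import re

try:
    import regex as rx  # third-party module named in the report
except ImportError:
    rx = re  # fallback so the sketch still runs; not expected to show the leak

# Hypothetical patterns: the report does not show the real ones.
special_end_regex = rx.compile(r"\d+$")
dash_regex = rx.compile(r"-")


def split_points(string):
    """Return the end offset of each dash before the trailing digit group."""
    int_group_test = special_end_regex.search(string)
    if int_group_test is None:
        # No trailing digits: nothing to split (the original snippet
        # would raise AttributeError here).
        return []
    new_string = string[:int_group_test.start()]
    return [m.end() for m in dash_regex.finditer(new_string)]


def run(n_lines):
    """Feed n_lines synthetic lines through split_points and count matches."""
    total = 0
    for i in range(n_lines):
        total += len(split_points("a-b-c-%d" % i))
    return total


if __name__ == "__main__":
    # Increase the line count and watch the process's memory use externally.
    print(run(100000))
```

Running it under PyPy with `regex` installed, then again with the fallback, gives a direct comparison of the two modules on the same input.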