[Python-bugs-list] [ python-Bugs-489672 ] memory leak in test_sre

noreply@sourceforge.net noreply@sourceforge.net
Thu, 06 Dec 2001 20:53:53 -0800


Bugs item #489672, was opened at 2001-12-05 18:45
You can respond by visiting: 
http://sourceforge.net/tracker/?func=detail&atid=105470&aid=489672&group_id=5470

Category: Regular Expressions
Group: Python 2.2
Status: Open
Resolution: None
Priority: 5
Submitted By: Neal Norwitz (nnorwitz)
Assigned to: Fredrik Lundh (effbot)
Summary: memory leak in test_sre

Initial Comment:
leak when running test_sre

see attached file for details

----------------------------------------------------------------------

>Comment By: Tim Peters (tim_one)
Date: 2001-12-06 20:53

Message:
Logged In: YES 
user_id=31435

FYI, the substitution tests were the only ones that leaked 
when run in an infinite loop (I had broken test_re.py into 
a couple dozen distinct test files).

----------------------------------------------------------------------

Comment By: Guido van Rossum (gvanrossum)
Date: 2001-12-06 20:26

Message:
Logged In: YES 
user_id=6380

Checked in as _sre.c rev. 2.75.

Fredrik, please review, and close unless you disagree.

Neal, can you run your test again?


----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2001-12-06 20:13

Message:
Logged In: YES 
user_id=31435

Heh -- I have the same fix and already have confidence.  I 
was just about to check it in, but will wait to make sure 
you don't first.

----------------------------------------------------------------------

Comment By: Guido van Rossum (gvanrossum)
Date: 2001-12-06 20:09

Message:
Logged In: YES 
user_id=6380

I think I've found this. There's a missing Py_DECREF(filter)
in pattern_subx() in _sre.c. I'll check it in once I've got
more confidence in the fix.
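
A missing Py_DECREF means one reference is taken and never given back. The object's count never reaches zero, so CPython never frees it. You can mimic the effect from pure Python by holding an extra reference that is never released. This is only an analogy, not the _sre code: `Filter`, `leaked_refs`, `sub_with_bug` and `sub_fixed` are hypothetical names. The sketch also assumes CPython, which frees an object as soon as its count drops to zero.

```python
import gc
import weakref


class Filter:
    """Hypothetical stand-in for the per-call `filter` object in pattern_subx()."""


# Plays the role of the reference that a missing Py_DECREF never gives back.
leaked_refs = []


def sub_with_bug():
    f = Filter()
    leaked_refs.append(f)  # extra reference taken and never dropped
    return weakref.ref(f)


def sub_fixed():
    f = Filter()  # only the local reference, released when the function returns
    return weakref.ref(f)


buggy = sub_with_bug()
fixed = sub_fixed()
gc.collect()

print(buggy() is not None)  # True: the object is still alive, i.e. leaked
print(fixed() is None)      # True: the object was freed
```

Calling the buggy version in a loop adds one unreclaimable object per call. That matches the steady growth seen with the one-line re.sub loop.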

----------------------------------------------------------------------

Comment By: Guido van Rossum (gvanrossum)
Date: 2001-12-06 19:44

Message:
Logged In: YES 
user_id=6380

From email. That was it. With two backslashes it leaks like
a sieve.

----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2001-12-06 19:04

Message:
Logged In: YES 
user_id=31435

Guido, did you take the snippet from email or from the web 
page?  It doesn't "look right" in email (for a change!).  
The 2nd argument to re.sub must have two backslashes; one 
of them vanishes in the email version ... I'll try 
attaching it as a file.
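
The backslash problem is a string-literal issue. In an ordinary (non-raw) literal, '\1' is an octal escape for the single character chr(1), so re.sub never sees a group reference at all. Here is a short illustration on a modern Python 3 interpreter rather than the 2.2 build discussed in this report:

```python
import re

# In a non-raw literal, '\1' is an octal escape: one control character, no backslash.
one_escaped = '\1'
# '\\1' (equivalently r'\1') is a backslash followed by '1': a group reference to re.sub.
group_ref = '\\1'

assert one_escaped == chr(1) and len(one_escaped) == 1
assert group_ref == r'\1' and len(group_ref) == 2

# With the group reference, group 1 ('a') is substituted back in.
print(repr(re.sub('(a)', group_ref, 'a')))    # 'a'
# With the single-backslash version, the match is replaced by chr(1).
print(repr(re.sub('(a)', one_escaped, 'a')))  # '\x01'
```

Writing replacement templates as raw strings such as r'\1' avoids the ambiguity.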

----------------------------------------------------------------------

Comment By: Guido van Rossum (gvanrossum)
Date: 2001-12-06 18:56

Message:
Logged In: YES 
user_id=6380

Hm. For me (on Linux) that snippet doesn't leak at all. Are
you sure this is it?

----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2001-12-06 18:35

Message:
Logged In: YES 
user_id=31435

This derived snippet leaks at a prodigious rate:

import re
while 1: re.sub('(a)', '\\1', 'a')

Variations don't leak if the capturing group is removed 
from the regexp, or if the replacement text doesn't 
reference the group.
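
The isolate-and-loop approach can be sketched with the tracemalloc module. This is a modern stand-in, not the method used here: tracemalloc was added in Python 3.4, and it only sees allocations made through Python's allocators. The helper `net_growth`, the iteration count, and the exact variant strings are assumptions, since the comment above does not give the variants verbatim. On a current CPython, which includes the _sre fix, all three should report little or no growth.

```python
import gc
import re
import tracemalloc


def net_growth(fn, iterations=10_000):
    """Bytes still allocated (as seen by tracemalloc) after calling fn() repeatedly."""
    fn()  # warm-up: let re populate its pattern/template caches first
    gc.collect()
    tracemalloc.start()
    try:
        base, _ = tracemalloc.get_traced_memory()
        for _ in range(iterations):
            fn()
        gc.collect()
        current, _ = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return current - base


variants = {
    # Leaked in Python 2.2 before _sre.c rev. 2.75, per this report.
    "group + reference": lambda: re.sub('(a)', '\\1', 'a'),
    # Assumed variant: capturing group removed from the regexp.
    "no group": lambda: re.sub('a', 'a', 'a'),
    # Assumed variant: replacement does not reference the group.
    "group, no reference": lambda: re.sub('(a)', 'b', 'a'),
}

for name, fn in variants.items():
    print(f"{name:20} {net_growth(fn):>8} bytes")
```

On an affected build, the first figure would grow roughly in proportion to `iterations` while the other two stayed flat. A small constant difference usually reflects caches or free lists rather than a leak.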

I'm reassigning to /F based on that evidence (it appears to 
have to do with re.sub internals, not with a general Python 
screwup).

----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2001-12-06 15:18

Message:
Logged In: YES 
user_id=31435

Hard?  Unpleasant?  Mine <wink>!  Reassigned to me, cuz in 
a moment of weakness I said I would.

----------------------------------------------------------------------

Comment By: Guido van Rossum (gvanrossum)
Date: 2001-12-06 14:08

Message:
Logged In: YES 
user_id=6380

It's hard to make progress on this. In a loop it doesn't
leak or leaks too slowly to be useful. The code is not in a
function so it's hard to whittle down effectively.

The first leak reported by Purify is a string literal (or
actually, 4 string literals); this isn't much of a hint
since the code is chock full of those.

The second leak reported is also a string, but one created
through concatenation. There aren't too many of those in
test_sre.py, but still, who knows which one it is...
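
Purify attributes each leaked block to the place where it was allocated. For objects allocated through Python's allocators, a rough modern analogue is to diff two tracemalloc snapshots grouped by source line. The sketch below assumes the suspect code has been wrapped in a function, which is exactly what the top-level code in test_sre.py made hard. `suspect_case` and `_kept` are hypothetical and leak concatenated strings on purpose, so the report has something to show.

```python
import tracemalloc

_kept = []  # hypothetical sink that simulates a leak by never releasing results


def suspect_case(i):
    # Deliberately "leaks" a string built by concatenation, like Purify's second report.
    _kept.append("prefix-" + str(i))


def top_growth(fn, iterations=1_000, limit=3):
    """Run fn(i) repeatedly and return the source lines whose allocations changed most."""
    tracemalloc.start()
    try:
        before = tracemalloc.take_snapshot()
        for i in range(iterations):
            fn(i)
        after = tracemalloc.take_snapshot()
    finally:
        tracemalloc.stop()
    # compare_to() returns StatisticDiff objects, largest absolute size change first.
    return after.compare_to(before, "lineno")[:limit]


for stat in top_growth(suspect_case):
    print(stat)
```

The first entry printed should point at the line inside `suspect_case`, with a count difference of roughly one block per iteration.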

----------------------------------------------------------------------