[Python-Dev] [regex] memory leak

MRAB python at mrabarnett.plus.com
Sun Aug 2 17:54:22 CEST 2009


John Machin wrote:
> Hi Matthew,
> 
> Your post in c.l.py about your re rewrite didn't mention where to report 
> bugs etc so I dug this address out of Google Groups ...
> 
> Environment: Python 2.6.2, Windows XP SP3, your latest (29 July) regex 
> from the Python bugtracker.
> 
> Problem is repeated calls of e.g. compiled_pattern.search(some_text) -- 
> Task Manager performance panel shows increasing memory usage with regex 
> but not with re. It appears to be cumulative i.e. changing to another 
> pattern or text doesn't release memory.
> 
> Example:
> 
> 8<-- regex_timer.py
> import sys
> import time
> if sys.platform == 'win32':
>     timer = time.clock
> else:
>     timer = time.time
> module = __import__(sys.argv[1])
> count = int(sys.argv[2])
> pattern = sys.argv[3]
> expected = sys.argv[4]
> text = 80 * '~' + 'qwerty'
> rx = module.compile(pattern)
> t0 = timer()
> for i in xrange(count):
>     assert rx.search(text).group(0) == expected
> t1 = timer()
> print "%d iterations in %.6f seconds" % (count, t1 - t0)
> 8<---
> 
> Here are the results of running this (plus observed difference between 
> peak memory usage and base memory usage):
> 
> dos-prompt>\python26\python regex_timer.py regex 1000000 "~" "~"
> 1000000 iterations in 3.811500 seconds [60 MB]
> 
> dos-prompt>\python26\python regex_timer.py regex 2000000 "~" "~"
> 2000000 iterations in 7.581335 seconds [128 MB]
> 
> dos-prompt>\python26\python regex_timer.py re 2000000 "~" "~"
> 2000000 iterations in 2.549738 seconds [3 MB]
> 
> This happens on a variety of patterns: "w", "wert", "[a-z]+", "[a-z]+t", 
> ...
> 
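The regex_timer.py script quoted above is Python 2 code: it uses xrange, the print statement, and time.clock, which was removed in Python 3.8. A minimal Python 3 port might look like this. It is a sketch, not the script John posted. The TEXT constant and the main() function, which returns the summary line, are names introduced here for illustration.

```python
import importlib
import sys
import time

TEXT = 80 * "~" + "qwerty"  # same subject string as the original script


def main(argv):
    """Usage: python regex_timer.py MODULE COUNT PATTERN EXPECTED"""
    module = importlib.import_module(argv[1])  # e.g. "re" or "regex"
    count = int(argv[2])
    pattern, expected = argv[3], argv[4]
    rx = module.compile(pattern)
    # time.clock() no longer exists in Python 3.8+; perf_counter() is a
    # high-resolution clock on every platform, so no win32 special case.
    t0 = time.perf_counter()
    for _ in range(count):
        assert rx.search(TEXT).group(0) == expected
    elapsed = time.perf_counter() - t0
    message = "%d iterations in %.6f seconds" % (count, elapsed)
    print(message)
    return message


if __name__ == "__main__":
    if len(sys.argv) == 5:
        main(sys.argv)
    else:
        print("usage: python regex_timer.py MODULE COUNT PATTERN EXPECTED")
```

Invocation is unchanged, e.g. `python regex_timer.py re 2000000 "~" "~"`. Like the original, the assert in the loop is skipped under `python -O`.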
Thanks for that, John. I should've kept an eye on the Task Manager!
:-) Now fixed.

It's surprising how much time and effort is needed just to manage the
memory!
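Task Manager shows the symptom but not much else. On Python 3.4 and later, one way to check for this kind of per-call leak from inside the process is the standard tracemalloc module. The idea is to call the function N times and then 2N times, and see whether the retained memory grows with the count, which is what the 1000000 vs 2000000 runs in the report did. This is a hedged sketch. traced_growth() and search_once() are hypothetical helpers. Also, tracemalloc only sees memory obtained through Python's own allocators, so if a C extension calls malloc() directly you would still need a process-level monitor.

```python
import gc
import re
import tracemalloc


def traced_growth(func, count):
    """Call func() `count` times and return how many traced bytes stay allocated.

    Only allocations made through Python's allocators (PyMem_*/PyObject_*)
    are traced; memory a C extension gets straight from malloc() is invisible
    here, so a process-level monitor (Task Manager, ps) is still the final check.
    """
    gc.collect()
    tracemalloc.start()
    try:
        before, _ = tracemalloc.get_traced_memory()
        for _ in range(count):
            func()
        gc.collect()  # drop cyclic garbage so only truly retained memory remains
        after, _ = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return after - before


text = 80 * "~" + "qwerty"
rx = re.compile("~")


def search_once():
    assert rx.search(text).group(0) == "~"


# A leak grows roughly in proportion to the call count, so compare N with 2N.
for n in (20_000, 40_000):
    print(n, "calls:", traced_growth(search_once, n), "bytes retained")
```

A genuine per-call leak shows the retained figure roughly doubling from N to 2N. A small figure that stays flat suggests nothing is accumulating.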

