memory leak with re.match
Mayling ge
maylinge0903 at gmail.com
Tue Jul 4 05:01:08 EDT 2017
Hi,
My function processes a file line by line. There are multiple error
patterns defined, and each one has to be applied to every line. I use
multiprocessing.Pool to process the file in blocks.
Memory usage grows to 2 GB for a 1 GB file, and stays at 2 GB even after
the file has been processed. The file is closed at the end.
If I comment out the call to re_pat.match, memory usage is normal and
stays under 100 MB.
Am I using re the wrong way? I have googled but cannot figure out how to
fix the memory leak.
import re

def line_match(lines, errors):
    for error in errors:
        try:
            re_pat = re.compile(error['pattern'])
        except Exception:
            print_error()   # placeholder for our error reporting
            continue
        for line in lines:
            m = re_pat.match(line)
            # other code to handle the matched object
import itertools
import multiprocessing

def process_large_file(fo):
    p = multiprocessing.Pool()
    while True:
        lines = list(itertools.islice(fo, line_per_proc))
        if not lines:
            break
        # argument order matches line_match(lines, errors)
        result = p.apply_async(line_match, args=(lines, errors))
Note: I omit some code, as I believe the significant difference is
with/without the re_pat.match(...) call.
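In case it helps to reproduce this, the omitted surrounding code is
essentially the following driver. The file name, pattern strings, and
block size here are made-up stand-ins; the real ones come from our
configuration:

# hypothetical driver; uses line_match and process_large_file from above
line_per_proc = 10000                  # made-up block size
errors = [
    {'pattern': r'ERROR\s+\d+:'},      # made-up patterns; the real list
    {'pattern': r'FATAL:'},            # is read from a config file
]

if __name__ == '__main__':
    with open('big.log') as fo:        # the real file is about 1 GB
        process_large_file(fo)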
Regards,
-Meiling