memory leak with re.match
Mayling ge
maylinge0903 at gmail.com
Wed Jul 5 03:04:52 EDT 2017
Thanks. I have actually commented out all of the handling code; the loop
ends with the re_pat.match call and nothing follows it.
On 07/05/2017 08:31, Cameron Simpson wrote:
On 04Jul2017 17:01, Mayling ge <maylinge0903 at gmail.com> wrote:
> My function handles a file line by line in the following way. There are
> multiple error patterns defined, and each needs to be applied to every
> line. I use multiprocessing.Pool to handle the file in blocks.
>
> The memory usage increases to 2G for a 1G file, and stays at 2G even
> after the file has been processed. The file is closed at the end.
>
> If I comment out the call to re_pat.match, memory usage is normal and
> stays under 100Mb. [...]
>
> def line_match(lines, errors):
>     for error in errors:
>         try:
>             re_pat = re.compile(error['pattern'])
>         except Exception:
>             print_error
>             continue
>         for line in lines:
>             m = re_pat.match(line)
>             # other code to handle matched object
[...]
> Note: I omit some code, as I think the significant difference is
> with/without re_pat.match(...)
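For reference, here is a minimal runnable sketch of the quoted function. The
original handling code is not shown in the post, so a simple `hits` counter
stands in for it, and the bad-pattern branch just reports the error:

```python
import re

def line_match(lines, errors):
    """Apply each error pattern to every line.

    `errors` is a list of dicts with a 'pattern' key, as in the
    original post. The real handling code is omitted there, so
    matches are merely counted here.
    """
    hits = 0
    for error in errors:
        try:
            re_pat = re.compile(error['pattern'])
        except re.error as exc:
            # a malformed pattern is skipped, as in the original
            print("bad pattern %r: %s" % (error['pattern'], exc))
            continue
        for line in lines:
            m = re_pat.match(line)
            if m is not None:
                hits += 1  # real code would process the match object here
    return hits
```

With this version, `line_match(["ERROR: disk full", "all good"], [{"pattern": r"ERROR"}])` returns 1, and nothing outlives the loop, since `m` is rebound on every iteration and never stored.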
Hmm. Does the handling code (omitted) keep the line or match object in
memory?
If leaving out the "m = re_pat.match(line)" call prevents the leak, and
presuming that the call itself doesn't leak, then I would start to suspect
that the handling code is not letting go of the match object "m", or of the
line (which is probably attached to the match object "m" to support things
like m.group() and so forth).
So you might need to show us the handling code.
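Cameron's point can be demonstrated directly: a match object exposes the
searched string as its `string` attribute, so any code that keeps `m` alive
also keeps the whole line alive. A small sketch (CPython-specific, since it
relies on reference counting freeing objects immediately; the `Line` subclass
exists only because plain `str` instances cannot be weak-referenced):

```python
import re
import weakref

class Line(str):
    """str subclass so we can take a weak reference to it."""

line = Line("ERROR: something failed " + "x" * 100)
ref = weakref.ref(line)

m = re.match(r"ERROR", line)
del line                  # the match object still holds the string...
assert m.string is ref()  # ...reachable via m.string

del m                     # dropping the match releases the line too
assert ref() is None
```

So if the omitted handling code stashes match objects in a list or dict, every matched line of the 1G file stays in memory with them.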
Cheers,
Cameron Simpson <cs at zip.com.au>
More information about the Python-list mailing list