[ python-Bugs-876108 ] surprising memory usage with generators

SourceForge.net noreply at sourceforge.net
Tue Jan 13 13:45:56 EST 2004


Bugs item #876108, was opened at 2004-01-13 09:54
Message generated for change (Comment added) made by rhettinger
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=876108&group_id=5470

Category: Python Interpreter Core
Group: Python 2.3
Status: Open
Resolution: None
Priority: 5
Submitted By: Kjetil Jacobsen (kjetilja)
Assigned to: Nobody/Anonymous (nobody)
Summary: surprising memory usage with generators

Initial Comment:
When the attached script is run on Linux with integer
arguments 14 through 19, the output is something
like this:

1000 iterations, VmSize: 6432 kB   VmData: 1168 kB
2000 iterations, VmSize: 6692 kB   VmData: 1428 kB
..

I.e., an additional ~256 kB of memory is allocated after
2k iterations.  This is rather weird.

If the script is run with other arguments, say 140
through 190, the output is something like this
(which is what one would expect):

1000 iterations, VmSize: 6432 kB   VmData: 1168 kB
2000 iterations, VmSize: 6432 kB   VmData: 1168 kB
3000 iterations, VmSize: 6432 kB   VmData: 1168 kB

I.e., no memory increase.  The reason this is
somewhat disturbing is that we used generators in the
same way in a larger project and found the
interpreter to be continuously leaking.

To stop the memory leak we used an alternate
scheme (the one in doit_noleak) where the generator
is consumed only inside a list comprehension.

Running with a --with-pydebug enabled interpreter
reveals that the additional memory is due to an extra
pymalloc arena.  Note that the behaviour is the same
for Python 2.2.x without pymalloc enabled, so this does
not seem to be a pymalloc-specific issue.
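The attached script is not reproduced in the archive, but the two schemes can be sketched roughly as follows.  This is a hypothetical reconstruction: the names doit_leak, doit_noleak, and myiter come from the report, while the bodies are assumed, and it is written so it also runs on modern Python:

```python
def myiter(l):
    # generator yielding the first element of each (p, m) pair
    for p, m in l:
        yield p

def doit_leak(l):
    # generator handed straight to join; join materializes it
    # into an internal sequence before concatenating
    return ''.join(myiter(l))

def doit_noleak(l):
    # generator drained by a list comprehension first; the
    # temporary list is passed to join and freed right after
    return ''.join([p for p in myiter(l)])

data = [('x', {})] * 1000
assert doit_leak(data) == doit_noleak(data) == 'x' * 1000
```

Both variants produce the same string; the difference the report describes is only in how long the intermediate storage stays alive.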


----------------------------------------------------------------------

>Comment By: Raymond Hettinger (rhettinger)
Date: 2004-01-13 13:45

Message:
Logged In: YES 
user_id=80475

I suspect the issue is the expectation that ''.join(x) will
consume O(1) memory instead of O(n) when x is a generator. 
However, join works by turning x into a tuple when it is
not already a list or tuple.  This is necessary because
join makes two passes over the data and some iterables are
not restartable.

To confirm that the issue is not specific to generators, try
substituting non-generator code that is equivalent to myiter:

import itertools
myiter = lambda l: itertools.imap(lambda (p, m): p, l)

or

class myiter:
    def __init__(self, l):
        self.l = l
    def __iter__(self):
        self.index = 0
        return self
    def next(self):
        try:
            p, m = self.l[self.index]
        except IndexError:
            raise StopIteration
        self.index += 1
        return p

If the issue persists with these substitutions, then it is
not generator related.

To further improve the test script, change the last lines to:
    mylist = [('', {})]*int(sys.argv[1])
    test(doit_leak, mylist)

This has the effect of making sure the input list isn't
freed during the test.  Whenever join has to make its own
tuple, the original data cannot be freed until after the
join.  In contrast, the list comprehension version discards
the original list as soon as the new one is built.

As a work-around, whenever providing input to join, first
convert the argument to a list using something like
map(lambda t: t[0], l).  That ought to be fastest and consume
the least memory whenever there are no other references to
the input list.
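In Python 2, map returns a real list, so the suggestion applies directly.  A sketch using a list comprehension, which behaves the same on later Python versions (where map instead returns an iterator); the sample data is illustrative:

```python
pairs = [('a', {}), ('b', {}), ('c', {})]

# Build a plain list of the first elements before joining;
# join can then use it in place without making its own tuple copy.
# Python 2 equivalent: firsts = map(lambda t: t[0], pairs)
firsts = [t[0] for t in pairs]
result = ''.join(firsts)
assert result == 'abc'
```

Once the list comprehension finishes, the original pairs can be released even while join is still running, which is the property the doit_noleak scheme relies on.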


----------------------------------------------------------------------

