Memory Usage of Strings

Dan Stromberg drsalists at gmail.com
Wed Mar 16 17:51:08 EDT 2011


On Wed, Mar 16, 2011 at 8:38 AM, Amit Dev <amitdev at gmail.com> wrote:

> I'm observing a strange memory usage pattern with strings. Consider
> the following session. Idea is to create a list which holds some
> strings so that cumulative characters in the list is 100MB.
>
> >>> l = []
> >>> for i in xrange(100000):
> ...  l.append(str(i) * (1000/len(str(i))))
>
> This uses around 100MB of memory as expected and 'del l' will clear that.
>
>
> >>> for i in xrange(20000):
> ...  l.append(str(i) * (5000/len(str(i))))
>
> This is using 165MB of memory. I really don't understand where the
> additional memory usage is coming from.
>
> If I reduce the string size, it remains high till it reaches around
> 1000. In that case it is back to 100MB usage.
>
> Python 2.6.4 on FreeBSD.
>
> Regards,
> Amit
>
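
For reference, the character totals produced by the two loops above can be
checked directly. This is only a minimal sketch: payload_chars is a name made
up here (it is not from the original session), and it uses range and // so
that it should also run on Python 3.

```python
def payload_chars(count, target_len):
    """Total characters across the strings built by the loops in the question."""
    total = 0
    for i in range(count):
        digits = str(i)
        # // is integer division, matching what / does for ints on Python 2
        total += len(digits) * (target_len // len(digits))
    return total


first = payload_chars(100000, 1000)   # 100000 strings of about 1000 chars
second = payload_chars(20000, 5000)   # 20000 strings of about 5000 chars
print("%d %d" % (first, second))      # prints: 99999100 99998200
```

So both loops hold very nearly the same number of characters; what differs is
the number of string objects and the size of each individual allocation.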

On Python 2.6.6 on Ubuntu 10.10:

$ cat pmu
#!/usr/bin/python

import os
import sys

list_ = []

if sys.argv[1] == '--first':
        for i in xrange(100000):
                list_.append(str(i) * (1000/len(str(i))))
elif sys.argv[1] == '--second':
        for i in xrange(20000):
                list_.append(str(i) * (5000/len(str(i))))
else:
        sys.stderr.write('%s: Illegal sys.argv[1]\n' % sys.argv[0])
        sys.exit(1)

os.system("ps aux | egrep '\<%d\>|^USER\>'" % os.getpid())

dstromberg-laptop-dstromberg:~/src/python-mem-use i686-pc-linux-gnu 10916 -
above cmd done 2011 Wed Mar 16 02:38 PM

$ make
./pmu --first
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
1000     11063  0.0  3.4 110212 104436 pts/5   S+   14:38   0:00 /usr/bin/python ./pmu --first
1000     11064  0.0  0.0   1896   512 pts/5    S+   14:38   0:00 sh -c ps aux | egrep '\<11063\>|^USER\>'
1000     11066  0.0  0.0   4012   740 pts/5    S+   14:38   0:00 egrep \<11063\>|^USER\>
./pmu --second
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
1000     11067 13.0  3.3 107540 101536 pts/5   S+   14:38   0:00 /usr/bin/python ./pmu --second
1000     11068  0.0  0.0   1896   508 pts/5    S+   14:38   0:00 sh -c ps aux | egrep '\<11067\>|^USER\>'
1000     11070  0.0  0.0   4012   740 pts/5    S+   14:38   0:00 egrep \<11067\>|^USER\>
dstromberg-laptop-dstromberg:~/src/python-mem-use i686-pc-linux-gnu 10916 -
above cmd done 2011 Wed Mar 16 02:38 PM

So on Python 2.6.6 + Ubuntu 10.10, the second run is actually a little smaller
than the first: an RSS of 101536 KB versus 104436 KB.
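
The same kind of comparison could probably be made without shelling out to
ps. A rough sketch, assuming a Unix-like system where the standard resource
module is available; peak_rss and build_strings are illustrative names, and
the workload here is scaled down to roughly 10MB. Note that ru_maxrss is a
peak value (kilobytes on Linux, bytes on Mac OS X), so it cannot show memory
being handed back to the OS.

```python
import resource


def peak_rss():
    """Peak resident set size of this process, in platform-dependent units."""
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss


def build_strings(count, size):
    """Build `count` strings of about `size` characters, as in the pmu script."""
    return [str(i) * (size // len(str(i))) for i in range(count)]


before = peak_rss()
strings = build_strings(2000, 5000)   # scaled-down version of --second
after = peak_rss()

print("peak RSS before: %d, after: %d" % (before, after))
```

Running each workload in a fresh process, as the pmu script does, keeps one
run's peak from masking the other's.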

Some issues you might ponder:
1) Does FreeBSD's malloc/free know how to free unused memory pages in the
middle of the heap (using mmap games), or does it only sbrk() down when the
end of the heap becomes unused, or does it never sbrk() back down at all?
I've heard various *ix's fall into one of these 3 groups in releasing unused
pages.

2) It might be just an issue of how frequently the interpreter garbage
collects; you could try adjusting this; check out the gc module.  Note that
it's often faster not to collect at every conceivable opportunity, but this
tends to add up the bytes pretty quickly in some scripts - for a while,
until the next collection.  So your memory use pattern will often end up
looking like a bit of a sawtooth function.

3) If you need strict memory use guarantees, you might be better off with a
language that's closer to the metal, like C; whether a language is garbage
collected is one parameter to consider.  If you already have something in
CPython, then Cython might help; Cython allows you to use C data structures
from a dialect of Python.
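
As a starting point for 2), here is a small sketch of poking at the gc
module. One caveat worth stating: in CPython the gc module only controls the
collector for reference cycles, and objects like the plain strings here are
normally released by reference counting as soon as the last reference goes
away. So if this experiment shows no difference, the allocator (issue 1) is
the more likely suspect. build_strings is an illustrative name and the
workload is scaled down.

```python
import gc


def build_strings(count, size):
    """Build `count` strings of about `size` characters each."""
    return [str(i) * (size // len(str(i))) for i in range(count)]


# Allocation-count thresholds that trigger the three collector generations.
thresholds = gc.get_threshold()
print("collector thresholds: %r" % (thresholds,))

was_enabled = gc.isenabled()
gc.disable()                      # turn off automatic cycle collection
try:
    data = build_strings(2000, 5000)
    del data                      # refcounts hit zero; CPython frees the strings here
    found = gc.collect()          # manual pass; returns count of unreachable objects
finally:
    if was_enabled:
        gc.enable()               # restore the previous collector state

print("unreachable objects found by manual collect: %d" % found)
```

gc.set_threshold() can be used in the same way to make automatic collection
more or less frequent.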