Memory Usage of Strings
Dan Stromberg
drsalists at gmail.com
Wed Mar 16 17:51:08 EDT 2011
On Wed, Mar 16, 2011 at 8:38 AM, Amit Dev <amitdev at gmail.com> wrote:
> I'm observing a strange memory usage pattern with strings. Consider
> the following session. Idea is to create a list which holds some
> strings so that cumulative characters in the list is 100MB.
>
> >>> l = []
> >>> for i in xrange(100000):
> ...     l.append(str(i) * (1000/len(str(i))))
>
> This uses around 100MB of memory as expected and 'del l' will clear that.
>
>
> >>> for i in xrange(20000):
> ...     l.append(str(i) * (5000/len(str(i))))
>
> This is using 165MB of memory. I really don't understand where the
> additional memory usage is coming from.
>
> If I reduce the string size, it remains high till it reaches around
> 1000. In that case it is back to 100MB usage.
>
> Python 2.6.4 on FreeBSD.
>
> Regards,
> Amit
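As a sanity check on the 100MB figure, here is a minimal sketch that adds up the characters each loop produces without actually building the strings. The helper name payload_chars is mine, not something from the original session, and I've used // and range so it should run unchanged on Python 2 and 3.

```python
def payload_chars(count, target_len):
    """Total characters produced by str(i) * (target_len // len(str(i)))
    for i in range(count), computed without allocating the strings."""
    total = 0
    for i in range(count):
        digits = len(str(i))
        # The repeat count is truncated, so each string is at most
        # target_len characters long.
        total += digits * (target_len // digits)
    return total


first = payload_chars(100000, 1000)    # the 100000 x ~1000 loop
second = payload_chars(20000, 5000)    # the 20000 x ~5000 loop
print('first:  %d characters' % first)
print('second: %d characters' % second)
```

Both totals come out just under 100 million characters, so the payload is essentially the same in the two cases; the first loop simply spreads it over five times as many objects. Any large difference in process size would then have to come from per-object overhead or from how the allocator handles the bigger blocks.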
On Python 2.6.6 on Ubuntu 10.10:
$ cat pmu
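#!/usr/bin/python
# Python 2: build one of the two lists, then show this process in ps output.
import os
import sys

list_ = []
if sys.argv[1] == '--first':
    # 100000 strings of about 1000 characters each
    for i in xrange(100000):
        list_.append(str(i) * (1000/len(str(i))))
elif sys.argv[1] == '--second':
    # 20000 strings of about 5000 characters each
    for i in xrange(20000):
        list_.append(str(i) * (5000/len(str(i))))
else:
    sys.stderr.write('%s: Illegal sys.argv[1]\n' % sys.argv[0])
    sys.exit(1)
# Print the ps header line plus the line for this process
os.system(r"ps aux | egrep '\<%d\>|^USER\>'" % os.getpid())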
dstromberg-laptop-dstromberg:~/src/python-mem-use i686-pc-linux-gnu 10916 -
above cmd done 2011 Wed Mar 16 02:38 PM
$ make
./pmu --first
USER       PID %CPU %MEM    VSZ    RSS TTY      STAT START   TIME COMMAND
1000     11063  0.0  3.4 110212 104436 pts/5    S+   14:38   0:00 /usr/bin/python ./pmu --first
1000     11064  0.0  0.0   1896    512 pts/5    S+   14:38   0:00 sh -c ps aux | egrep '\<11063\>|^USER\>'
1000     11066  0.0  0.0   4012    740 pts/5    S+   14:38   0:00 egrep \<11063\>|^USER\>
./pmu --second
USER       PID %CPU %MEM    VSZ    RSS TTY      STAT START   TIME COMMAND
1000     11067 13.0  3.3 107540 101536 pts/5    S+   14:38   0:00 /usr/bin/python ./pmu --second
1000     11068  0.0  0.0   1896    508 pts/5    S+   14:38   0:00 sh -c ps aux | egrep '\<11067\>|^USER\>'
1000     11070  0.0  0.0   4012    740 pts/5    S+   14:38   0:00 egrep \<11067\>|^USER\>
dstromberg-laptop-dstromberg:~/src/python-mem-use i686-pc-linux-gnu 10916 -
above cmd done 2011 Wed Mar 16 02:38 PM
So on Python 2.6.6 + Ubuntu 10.10, the second is actually a little smaller
than the first: an RSS of 101536 KB versus 104436 KB.
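If you'd rather not parse ps output, something like the following might do for a rough in-process number. This is only a sketch, assuming a Unix-like system where the standard resource module is available; peak_rss_kb is a name I made up, and the loop is a scaled-down version of the second case so it runs quickly. Note that ru_maxrss is a high-water mark, so it will not go back down after a del.

```python
import resource
import sys


def peak_rss_kb():
    """Peak resident set size of this process, in kilobytes."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # Linux and the BSDs report kilobytes; Mac OS X reports bytes.
    if sys.platform == 'darwin':
        peak //= 1024
    return peak


before = peak_rss_kb()
# Scaled-down version of the second loop: 2000 strings of about 5000 chars.
list_ = [str(i) * (5000 // len(str(i))) for i in range(2000)]
after = peak_rss_kb()
print('peak RSS grew by about %d KB' % (after - before))
```

Running the same measurement on both FreeBSD and Linux would at least tell you whether the extra memory is something the interpreter asks for or something the platform's allocator adds.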
Some issues you might ponder:
1) Does FreeBSD's malloc/free know how to free unused memory pages in the
middle of the heap (using mmap games), or does it only sbrk() down when the
end of the heap becomes unused, or does it never sbrk() back down at all?
I've heard various *ix's fall into one of these 3 groups in releasing unused
pages.
2) It might be just an issue of how frequently the interpreter garbage
collects; you could try adjusting this; check out the gc module. Note that
it's often faster not to collect at every conceivable opportunity, but this
tends to add up the bytes pretty quickly in some scripts - for a while,
until the next collection. So your memory use pattern will often end up
looking like a bit of a sawtooth function.
3) If you need strict memory use guarantees, you might be better off with a
language that's closer to the metal, like C - something that isn't garbage
collected is one parameter to consider. If you already have something in
CPython, then Cython might help; Cython allows you to use C data structures
from a dialect of Python.
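
For point 2, here is a small sketch of the gc module knobs. Keep in mind that CPython frees most objects, plain strings included, by reference counting as soon as the last reference disappears; the gc module only controls the collector for reference cycles, so this may or may not be relevant to your case. The Node class and make_cycle function are purely for illustration, and the threshold values are arbitrary.

```python
import gc


class Node(object):
    """Illustrative object that can take part in a reference cycle."""
    def __init__(self):
        self.other = None


def make_cycle():
    # Two objects that refer to each other; reference counting alone
    # cannot free them once the function returns.
    a, b = Node(), Node()
    a.other, b.other = b, a


old_threshold = gc.get_threshold()
print('current thresholds: %r' % (old_threshold,))

was_enabled = gc.isenabled()
gc.disable()            # no automatic collections while we experiment
gc.collect()            # start from a clean slate
make_cycle()
found = gc.collect()    # force a full collection by hand
print('unreachable objects found: %d' % found)

# Collect more eagerly (arbitrary example values), then put things back.
gc.set_threshold(100, 5, 5)
gc.set_threshold(*old_threshold)
if was_enabled:
    gc.enable()
```

Lower thresholds mean more frequent collections and a flatter sawtooth, at some cost in speed.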