Memory Usage of Strings
Amit Dev
amitdev at gmail.com
Thu Mar 17 02:11:58 EDT 2011
Thanks Dan for the detailed reply. I suspect it is related to FreeBSD
malloc/free as you suggested. Here is the output of running your
script:
[16-bsd01 ~/work]$ python strm.py --first
USER PID %CPU %MEM VSZ RSS TT STAT STARTED TIME COMMAND
amdev 6899 3.0 6.9 111944 107560 p0 S+ 9:57PM 0:01.20 python
strm.py --first (python2.5)
amdev 6900 0.0 0.1 3508 1424 p0 S+ 9:57PM 0:00.02 sh -c ps
aux | egrep '\\<6899\\>|^USER\\>'
amdev 6902 0.0 0.1 3380 1188 p0 S+ 9:57PM 0:00.01 egrep
\\<6899\\>|^USER\\>
[16-bsd01 ~/work]$ python strm.py --second
USER PID %CPU %MEM VSZ RSS TT STAT STARTED TIME COMMAND
amdev 6903 0.0 10.5 166216 163992 p0 S+ 9:57PM 0:00.92 python
strm.py --second (python2.5)
amdev 6904 0.0 0.1 3508 1424 p0 S+ 9:57PM 0:00.02 sh -c ps
aux | egrep '\\<6903\\>|^USER\\>'
amdev 6906 0.0 0.1 3508 1424 p0 R+ 9:57PM 0:00.00 egrep
\\<6903\\>|^USER\\> (sh)
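For reference, the amount of character data in the two lists can be checked without allocating anything. This is only a sketch: payload_chars is a hypothetical helper name, not part of strm.py, and it counts characters only, ignoring the per-object string headers and the list itself.

```python
def payload_chars(count, target_len):
    """Total characters held by the list built in the original loops.

    Mirrors str(i) * (target_len / len(str(i))) without allocating the
    strings; // is used so the arithmetic is the same on Python 2 and 3.
    """
    total = 0
    for i in range(count):
        digits = len(str(i))
        total += digits * (target_len // digits)
    return total

first = payload_chars(100000, 1000)   # the --first case
second = payload_chars(20000, 5000)   # the --second case
print("first:  %d chars" % first)
print("second: %d chars" % second)
```

Both cases come out just under 100 million characters, so the extra RSS in the --second run (163992 KB vs 107560 KB) is not string data. That is consistent with allocator behaviour being the cause, though it does not prove it.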
Regards,
Amit
On Thu, Mar 17, 2011 at 3:21 AM, Dan Stromberg <drsalists at gmail.com> wrote:
>
> On Wed, Mar 16, 2011 at 8:38 AM, Amit Dev <amitdev at gmail.com> wrote:
>>
>> I'm observing a strange memory usage pattern with strings. Consider
>> the following session. The idea is to create a list of strings whose
>> cumulative length is about 100MB.
>>
>> >>> l = []
>> >>> for i in xrange(100000):
>> ...     l.append(str(i) * (1000/len(str(i))))
>>
>> This uses around 100MB of memory as expected and 'del l' will clear that.
>>
>>
>> >>> for i in xrange(20000):
>> ...     l.append(str(i) * (5000/len(str(i))))
>>
>> This is using 165MB of memory. I really don't understand where the
>> additional memory usage is coming from.
>>
>> If I reduce the string size, usage stays high until the size gets down
>> to around 1000, at which point it is back to 100MB.
>>
>> Python 2.6.4 on FreeBSD.
>>
>> Regards,
>> Amit
>> --
>> http://mail.python.org/mailman/listinfo/python-list
>
> On Python 2.6.6 on Ubuntu 10.10:
>
> $ cat pmu
> #!/usr/bin/python
>
> import os
> import sys
>
> list_ = []
>
> if sys.argv[1] == '--first':
>     for i in xrange(100000):
>         list_.append(str(i) * (1000/len(str(i))))
> elif sys.argv[1] == '--second':
>     for i in xrange(20000):
>         list_.append(str(i) * (5000/len(str(i))))
> else:
>     sys.stderr.write('%s: Illegal sys.argv[1]\n' % sys.argv[0])
>     sys.exit(1)
>
> os.system("ps aux | egrep '\<%d\>|^USER\>'" % os.getpid())
>
> dstromberg-laptop-dstromberg:~/src/python-mem-use i686-pc-linux-gnu 10916 -
> above cmd done 2011 Wed Mar 16 02:38 PM
>
> $ make
> ./pmu --first
> USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
> 1000 11063 0.0 3.4 110212 104436 pts/5 S+ 14:38 0:00
> /usr/bin/python ./pmu --first
> 1000 11064 0.0 0.0 1896 512 pts/5 S+ 14:38 0:00 sh -c ps
> aux | egrep '\<11063\>|^USER\>'
> 1000 11066 0.0 0.0 4012 740 pts/5 S+ 14:38 0:00 egrep
> \<11063\>|^USER\>
> ./pmu --second
> USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
> 1000 11067 13.0 3.3 107540 101536 pts/5 S+ 14:38 0:00
> /usr/bin/python ./pmu --second
> 1000 11068 0.0 0.0 1896 508 pts/5 S+ 14:38 0:00 sh -c ps
> aux | egrep '\<11067\>|^USER\>'
> 1000 11070 0.0 0.0 4012 740 pts/5 S+ 14:38 0:00 egrep
> \<11067\>|^USER\>
> dstromberg-laptop-dstromberg:~/src/python-mem-use i686-pc-linux-gnu 10916 -
> above cmd done 2011 Wed Mar 16 02:38 PM
>
> So on Python 2.6.6 + Ubuntu 10.10, the second is actually a little smaller
> than the first.
>
> Some issues you might ponder:
> 1) Does FreeBSD's malloc/free know how to free unused memory pages in the
> middle of the heap (using mmap games), or does it only sbrk() down when the
> end of the heap becomes unused, or does it never sbrk() back down at all?
> I've heard various *ix's fall into one of these 3 groups in releasing unused
> pages.
>
> 2) It might be just an issue of how frequently the interpreter garbage
> collects; you could try adjusting this; check out the gc module. Note that
> it's often faster not to collect at every conceivable opportunity, but this
> tends to add up the bytes pretty quickly in some scripts - for a while,
> until the next collection. So your memory use pattern will often end up
> looking like a bit of a sawtooth function.
>
> 3) If you need strict memory use guarantees, you might be better off with a
> language that's closer to the metal, like C - something that isn't garbage
> collected is one parameter to consider. If you already have something in
> CPython, then Cython might help; Cython allows you to use C datastructures
> from a dialect of Python.
>
>
>
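The gc suggestion in point 2 of the quoted message can be tried with a few lines. This is a minimal sketch, assuming a CPython recent enough to have gc.is_tracked (2.7, or 3.1 and later); the threshold values are arbitrary illustrations, not recommendations. Note that plain strings are freed by reference counting and are not tracked by the cyclic collector, so tuning the collector may well make no difference for a list of strings.

```python
import gc

# Current collector state: thresholds for the three generations and
# the current allocation counts for each.
print("enabled:    %s" % gc.isenabled())
print("thresholds: %s" % (gc.get_threshold(),))
print("counts:     %s" % (gc.get_count(),))

# Strings are not tracked by the cyclic collector; containers such as
# lists are.
string_tracked = gc.is_tracked("x" * 1000)
list_tracked = gc.is_tracked([])
print("str tracked: %s, list tracked: %s" % (string_tracked, list_tracked))

# Collect more eagerly for a while (arbitrary example values), force a
# full collection, then restore the previous thresholds.
saved = gc.get_threshold()
gc.set_threshold(100, 5, 5)
unreachable = gc.collect()
gc.set_threshold(*saved)
print("unreachable objects found: %d" % unreachable)
```

If RSS stays the same after gc.collect(), the memory is more likely being held by the C allocator than by uncollected Python objects.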