Memory Usage of Strings

Amit Dev amitdev at gmail.com
Thu Mar 17 02:11:58 EDT 2011


Thanks Dan for the detailed reply. I suspect it is related to FreeBSD
malloc/free as you suggested. Here is the output of running your script;
the gap reproduces here (about 105MB RSS for --first vs. about 160MB for
--second), whereas on your Ubuntu box both runs stayed around 100MB:

[16-bsd01 ~/work]$ python strm.py --first
USER    PID %CPU %MEM   VSZ   RSS  TT  STAT STARTED      TIME COMMAND
amdev  6899  3.0  6.9 111944 107560  p0  S+    9:57PM   0:01.20 python
strm.py --first (python2.5)
amdev  6900  0.0  0.1  3508  1424  p0  S+    9:57PM   0:00.02 sh -c ps
aux | egrep '\\<6899\\>|^USER\\>'
amdev  6902  0.0  0.1  3380  1188  p0  S+    9:57PM   0:00.01 egrep
\\<6899\\>|^USER\\>

[16-bsd01 ~/work]$ python strm.py --second
USER    PID %CPU %MEM   VSZ   RSS  TT  STAT STARTED      TIME COMMAND
amdev  6903  0.0 10.5 166216 163992  p0  S+    9:57PM   0:00.92 python
strm.py --second (python2.5)
amdev  6904  0.0  0.1  3508  1424  p0  S+    9:57PM   0:00.02 sh -c ps
aux | egrep '\\<6903\\>|^USER\\>'
amdev  6906  0.0  0.1  3508  1424  p0  R+    9:57PM   0:00.00 egrep
\\<6903\\>|^USER\\> (sh)
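
To separate the Python-level object sizes from whatever the allocator is
doing, I'll also try summing len() and sys.getsizeof() over the list for
both cases. A rough sketch of what I have in mind (assuming Python 2.6, so
sys.getsizeof is available):

import sys

def build(count, size):
    # roughly 100MB of character data, as in the two test cases
    return [str(i) * (size / len(str(i))) for i in xrange(count)]

for count, size in ((100000, 1000), (20000, 5000)):
    l = build(count, size)
    chars = sum(len(s) for s in l)
    objbytes = sum(sys.getsizeof(s) for s in l)
    print count, size, chars, objbytes
    del l

If both runs report about 100MB of characters and only a few MB of string
headers (each str object carries a few dozen bytes of overhead in CPython
2.x), then the extra memory in the --second case must be coming from the
allocator rather than from the objects themselves.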

Regards,
Amit

On Thu, Mar 17, 2011 at 3:21 AM, Dan Stromberg <drsalists at gmail.com> wrote:
>
> On Wed, Mar 16, 2011 at 8:38 AM, Amit Dev <amitdev at gmail.com> wrote:
>>
>> I'm observing a strange memory usage pattern with strings. Consider
>> the following session. Idea is to create a list which holds some
>> strings so that cumulative characters in the list is 100MB.
>>
>> >>> l = []
>> >>> for i in xrange(100000):
>> ...  l.append(str(i) * (1000/len(str(i))))
>>
>> This uses around 100MB of memory as expected and 'del l' will clear that.
>>
>>
>> >>> for i in xrange(20000):
>> ...  l.append(str(i) * (5000/len(str(i))))
>>
>> This is using 165MB of memory. I really don't understand where the
>> additional memory usage is coming from.
>>
>> If I reduce the string size, it remains high till it reaches around
>> 1000. In that case it is back to 100MB usage.
>>
>> Python 2.6.4 on FreeBSD.
>>
>> Regards,
>> Amit
>> --
>> http://mail.python.org/mailman/listinfo/python-list
>
> On Python 2.6.6 on Ubuntu 10.10:
>
> $ cat pmu
> #!/usr/bin/python
>
> import os
> import sys
>
> list_ = []
>
> if sys.argv[1] == '--first':
>         for i in xrange(100000):
>                 list_.append(str(i) * (1000/len(str(i))))
> elif sys.argv[1] == '--second':
>         for i in xrange(20000):
>                 list_.append(str(i) * (5000/len(str(i))))
> else:
>         sys.stderr.write('%s: Illegal sys.argv[1]\n' % sys.argv[0])
>         sys.exit(1)
>
> os.system("ps aux | egrep '\<%d\>|^USER\>'" % os.getpid())
>
> dstromberg-laptop-dstromberg:~/src/python-mem-use i686-pc-linux-gnu 10916 -
> above cmd done 2011 Wed Mar 16 02:38 PM
>
> $ make
> ./pmu --first
> USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
> 1000     11063  0.0  3.4 110212 104436 pts/5   S+   14:38   0:00
> /usr/bin/python ./pmu --first
> 1000     11064  0.0  0.0   1896   512 pts/5    S+   14:38   0:00 sh -c ps
> aux | egrep '\<11063\>|^USER\>'
> 1000     11066  0.0  0.0   4012   740 pts/5    S+   14:38   0:00 egrep
> \<11063\>|^USER\>
> ./pmu --second
> USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
> 1000     11067 13.0  3.3 107540 101536 pts/5   S+   14:38   0:00
> /usr/bin/python ./pmu --second
> 1000     11068  0.0  0.0   1896   508 pts/5    S+   14:38   0:00 sh -c ps
> aux | egrep '\<11067\>|^USER\>'
> 1000     11070  0.0  0.0   4012   740 pts/5    S+   14:38   0:00 egrep
> \<11067\>|^USER\>
> dstromberg-laptop-dstromberg:~/src/python-mem-use i686-pc-linux-gnu 10916 -
> above cmd done 2011 Wed Mar 16 02:38 PM
>
> So on Python 2.6.6 + Ubuntu 10.10, the second is actually a little smaller
> than the first.
>
> Some issues you might ponder:
> 1) Does FreeBSD's malloc/free know how to free unused memory pages in the
> middle of the heap (using mmap games), or does it only sbrk() down when the
> end of the heap becomes unused, or does it never sbrk() back down at all?
> I've heard various *ix's fall into one of these 3 groups in releasing unused
> pages.
>
> 2) It might just be an issue of how frequently the interpreter garbage
> collects; you could try adjusting this; check out the gc module.  Note that
> it's often faster not to collect at every conceivable opportunity, but in
> some scripts that lets the bytes add up pretty quickly for a while, until
> the next collection.  So your memory use pattern will often end up looking
> like a bit of a sawtooth function.
>
> 3) If you need strict memory use guarantees, you might be better off with a
> language that's closer to the metal, like C - whether a language is garbage
> collected is one parameter to consider.  If you already have something in
> CPython, then Cython might help; Cython allows you to use C data structures
> from a dialect of Python.
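
Regarding point 1 above: one way to see whether the freed pages ever go
back to the OS here might be to sample the process RSS before building the
list, after building it, and again after del. A rough sketch, assuming
'ps -o rss= -p <pid>' prints the resident size in KB on this box:

import os

def rss_kb():
    # current resident set size in KB, read back from ps
    return int(os.popen('ps -o rss= -p %d' % os.getpid()).read())

print 'baseline', rss_kb()
l = [str(i) * (5000 / len(str(i))) for i in xrange(20000)]
print 'built   ', rss_kb()
del l
print 'deleted ', rss_kb()

If 'deleted' stays close to 'built', the allocator is holding on to the
freed pages (or only releasing them from the top of the heap), which would
fit the numbers above.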
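
And for point 2: the collection frequency can be inspected and tuned
through the gc module. A minimal sketch of what I could try (the threshold
values are only examples):

import gc

print gc.get_threshold()     # defaults to (700, 10, 10)
gc.set_threshold(100, 5, 5)  # collect more often than the default
gc.collect()                 # or force a full collection at a known point

That said, since plain strings aren't tracked by the cyclic collector and
the list is freed by reference counting as soon as it is deleted, I suspect
tuning gc won't change much here and the allocator behaviour in point 1 is
the real story.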


