memory usage, temporary and otherwise

Wed Mar 3 16:46:08 EST 2010

mk wrote:
> 
> Obviously, don't try this on low-memory machine:
> 
>>>> a={}
>>>> for i in range(10000000):

Note that in Python 2, this will build a list of 10000000 int objects.
You may want to use xrange instead...
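
A rough way to see the difference, as a sketch only: lazy_range, n,
eager_size and lazy_size are names I made up here, and the try/except
is just there so the snippet also runs on Python 3, where range is
already lazy.

```python
import sys

try:
    lazy_range = xrange   # Python 2: lazy counterpart of range
except NameError:
    lazy_range = range    # Python 3: range is already lazy

n = 1000000

# Size of the list object alone (the int objects it holds are not counted).
eager_size = sys.getsizeof(list(lazy_range(n)))

# Size of the lazy object, which does not grow with n.
lazy_size = sys.getsizeof(lazy_range(n))

print(eager_size, lazy_size)
```
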

> ...     a[i]='spam'*10
> ...
>>>> import sys
>>>> sys.getsizeof(a)
> 201326728
>>>> id(a[1])
> 3085643936L
>>>> id(a[100])
> 3085713568L
> 
>>>> ids={}
>>>> for i in range(len(a)):

And this builds yet another list of 10000000 int objects.

> ...     ids[id(a[i])]=True
> ...
>>>> len(ids.keys())
> 10000000
> 
> Hm, apparently Python didn't spot that 'spam'*10 in a's values is really
> the same string, right?

Seems not. FWIW, Python does some caching on some values of some
immutable types (small ints, some strings, etc.), but this is
implementation-dependent, so you shouldn't rely on it.
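
If you do want equal strings to be shared, you can ask for it
explicitly by interning them. A minimal sketch, assuming CPython;
intern_str, make_value, plain and shared are names invented for the
example, and the values are built with join() at run time so the
compiler has no chance to fold them into a single constant:

```python
import sys

try:
    intern_str = intern        # Python 2: builtin
except NameError:
    intern_str = sys.intern    # Python 3: moved to the sys module

def make_value():
    # Built at run time, so each call produces a new string object.
    return ''.join(['spam'] * 10)

# Equal strings, each one a separate object.
plain = [make_value() for _ in range(1000)]

# Equal strings, all replaced by the single interned copy.
shared = [intern_str(make_value()) for _ in range(1000)]

distinct_plain = len(set(id(s) for s in plain))
distinct_shared = len(set(id(s) for s in shared))
print(distinct_plain, distinct_shared)
```

With interning, the dict values would all point at one string object
instead of ten million separate ones, at the cost of a lookup in the
intern table for each value.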

> So sys.getsizeof returns some 200MB for this dictionary. But according
> to top RSS of the python process is 300MB. ps auxw says the same thing
> (more or less).
>
> Why the 50% overhead? (and I would swear that a couple of times RSS
> according to top grew to 800MB).

(overly simplified)

When an object is garbage-collected, the memory is not necessarily
"returned" to the system - and the system doesn't necessarily claim it
back either until it _really_ needs it.

This avoids a _lot_ of possibly useless work for both the python
interpreter (keeping already allocated memory costs less than immediately
returning it, just to try and allocate some more memory a couple of
instructions later) and the system (ditto - FWIW, how linux handles
memory allocations is somewhat funny, if you ever programmed in C).
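
One more thing that probably accounts for part of the gap between
sys.getsizeof() and RSS: getsizeof() only reports the memory of the
dict object itself, not of the keys and values it refers to. A rough
sketch with a smaller dict; n, shallow and deep are names made up for
the example, and the sum is only an estimate since it ignores allocator
overhead:

```python
import sys

n = 100000
a = {}
for i in range(n):
    # Built at run time: one new string object per key.
    a[i] = ''.join(['spam'] * 10)

# Memory of the dict's own hash table only.
shallow = sys.getsizeof(a)

# Add the key and value objects. Every key and value here is a distinct
# object, so nothing is counted twice.
deep = shallow + sum(sys.getsizeof(k) + sys.getsizeof(v)
                     for k, v in a.items())

print(shallow, deep)
```
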

HTH