[Python-Dev] Memory (was: about line numbers, which was shrinking dicts)

Vladimir Marangozov Vladimir.Marangozov@inrialpes.fr
Sun, 22 Aug 1999 20:41:59 +0100 (NFT)


Tim Peters wrote:
>
> [going back a week here, to dict resizing ...]

Yes, and the subject line does not correspond to the contents because
at the moment I sent this message, I ran out of disk space and the
mailer picked a random header after destroying half of the messages
in this mailbox.

>
> [Vladimir Marangozov]
> > ...
> > All in all, for performance reasons, dicts remain an exception
> > to the rule of releasing memory ASAP.
>
> Yes, except I don't think there is such a rule!  The actual rule is a
> balancing act between the cost of keeping memory around "just in case", and
> the expense of getting rid of it.

Good point.

>
> Resizing a dict is extraordinarily expensive because the entire table needs
> to be rearranged, but lists make this tradeoff too (when you del a list
> element or list slice, it still goes thru NRESIZE, which still keeps space
> for as many as 100 "extra" elements around).
>
> The various internal caches for int and frame objects (etc) also play this
> sort of game; e.g., if I happen to have a million ints sitting around at
> some time, Python effectively assumes I'll never want to reuse that int
> storage for anything other than ints again.
>
> python-rarely-releases-memory-asap-ly y'rs  - tim

Yes, and I'm somewhat sensitive to this issue after spending 6 years
in a team that deals a lot with memory management (mainly DSM).

In other words, you say that Python tolerates *virtual* memory fragmentation
(a funny term :-). In the case of dicts and strings, we tolerate "internal
fragmentation" (a contiguous chunk is allocated, then partially used).
In the case of ints, floats or frames, we tolerate "external fragmentation".
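The dict case of internal fragmentation is easy to make visible from Python
itself (a rough sketch -- sys.getsizeof is a later CPython addition, used
here only to expose the effect; the exact numbers are implementation
details):

```python
import sys

# A dict allocates one contiguous hash table, then only partially
# fills it: the table is kept sparse to limit collisions, so the
# memory charged per entry exceeds a bare key/value pointer pair.
d = {}
empty = sys.getsizeof(d)       # table allocated for the empty dict
for i in range(100):
    d[i] = i
full = sys.getsizeof(d)        # table after 100 insertions (and resizes)
print(full > empty)            # the chunk grew, but stays partially used
```

The unused slots inside that contiguous chunk are exactly the "internal
fragmentation" above.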

And as you said, Python tolerates this because of the speed/space tradeoff.
Fortunately, all we deal with at this level is virtual memory, so even if
you have zillions of ints, the OS's virtual memory manager, with its
long-term paging decisions, will help you more than Python's wild guesses
about a hypothetical later use of zillions of ints.

I think that some OS concepts can give us hints on how to reduce our
virtual fragmentation (which, as we all know, is not a very good thing).
A few keywords: compaction, segmentation, paging, sharing.

We can't do much about our internal fragmentation, except by changing the
algorithms of dicts & strings (which is not appealing anyway). But it
would be nice to think about the external fragmentation of Python's caches,
or even to try reducing the internal fragmentation in combination with the
internal caches...

BTW, this is the whole point of PyMalloc: in a virtual memory world, try
to reduce the distance between the user's view and the OS's view of memory.
PyMalloc addresses the fragmentation problem at a finer granularity than
the OS does (that is, *within* a page), because most of Python's objects
are very small. However, it can't handle large chunks like the int/float
caches efficiently. Basically, what it does is segmentation of the virtual
space and sharing of the cached free space. I think that Python could
improve on sharing its internal caches without significant slowdowns...
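To make "segmentation within a page, plus sharing of freed space" concrete,
here is a toy sketch in Python (all names hypothetical -- the real PyMalloc
is C and considerably more involved): a page is carved into fixed-size
blocks, and freed blocks go on a shared free list serving all objects of
that size class.

```python
PAGE = 4096  # one virtual-memory page, the unit we subdivide

class Pool:
    """Toy pool: one page segmented into equal-size blocks."""
    def __init__(self, block_size):
        self.block_size = block_size
        nblocks = PAGE // block_size
        # Offsets of free blocks within the (simulated) page.
        self.free = list(range(0, nblocks * block_size, block_size))

    def alloc(self):
        # Hand out any free block; None models an exhausted pool.
        return self.free.pop() if self.free else None

    def release(self, offset):
        # Freed space returns to the shared per-size free list,
        # not to the OS -- the "sharing of cached free space".
        self.free.append(offset)

pool = Pool(32)
a = pool.alloc()
b = pool.alloc()
pool.release(a)
c = pool.alloc()
print(c == a)   # True: the freed block is reused, not leaked per-type
```

The contrast with the int/float caches is that here the freed block can be
handed to *any* small object of the right size, instead of staying reserved
for one type.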

The bottom line is that there's still plenty of room for exploring
alternate memory management strategies that better fit Python's memory
needs as a whole.

-- 
       Vladimir MARANGOZOV          | Vladimir.Marangozov@inrialpes.fr
http://sirac.inrialpes.fr/~marangoz | tel:(+33-4)76615277 fax:76615252