
Tim Peters wrote:
[going back a week here, to dict resizing ...]
Yes, and the subject line does not correspond to the contents: at the moment I sent this message I ran out of disk space, and the mailer picked a random header after destroying half of the messages in this mailbox.
[Vladimir Marangozov]
... All in all, for performance reasons, dicts remain an exception to the rule of releasing memory ASAP.
Yes, except I don't think there is such a rule! The actual rule is a balancing act between the cost of keeping memory around "just in case", and the expense of getting rid of it.
Good point.
Resizing a dict is extraordinarily expensive because the entire table needs to be rearranged, but lists make this tradeoff too (when you del a list element or list slice, it still goes thru NRESIZE, which still keeps space for as many as 100 "extra" elements around).
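For concreteness, that over-allocation policy looks roughly like the sketch below. It is only an illustration of the idea -- the intvec type, the round_up() helper and the flat ROUNDUP of 100 are made up here, not the actual NRESIZE macro from listobject.c:

    #include <stdlib.h>
    #include <stdio.h>

    /* Illustrative over-allocating resize policy, in the spirit of (but
     * not identical to) NRESIZE: round the requested size up to the next
     * multiple of ROUNDUP, so shrinking by a few elements never triggers
     * a realloc(), at the price of keeping up to ROUNDUP-1 "extra" slots
     * around. */
    #define ROUNDUP 100

    static size_t round_up(size_t n)
    {
        return ((n + ROUNDUP - 1) / ROUNDUP) * ROUNDUP;
    }

    typedef struct {
        int    *items;
        size_t  used;       /* number of live elements */
        size_t  allocated;  /* capacity actually malloc'ed */
    } intvec;

    /* Resize to 'n' live elements, reallocating only when the rounded-up
     * capacity changes.  Returns 0 on success, -1 on memory error. */
    static int intvec_resize(intvec *v, size_t n)
    {
        size_t want = round_up(n);
        if (want != v->allocated) {
            int *p = realloc(v->items, want * sizeof(int));
            if (p == NULL && want != 0)
                return -1;
            v->items = p;
            v->allocated = want;
        }
        v->used = n;
        return 0;
    }

    int main(void)
    {
        intvec v = {NULL, 0, 0};
        intvec_resize(&v, 250);   /* allocates 300 slots */
        intvec_resize(&v, 201);   /* still 300 slots, no realloc */
        intvec_resize(&v, 150);   /* crosses a boundary: shrinks to 200 */
        printf("used=%zu allocated=%zu\n", v.used, v.allocated);
        free(v.items);
        return 0;
    }

The point is that shrinking from 250 to 201 live elements touches no memory at all; only crossing a 100-slot boundary costs a realloc().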
The various internal caches for int and frame objects (etc) also play this sort of game; e.g., if I happen to have a million ints sitting around at some time, Python effectively assumes I'll never want to reuse that int storage for anything other than ints again.
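The caching game for ints looks roughly like this (again just a sketch -- myint, fill_free_list() and BLOCK_SIZE are invented names, and the real intobject.c free list is laid out differently -- but it shows why memory handed to the int cache never goes back to the system):

    #include <stdlib.h>
    #include <stdio.h>

    /* Toy free-list cache: objects are carved out of malloc'ed blocks
     * and, once freed, go back onto a free list that is never returned
     * to the system -- so a million live ints leave a million cached
     * slots behind. */
    typedef struct myint {
        struct myint *next;   /* free-list link while the slot is unused */
        long          value;
    } myint;

    #define BLOCK_SIZE 1000

    static myint *free_list = NULL;

    static int fill_free_list(void)
    {
        myint *block = malloc(BLOCK_SIZE * sizeof(myint));
        if (block == NULL)
            return -1;
        /* Chain the fresh slots onto the free list. */
        for (int i = 0; i < BLOCK_SIZE; i++) {
            block[i].next = free_list;
            free_list = &block[i];
        }
        return 0;
    }

    static myint *myint_new(long value)
    {
        if (free_list == NULL && fill_free_list() < 0)
            return NULL;
        myint *v = free_list;
        free_list = v->next;
        v->value = value;
        return v;
    }

    static void myint_free(myint *v)
    {
        /* The slot is cached, never free()'d: the memory stays
         * reserved for future ints. */
        v->next = free_list;
        free_list = v;
    }

    int main(void)
    {
        myint *a = myint_new(42);
        myint *b = myint_new(7);
        printf("%ld %ld\n", a->value, b->value);
        myint_free(a);
        myint_free(b);
        return 0;
    }

That trade buys a very cheap allocation path (pop a pointer off a list) at the price of never shrinking.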
python-rarely-releases-memory-asap-ly y'rs - tim
Yes, and I'm somewhat sensitive to this issue after spending 6 years in a team that deals a lot with memory management (mainly DSM).

In other words, you're saying that Python tolerates *virtual* memory fragmentation (a funny term :-). In the case of dicts and strings, we tolerate "internal fragmentation" (a contiguous chunk is allocated, then only partially used). In the case of ints, floats or frames, we tolerate "external fragmentation". And as you said, Python tolerates this because of the speed/space tradeoff.

Fortunately, all we deal with at this level is virtual memory, so even if you have zillions of ints, the OS VMM will help you more with its long-term scheduling than Python's wild guesses about a hypothetical later use of zillions of ints.

I think that some OS concepts can give us hints on how to reduce our virtual fragmentation (which, as we all know, is not a very good thing). A few keywords: compaction, segmentation, paging, sharing. We can't do much about our internal fragmentation, short of changing the algorithms of dicts & strings (which is not appealing anyway). But it would be nice to think about the external fragmentation of Python's caches. Or even to try to reduce the internal fragmentation in combination with the internal caches...

BTW, this is the whole point of PyMalloc: in a virtual memory world, try to reduce the distance between the user's view of memory and the OS's view of it. PyMalloc addresses the fragmentation problem at a finer granularity than the OS (that is, *within* a page), because most of Python's objects are very small. However, it can't handle large chunks like the int/float caches efficiently. Basically, what it does is segmentation of the virtual space and sharing of the cached free space (a toy sketch of the pool idea is in the P.S. below). I think that Python could improve on sharing its internal caches without significant slowdowns...

The bottom line is that there's still plenty of room for exploring alternate memory management strategies that better fit Python's memory needs as a whole.

--
Vladimir MARANGOZOV | Vladimir.Marangozov@inrialpes.fr
http://sirac.inrialpes.fr/~marangoz | tel:(+33-4)76615277 fax:76615252
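P.S. To make the "segmentation + sharing" point concrete, here is a toy pool allocator in the spirit of PyMalloc. The pool struct, POOL_SIZE and CHUNK_SIZE are invented for illustration and this is not PyMalloc's actual implementation, but it shows small objects of one size class sharing a single page and a common free list:

    #include <stdlib.h>
    #include <stdio.h>

    /* Toy pool allocator: carve a page-sized arena into fixed-size
     * chunks and hand them out from a free list, so many small objects
     * share one page instead of each paying malloc's per-block
     * overhead. */
    #define POOL_SIZE  4096   /* assume a 4 KB "page" */
    #define CHUNK_SIZE 32     /* one small-object size class */

    typedef struct pool {
        unsigned char *arena;     /* the raw page */
        void          *free_list; /* chunks available for reuse */
        size_t         next;      /* high-water mark into the arena */
    } pool;

    static int pool_init(pool *p)
    {
        p->arena = malloc(POOL_SIZE);
        p->free_list = NULL;
        p->next = 0;
        return p->arena ? 0 : -1;
    }

    static void *pool_alloc(pool *p)
    {
        if (p->free_list) {                 /* reuse a freed chunk */
            void *chunk = p->free_list;
            p->free_list = *(void **)chunk; /* pop the free list */
            return chunk;
        }
        if (p->next + CHUNK_SIZE <= POOL_SIZE) {  /* carve a fresh chunk */
            void *chunk = p->arena + p->next;
            p->next += CHUNK_SIZE;
            return chunk;
        }
        return NULL;                        /* pool exhausted */
    }

    static void pool_free(pool *p, void *chunk)
    {
        /* The chunk goes back onto the pool's shared free list; the
         * page itself is never split up or returned piecemeal. */
        *(void **)chunk = p->free_list;
        p->free_list = chunk;
    }

    int main(void)
    {
        pool p;
        if (pool_init(&p) < 0)
            return 1;
        void *a = pool_alloc(&p);
        void *b = pool_alloc(&p);
        pool_free(&p, a);
        void *c = pool_alloc(&p);   /* reuses a's chunk */
        printf("reused: %s\n", c == a ? "yes" : "no");
        (void)b;
        free(p.arena);
        return 0;
    }

Freed chunks are shared by all future objects of that size class; the page itself is only ever returned to the system whole.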