tracking memory consumption

Hi!

In PR#214 Martin v. Loewis suggests a sizeof() function as the result of a request to python-help. I've followed the thread silently until now.

On platforms with a virtual memory subsystem this is usually not an issue. On embedded systems and ancient OSes (like MS-DOS), however, it is often useful if applications can estimate how much memory their data consumes.

The sizeof() function proposed by Martin is only one possible approach I can think of. Another would be encapsulating the malloc/free logic in a wrapper that traces all allocations and deallocations in a special private 'usedmem' variable, which could be queried by a function sys.usedmem() returning an integer. Very often this is more convenient than a sizeof() function, because you don't need to embed the summing into a possibly complicated nested object data structure. Although 'usedmem' wouldn't return a precise measure, it is often sufficient as an estimate, and it should also be rather easy to implement.

We implemented this approach years ago here in a Modula-2 based system, where we had one great advantage: the Modula-2 Storage.DEALLOCATE procedure takes a second parameter giving the size of the data, which is missing from the signature of the C library's free() function. So a wrapper around free() would either have to use an additional hash, or would have to know something about the internals of the underlying malloc library. The former, of course, would hurt performance; depending on malloc internals would hurt portability.

Regards from Germany, Peter

--
Peter Funk, Oldenburger Str.86, D-27777 Ganderkesee, Germany, Fax:+49 4222950260 office: +49 421 20419-0 (ArtCom GmbH, Grazer Str.8, D-28359 Bremen)
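[Editorial note: a pure-Python sketch of the bookkeeping Peter describes; the class and method names are illustrative, not an existing API. Because C's free() receives no size argument, the wrapper must remember each block's size itself, here with the "additional hash" Peter mentions.]

```python
class UsedMemTracker:
    """Toy model of a malloc/free wrapper keeping a running
    'usedmem' total.  Modula-2's Storage.DEALLOCATE is handed
    the block size; C's free() is not, so the wrapper keeps
    its own table from block id to size."""

    def __init__(self):
        self.usedmem = 0    # running total of allocated bytes
        self._sizes = {}    # block id -> size (the extra hash)
        self._next_id = 0

    def allocate(self, size):
        block_id = self._next_id
        self._next_id += 1
        self._sizes[block_id] = size
        self.usedmem += size
        return block_id

    def deallocate(self, block_id):
        # Look the size up in our own table, since the caller
        # (like free()'s caller) does not supply it.
        self.usedmem -= self._sizes.pop(block_id)

tracker = UsedMemTracker()
a = tracker.allocate(100)
b = tracker.allocate(28)
print(tracker.usedmem)   # 128
tracker.deallocate(a)
print(tracker.usedmem)   # 28
```

The single integer is cheap to maintain and to query, which is why it can be more convenient than summing sizeof() over a nested structure.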

Peter Funk wrote:
For such basic computations of native object sizes, struct.calcsize is your friend.
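[Editorial note: a short illustration of the struct.calcsize suggestion; the reported values are platform-dependent, reflecting native C type sizes and alignment.]

```python
import struct

# struct.calcsize returns the size in bytes of the packed
# representation of a format string, using the platform's
# native sizes and alignment rules.
print(struct.calcsize('i'))    # size of a native C int
print(struct.calcsize('l'))    # size of a native C long
print(struct.calcsize('d'))    # size of a native C double
print(struct.calcsize('ili'))  # int + long + int, padding included
```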
Yes, we're moving slowly in this direction. The problem is not easy, though, because it has many, many facets: legacy code, garbage collection, storage optimization, performance, etc. So we need to think big here. The roadmap (short & long term plan) that I'm advocating is:

1. Take control over Python's memory. Presently, this is not the case; to remedy the situation, I've already sent a huge patch suite to python-patches aiming at gaining control over "the Python heap" without disturbing legacy code.

2. Collect some stats, based on 1, and possibly expose some of them to the user. Provide some memory & object monitoring facilities.

3. Optimize Python's mallocs (for instance, make the object-specific allocators aware of each other), based on the stats from 2. This will result in better memory sharing and possibly new speed/space tradeoffs (in conjunction with 4).

4. Take preventive actions, like garbage collection, memory compaction or other appropriate procedures, triggered at pertinent thresholds derived from 2. These are aimed at ensuring safety and liveness of the Python process.

Furthermore, I expect that the performance gain from 3 would be spent on 4, so we'll probably end up with added value without performance hits. This implies, of course, a number of compromises that need to be weighed carefully, but we're not at that stage yet. Currently, we can't do anything cool, because we need 1 above all: gaining control over the Python heap is a precondition for any future work on dynamic storage allocation/management.

--
Vladimir MARANGOZOV | Vladimir.Marangozov@inrialpes.fr
http://sirac.inrialpes.fr/~marangoz | tel:(+33-4)76615277 fax:76615252

Unfortunately, it does not work in the general case, or I'm missing something. Given a dictionary a, how exactly do you find out the size occupied by a? I don't want to count the size of the keys and values themselves, just the memory used by the storage of the references to the keys and the values.

Regards, Martin
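[Editorial note: a sketch to make the difficulty concrete. The 'P' (pointer) format and the three-word-per-entry layout are assumptions about one particular hash-table implementation, not anything Python guarantees or exposes.]

```python
import struct

a = {'spam': 1, 'eggs': 2, 'ham': 3}

# Naive estimate: per *used* entry, one pointer-sized word each
# for the cached hash, the key reference and the value reference.
entry_bytes = 3 * struct.calcsize('P')
naive = len(a) * entry_bytes
print(naive)

# The estimate is wrong in the general case: an open-addressing
# hash table keeps spare slots to stay sparse, so the real table
# holds more than len(a) entries -- and its actual slot count is
# internal state that struct.calcsize cannot see.  That is the
# point of the objection above.
```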

participants (3)
- Martin von Loewis
- pf@artcom-gmbh.de
- Vladimir Marangozov