PEP 42: sizeof(obj) builtin

I'm about to start working on this one and wanted to check here first to make sure there is still a demand for it and to get ideas on the best implementation strategy. I'm thinking of summing all of the tp_basicsize slots while recursing through tp_traverse.

Raymond Hettinger

[Raymond Hettinger]
Marc-Andre implemented it for mxTools:

    http://www.lemburg.com/files/python/mxTools.html

    sizeof(object)
        Returns the number of bytes allocated for the given Python object. Additional space allocated by the object and stored in pointers is not taken into account (though the pointer itself is). If the object defines tp_itemsize in its type object then it is assumed to be a variable size object and the size is adjusted accordingly.

I don't know whether anyone finds it useful in real life; maybe MAL has an idea about that.
> I'm thinking of summing all of the tp_basicsize slots while recursing through tp_traverse.

So you'd add in the size of 0.0 a million times when traversing [0.0] * 1000000? That wouldn't be useful. Keeping a memo to avoid double-counting might be useful, but then it also gets more complicated, so it is much better to write it in Python, building on something straightforward like mxTools' version.
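The memo idea can be sketched in Python along these lines. This is only an illustration under stated assumptions: it relies on sys.getsizeof() and gc.get_referents() from a modern CPython, neither of which is part of this thread, and total_size is a hypothetical name.

```python
import gc
import sys

def total_size(root):
    """Sum sys.getsizeof() over every object reachable from root, once each."""
    seen = set()      # memo of id()s that were already counted
    stack = [root]    # explicit stack instead of recursion
    total = 0
    while stack:
        obj = stack.pop()
        if id(obj) in seen:
            continue  # shared object: do not count it again
        seen.add(id(obj))
        total += sys.getsizeof(obj)
        # gc.get_referents() follows tp_traverse; it returns nothing
        # for objects that do not participate in garbage collection.
        stack.extend(gc.get_referents(obj))
    return total

shared = 0.0
data = [shared] * 1000
# Naive sum counts the same float object 1000 times.
naive = sys.getsizeof(data) + sum(sys.getsizeof(x) for x in data)
with_memo = total_size(data)
```

With the memo, the list above is charged for the list object plus a single float, whereas the naive sum charges for the float once per slot. Note that for instances of Python classes the traversal may also reach the class itself, so a real tool would probably need to filter what it follows.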

Tim Peters wrote:
Some do; I wrote this for cache management to at least have a hint at the memory size being used by the objects in the cache.
That's what I'm always telling people too ;-) sizeof() in mxTools is only a way to get at the tp_basicsize value, nothing more. The rest can easily be done in Python itself.

--
Marc-Andre Lemburg
CEO eGenix.com Software GmbH
_______________________________________________________________________
eGenix.com -- Makers of the Python mx Extensions: mxDateTime,mxODBC,...
Python Consulting: http://www.egenix.com/
Python Software: http://www.egenix.com/files/python/
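For readers without mxTools, the same two slots are visible from Python as the type attributes __basicsize__ and __itemsize__. The sketch below is not the mxTools implementation; shallow_size is a hypothetical name, and it assumes that len(obj) matches the number of variable-size items, which holds for tuples but not for every variable-size type.

```python
def shallow_size(obj):
    """Approximate tp_basicsize plus tp_itemsize * item count for obj."""
    tp = type(obj)
    size = tp.__basicsize__
    # Assumption: for variable-size types, len() gives the item count.
    if tp.__itemsize__ and hasattr(obj, "__len__"):
        size += tp.__itemsize__ * len(obj)
    return size

float_size = shallow_size(1.5)
empty_tuple_size = shallow_size(())
three_tuple_size = shallow_size((1, 2, 3))
```

As in the mxTools description, memory reached through pointers (the tuple's elements, for example) is not included; only the pointers themselves are.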

M.-A. Lemburg wrote:
FWIW, here's something Dirk Holtwick started working on.

    import struct

    # sizeof() below is the mxTools function discussed in this thread.
    sizeof_PyObject = len(struct.pack('P', 0))  # size of one pointer

    def calcsize(i, s=0):
        s = sizeof(i)
        if type(i) == type({}):
            # one key pointer and one value pointer per entry
            s += (len(i) * 2 * sizeof_PyObject)
            # guess table size
            s += (len(i) * sizeof_PyObject * 2) // 3
            for k, v in i.items():
                s += calcsize(k)
                s += calcsize(v)
        elif type(i) == type([]):
            for v in i:
                s += calcsize(v)
                s += 1 * sizeof_PyObject
        elif type(i) == type((1,)):
            for v in i:
                s += calcsize(v)
        return s

I'm sure this could easily be turned into a std lib module which then covers all the Python builtins....

--
Marc-Andre Lemburg
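As a side note on the pointer-size line in the snippet above: struct.calcsize() gives the same number without packing a value. A small sketch, where the variable names are illustrative only:

```python
import struct

# Size of a native pointer, computed directly from the format string.
pointer_size = struct.calcsize('P')

# The approach used in the snippet: pack a null pointer and measure it.
packed_size = len(struct.pack('P', 0))
```

Both values are 4 on typical 32-bit builds and 8 on typical 64-bit builds.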

"MAL" == M <mal@lemburg.com> writes:
MAL> I'm sure this could easily be turned into a std lib module
MAL> which then covers all the Python builtins....

But I think such a function would be more useful if extension types can play along. I mean, there's fairly good lore about the memory footprint of built-in types, but it would be much harder to figure that out for some 3rd party extension type you grabbed off the net. So I think it would be much more useful in the latter case.

"SP" == Samuele Pedroni <pedronis@bluewin.ch> writes:

SP> it is very unlikely that we can implement this in Jython,
SP> I would prefer it to be | sys.sizeof | or sys._sizeof

This is an important point, so I'd prefer sys._sizeof().

-Barry
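One way for types to "play along" is a per-type hook. Later CPython versions provide this as the __sizeof__ method, which sys.getsizeof() calls; neither exists in this thread, so the sketch below is an illustration under that assumption. Buffer and its nbytes attribute are hypothetical.

```python
import sys

class Buffer(object):
    """Hypothetical type that owns memory outside its own object struct."""

    def __init__(self, nbytes):
        # Pretend this many bytes are allocated elsewhere on our behalf.
        self.nbytes = nbytes

    def __sizeof__(self):
        # Report the basic object size plus the externally held memory.
        return object.__sizeof__(self) + self.nbytes

small = Buffer(0)
large = Buffer(4096)
difference = sys.getsizeof(large) - sys.getsizeof(small)
```

The function living in the sys module rather than in the builtins is in line with the preference expressed above for an implementation-specific location.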

"Raymond Hettinger" <raymond.hettinger@verizon.net> writes:
Unfortunately sizeof() is implemented in ctypes with different semantics. There it has the same meaning as sizeof() in C and only works on ctypes' data types, e.g. sizeof(c_int) == 4 on 32-bit machines. But I'm afraid a Python builtin sizeof() would win, so do I have to choose a different name?
Thomas
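The difference in semantics can be seen side by side. This sketch assumes a modern CPython where ctypes is in the standard library and uses sys.getsizeof() as a stand-in for the proposed object-level function, which is an assumption not taken from this thread.

```python
import ctypes
import struct
import sys

# C semantics: the size of the C data type itself.
c_int_size = ctypes.sizeof(ctypes.c_int)

# Object semantics: the size of the Python int object, header included.
py_int_size = sys.getsizeof(1)
```

The first number matches what a C compiler reports for int on the platform; the second is larger because it covers the whole Python object.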

[Samuele Pedroni, on sizeof()]
> it is very unlikely that we can implement this in Jython,

I'm sure that you could, but no better than CPython: return a lower bound that may or may not be within a factor of 1000 of reality. CPython could do C sizeof() on Python objects, but that's about it. There's no clear way to know how much overhead is due to pymalloc "per object", and no way at all to know how much overhead is imposed by the system malloc, or by the OS.

Continuing Jeremy's example: How much memory does an empty Python dict consume? It's certain that *some* empty dicts grab at least 256KB from the system malloc, because that's the minimum chunk size pymalloc uses for its object arenas. Telling a user that the unlucky empty dict triggering a new arena allocation consumed some 128 bytes is a factor of 2000 off from what was actually gotten from the C library.

There are many layers of memory, and we have a poor handle on most of them even in CPython. In a debug Python build, setting the environment variable PYTHONMALLOCSTATS displays a lot of detail about pymalloc's memory use, but that's only part of the story. Implement the Jython sizeof() to always return 8, and who'd know the difference <wink>.
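The "factor of 2000" follows directly from the two figures quoted in the message. A quick check, using only those figures (the constant names are illustrative):

```python
ARENA_BYTES = 256 * 1024   # minimum pymalloc arena size cited above
REPORTED_BYTES = 128       # size a sizeof() might report for the empty dict

# How far off the reported size is from what the C library handed out.
ratio = ARENA_BYTES // REPORTED_BYTES
```

The exact ratio is 2048, which the message rounds to 2000.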

Tim Peters wrote:
That's not the point of sizeof(). In mxTools I implemented pretty much the same thing as C's sizeof(typestruct), nothing more. You can use sizeof() to estimate the memory usage of an object, but if you want knowledge about the process memory footprint you're better off with the OS APIs.
Doesn't Java have some API which can be used to estimate the memory usage of an object?

--
Marc-Andre Lemburg
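As an illustration of the "OS APIs" route for the process footprint, the standard library's resource module exposes getrusage() on Unix-like systems. This is a sketch under that assumption; the module is not available on Windows, peak_rss is a hypothetical name, and the unit of ru_maxrss differs by platform (kilobytes on Linux, bytes on macOS).

```python
import resource

def peak_rss():
    """Return the peak resident set size of this process, in platform units."""
    usage = resource.getrusage(resource.RUSAGE_SELF)
    return usage.ru_maxrss

before = peak_rss()
payload = [0.0] * 1000000   # allocate something noticeable
after = peak_rss()
```

Because the value is a peak, it never decreases during the life of the process, so it answers a different question than a per-object sizeof().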

participants (6):
- barry@python.org
- M.-A. Lemburg
- Raymond Hettinger
- Samuele Pedroni
- Thomas Heller
- Tim Peters