[issue10994] implementation details in sys module

New submission from Maciej Fijalkowski <fijall@gmail.com>:
The sys module documentation (as it is online) has some things that, in my opinion, should be marked as implementation details but are not. Feel free to counter why not. Some of them carry a note that they should be used for specialized purposes only, but IMO that is not the same as being non-mandatory for other implementations.
Temporary list:
- _clear_type_cache
- dllhandle
- getrefcount
- getdlopenflags (?)
- getsizeof - it might be not well defined on other implementations
- setdlopenflags
- api_version
----------
assignee: docs@python
components: Documentation
messages: 126925
nosy: docs@python, fijall
priority: normal
severity: normal
status: open
title: implementation details in sys module
type: behavior
_______________________________________
Python tracker <report@bugs.python.org>
<http://bugs.python.org/issue10994>
_______________________________________

Changes by Antoine Pitrou <pitrou@free.fr>:
----------
stage: -> needs patch
versions: +Python 2.7, Python 3.1, Python 3.2

Changes by Łukasz Langa <lukasz@langa.pl>:
----------
nosy: +lukasz.langa

Antoine Pitrou <pitrou@free.fr> added the comment:
Well, getsizeof is not better-defined under CPython than elsewhere. It just gives a hint. Agreed about the others.
----------
nosy: +pitrou

Maciej Fijalkowski <fijall@gmail.com> added the comment:
I suppose that, wrt getsizeof, it's more a case of "if you provide us with reasonable expectations, we can implement this" than anything else.

Antoine Pitrou <pitrou@free.fr> added the comment:
> I suppose that, wrt getsizeof, it's more a case of "if you provide us with reasonable expectations, we can implement this" than anything else.
The expectation is that it returns the memory footprint of the given object, and only that object (not taking into account sharing, caching, dependencies or anything else). For example, an instance will not count its attribute __dict__. But a str object will count its object header plus the string payload, if the payload is private. Of course, you are free to tweak these semantics for the PyPy implementation.
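As a rough illustration of the semantics described in this comment, here is a minimal sketch. It assumes CPython, and the `Point` class is hypothetical. The numbers it prints vary by version and platform, so none are shown.

```python
import sys


class Point:
    """Hypothetical example class with two attributes."""

    def __init__(self, x, y):
        self.x = x
        self.y = y


p = Point(1, 2)

# Size reported for the instance object itself.
instance_size = sys.getsizeof(p)
# The attribute dictionary is a separate object with its own size.
dict_size = sys.getsizeof(p.__dict__)

# A str reports its header plus its character payload, so a longer
# string reports a larger size than a shorter one.
short_size = sys.getsizeof("a")
long_size = sys.getsizeof("a" * 1000)

print(instance_size, dict_size, short_size, long_size)
```

Neither number for `p` says anything about the objects the attributes refer to; those have to be measured separately.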

Changes by Brett Cannon <brett@python.org>:
----------
nosy: +brett.cannon

Armin Rigo <arigo@users.sourceforge.net> added the comment:
> The expectation is that it returns the memory footprint of the given object, and only that object (not taking into account sharing, caching, dependencies or anything else).
It would be nice if this was a well-defined definition, but unfortunately it is not. For example, string objects may appear different from the user's point of view (e.g. as seen by id() and 'is') but share the implementation's data; they may even share only a part of it (if ropes are enabled). Conversely, for user-defined objects you would typically think not to count the "shape" information, which is usually shared among several instances -- but then you risk a gross under-estimation in the (rarer) cases where it is not shared.
Another way to look at the "official" definition is to return the size of the object itself and none of its dependencies, because in theory they might be shared; but that would make all strings, lists, tuples, dicts, and so on have a getsizeof() of 8 or 12, which is rather useless.
I hope this clarifies fijal's original comment: "it might be not well defined on other implementations."
----------
nosy: +arigo

Antoine Pitrou <pitrou@free.fr> added the comment:
>> The expectation is that it returns the memory footprint of the given object, and only that object (not taking into account sharing, caching, dependencies or anything else).
> It would be nice if this was a well-defined definition, but unfortunately it is not.
I didn't claim it was. Actually, if you read the rest of my message, I did mention that PyPy could tweak the semantics if it made more sense. So, of course, the more sharing and caching takes place, the less obvious these semantics are, but even with CPython they are not obvious anyway. It's not supposed to be an exact measurement for the common developer, rather a hint that experts can use to tweak their data structures and algorithms; you need to know details of your VM's implementation to use that information.

Maciej Fijalkowski <fijall@gmail.com> added the comment:
I can hardly think of a specification that would help me identify actual sizes, even as a rough estimate. Which experts did you have in mind?

Antoine Pitrou <pitrou@free.fr> added the comment:
> Which experts did you have in mind?
People who know how the Python implementation works.

Maciej Fijalkowski <fijall@gmail.com> added the comment:
>> Which experts did you have in mind?
> People who know how the Python implementation works.
I'm serious. What semantics would make sense to anyone? Even if you know the implementation quite well, a single number per object does not provide enough information.

Antoine Pitrou <pitrou@free.fr> added the comment:
> Even if you know the implementation quite well, a single number per object does not provide enough information.
Enough information for what? It can certainly provide information about the overhead of that particular object (again, regardless of sharing).

Brett Cannon <brett@python.org> added the comment:
You could return -1 for everything. =) In all seriousness, it could simply be proportional. IMO, as long as people can tell whether a list takes up less space than a dict, the numbers seem fine to me.

Martin v. Löwis <martin@v.loewis.de> added the comment:
I can propose a specification of getsizeof: if you somehow manage to traverse all objects (without considering an object twice), and sum up the getsizeof results, you should end up with something close to, but smaller than, the actual memory consumption. How close is a quality-of-implementation issue (so always returning 0 would be correct-but-useless).
It may be that implementations can also support counting certain hidden memory usage (headers, blocks shared across instances that are not objects themselves). Such functions should have different names and interfaces (e.g. sys.gethiddenblocks(o) may return a list of (address, size) pairs); CPython doesn't provide any such function (although sys.mallocoverhead might be useful).
In any case: I'm not convinced that it is useful to mark functions as CPython-specific in the documentation. This clutters the documentation, and is of interest only for language lawyers. So if implementation details are to be documented, I'd prefer this to happen in a separate document.
----------
nosy: +loewis
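The traversal in the proposed specification can be sketched as follows. This is only an illustration under stated assumptions: the helper name `total_size` is hypothetical, `gc.get_referents` is used as one possible way to find referenced objects in CPython, and objects that cannot report a size are counted as 0.

```python
import gc
import sys


def total_size(root):
    """Sum sys.getsizeof over objects reachable from root, each counted once."""
    seen = set()      # ids of objects already counted
    stack = [root]
    total = 0
    while stack:
        obj = stack.pop()
        if id(obj) in seen:
            continue  # shared object: do not count it twice
        seen.add(id(obj))
        # Fall back to 0 if the object cannot report a size.
        total += sys.getsizeof(obj, 0)
        stack.extend(gc.get_referents(obj))
    return total


shared = "x" * 1000
data = [shared, shared]  # the same string object is referenced twice
print(total_size(data))
```

Because `shared` is counted once, the total for `data` is the size of the list plus the size of one string. For instances of user-defined classes the same walk also reaches the class and everything it references, so in practice a caller would need to decide where to stop.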

Armin Rigo <arigo@users.sourceforge.net> added the comment:
Martin: I kind of agree with you, although I guess that for practical reasons if you don't have a reasonable sys.getsizeof() implementation then it's better to raise TypeError than return 0 (like CPython, which may raise "TypeError: Type %.100s doesn't define __sizeof__").
I agree that it's not really useful to mark functions as CPython-specific in the documentation, if only because whenever a new implementation like PyPy comes along, then it's going to have a rather different set of functions that it wants to consider implementation details. I would say that more than half the functions in the sys module marked CPython-specific in the doc are implemented in PyPy just fine, and there is an equal number of functions not marked CPython-specific that have no chance to be implemented in PyPy.
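For callers, the TypeError behaviour mentioned above pairs with the optional default argument of sys.getsizeof, which is returned instead of raising. The sketch below assumes CPython; the `Opaque` class and `size_or_none` helper are hypothetical, and `Opaque` simulates an object with no usable size by raising TypeError from __sizeof__.

```python
import sys


class Opaque:
    """Hypothetical class that cannot report its size."""

    def __sizeof__(self):
        raise TypeError("size not available for Opaque")


def size_or_none(obj):
    """Return the reported size, or None if the object provides none."""
    return sys.getsizeof(obj, None)


print(size_or_none([]))        # an int on CPython
print(size_or_none(Opaque()))  # None, because the TypeError is swallowed

try:
    sys.getsizeof(Opaque())    # no default given: the TypeError propagates
except TypeError as exc:
    print("TypeError:", exc)
```

Code written this way keeps working whether an implementation returns a number or refuses to.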

Terry J. Reedy <tjreedy@udel.edu> added the comment:
The __sizeof__ special attribute shows up in dir(object) but appears not to be documented other than with
help(object.__sizeof__)
Help on method_descriptor:
__sizeof__(...)
    __sizeof__() -> size of object in memory, in bytes
Should it have an entry in Lib 4.12. Special Attributes?
object.__sizeof__
    A method used by sys.getsizeof.
It should then show up in the index (missing now) and point people to sys.getsizeof. Looking further, I see that it is mentioned but not indexed in the sys.getsizeof entry.
----------
nosy: +terry.reedy
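To make the relationship between the two concrete, here is a minimal sketch of a class that defines __sizeof__, assuming CPython. The `Buffer` class is hypothetical, and adding the size of the private bytearray is one possible accounting choice, not a rule.

```python
import sys


class Buffer:
    """Hypothetical wrapper that owns a private bytearray."""

    def __init__(self, n):
        self._data = bytearray(n)

    def __sizeof__(self):
        # Report the instance itself plus the privately owned payload.
        return object.__sizeof__(self) + sys.getsizeof(self._data)


b = Buffer(1024)
print(b.__sizeof__())    # value returned by the method
print(sys.getsizeof(b))  # calls __sizeof__; CPython may add garbage-collector overhead
```

On CPython, sys.getsizeof can report slightly more than __sizeof__ returns, because it adds the garbage-collector header for objects the collector manages.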

Changes by Ezio Melotti <ezio.melotti@gmail.com>:
----------
versions: +Python 3.3, Python 3.4 -Python 3.1
participants (8)
- Antoine Pitrou
- Armin Rigo
- Brett Cannon
- Ezio Melotti
- Maciej Fijalkowski
- Martin v. Löwis
- Terry J. Reedy
- Łukasz Langa