[Tutor] Why is an instance smaller than the sum of its components?
Steven D'Aprano
steve at pearwood.info
Wed Feb 4 00:18:16 CET 2015
On Tue, Feb 03, 2015 at 10:12:09PM +0100, Jugurtha Hadjar wrote:
> Hello,
>
> I was writing something and thought: Since the class had some
> 'constants', and multiple instances would be created, I assume that each
> instance would have its own data. So this would mean duplication of the
> same constants?
Not necessarily. Consider:
class A(object):
    spam = 23
    def __init__(self):
        self.eggs = 42
In this case, the "spam" attribute is on the class, not the instance,
and so it doesn't matter how many A instances you have, there is only
one reference to 23 and a single copy of 23.
The "eggs" attribute is on the instance. That means that each instance
has its own separate reference to 42.
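A small sketch of where each attribute is stored, reusing the class A from above. The names a and b are just illustrative instances:

```python
class A(object):
    spam = 23            # class attribute, stored once on the class

    def __init__(self):
        self.eggs = 42   # instance attribute, stored in each instance's __dict__

a = A()
b = A()

print("spam" in vars(A))   # True: spam lives in the class namespace
print("spam" in vars(a))   # False: the instance has no entry of its own
print(vars(a))             # {'eggs': 42}
print(a.spam is A.spam)    # True: lookup on the instance falls back to the class
```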
Does that mean a separate copy of 42? Maybe, maybe not. In general, yes:
if eggs was a mutable object like a list, or a dict, say:
self.eggs = []
then naturally it would need to be a separate list for each instance.
(If you wanted a single list shared between all instances, put it on the
class.) But with immutable objects like ints, strings and floats, there
is an optimization available to the Python compiler: it could reuse the
same object. There would be a separate reference to that object per
instance, but only one copy of the object itself.
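To make the mutable case concrete, here is a sketch with a hypothetical class B that has one list on the class and one list per instance. The names B, shared, x and y are made up for the example:

```python
class B(object):
    shared = []              # one list, stored on the class

    def __init__(self):
        self.eggs = []       # a new list for every instance

x = B()
y = B()
x.eggs.append(1)             # only touches x's own list
x.shared.append(2)           # mutates the single class-level list

print(x.eggs, y.eggs)        # [1] []
print(y.shared)              # [2]
print(x.eggs is y.eggs)      # False
print(x.shared is y.shared)  # True
```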
Think of references as being rather like C pointers. References are
cheap, while objects themselves could be arbitrarily large.
With current versions of Python, the compiler will intern and re-use
small integers and strings which look like identifiers ("alpha" is an
identifier, "hello world!" is not). But that is subject to change: it is
not a language promise, it is an implementation optimization.
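If you want sharing you can rely on rather than hope for, Python 3 lets you intern strings explicitly with sys.intern. A sketch, with the strings built at run time so that the compiler has no chance to merge them as constants:

```python
import sys

# Build two equal strings at run time.
s1 = " ".join(["hello", "world!"])
s2 = " ".join(["hello", "world!"])

print(s1 == s2)    # True: the values are equal
# Whether "s1 is s2" holds here is an implementation detail,
# so this sketch deliberately does not depend on it.

i1 = sys.intern(s1)
i2 = sys.intern(s2)
print(i1 is i2)    # True: interning equal strings yields one shared object
```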
However, starting with Python 3.3 (PEP 412, "key-sharing dictionaries"),
Python optimizes even more! Instances of the same class can share the keys
of their dictionaries, which saves even more memory. Each instance has a
dict, which points to a hash table of (key, value) records:
<instance a of A>
    __dict__ ----> [ UNUSED UNUSED (ptr to key, ptr to value) UNUSED ... ]
<instance b of A>
    __dict__ ----> [ UNUSED UNUSED (ptr to key, ptr to value) UNUSED ... ]
For most classes, the instances a and b will have the same set of keys,
even though the values will be different. That means the pointers to
keys are all the same. So the new implementation of dict will optimize
that case to save memory and speed up dictionary access.
> If so, I thought why not put the constants in memory
> once, for every instance to access (to reduce memory usage).
>
> Correct me if I'm wrong in my assumptions (i.e: If instances share stuff).
In general, Python will share stuff if it can, although maybe not
*everything* it can.
> So I investigated further..
>
> >>> import sys
> >>> sys.getsizeof(5)
> 12
>
>
> So an integer on my machine is 12 bytes.
A *small* integer is 12 bytes. A large integer can be more:
py> sys.getsizeof(2**100)
26
py> sys.getsizeof(2**10000)
1346
py> sys.getsizeof(2**10000000)
1333346
> Now:
>
> >>> class foo(object):
> ...     def __init__(self):
> ...         pass
>
> >>> sys.getsizeof(foo)
> 448
>
> >>> sys.getsizeof(foo())
> 28
>
> >>> foo
> <class '__main__.foo'>
> >>> foo()
> <__main__.foo object at 0xXXXXXXX>
The *class* Foo is a fairly large object. It has space for a name, a
dictionary of methods and attributes, a tuple of base classes, a
table of weak references, a docstring, and more:
py> class Foo(object):
...     pass
...
py> dir(Foo)
['__class__', '__delattr__', '__dict__', '__dir__', '__doc__', '__eq__',
'__format__', '__ge__', '__getattribute__', '__gt__', '__hash__',
'__init__', '__le__', '__lt__', '__module__', '__ne__', '__new__',
'__qualname__', '__reduce__', '__reduce_ex__', '__repr__',
'__setattr__', '__sizeof__', '__str__', '__subclasshook__',
'__weakref__']
py> vars(Foo)
mappingproxy({'__qualname__': 'Foo', '__module__': '__main__',
'__doc__': None, '__weakref__': <attribute '__weakref__' of 'Foo'
objects>, '__dict__': <attribute '__dict__' of 'Foo' objects>})
py> Foo.__base__
<class 'object'>
py> Foo.__bases__
(<class 'object'>,)
The instance may be quite small, but of course that depends on how many
attributes it has. Typically, all the methods live in the class, and are
shared, while data attributes are per-instance.
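A sketch of the "methods live in the class" point. The method name greet and the instances a and b are made up for the example:

```python
class Foo(object):
    def greet(self):
        return "hello"

a = Foo()
b = Foo()

print("greet" in vars(Foo))   # True: the function is stored on the class
print("greet" in vars(a))     # False: nothing is copied into the instance

# Each attribute lookup builds a lightweight bound method, but both
# wrap the very same function object held by the class.
print(a.greet.__func__ is b.greet.__func__)   # True
```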
> - Second weird thing:
>
> >>> class bar(object):
> ...     def __init__(self):
> ...         self.w = 5
> ...         self.x = 6
> ...         self.y = 7
> ...         self.z = 8
>
> >>> sys.getsizeof(bar)
> 448
> >>> sys.getsizeof(foo)
> 448
Nothing weird here. Both your Foo and Bar classes contain the same
attributes. The only difference is that the Foo.__init__ method does
nothing, while Bar.__init__ has some code in it.
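If you call
    sys.getsizeof(foo.__init__.__code__.co_code)
and compare it to the same for bar, you should see a difference. (The
compiled bytecode lives in co_code; depending on the Python version, the
code object itself may report the same size for both.)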
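A sketch of that comparison, looking at the compiled bytecode (the co_code attribute) of each __init__. The class definitions are repeated from the quoted session so the snippet runs on its own. The exact numbers depend on your Python version, so none are shown:

```python
import sys

class foo(object):
    def __init__(self):
        pass

class bar(object):
    def __init__(self):
        self.w = 5
        self.x = 6
        self.y = 7
        self.z = 8

foo_code = foo.__init__.__code__.co_code   # bytes object holding the bytecode
bar_code = bar.__init__.__code__.co_code

# bar's __init__ does more work, so its bytecode is longer.
print(len(foo_code), len(bar_code))
print(sys.getsizeof(foo_code), sys.getsizeof(bar_code))
```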
> >>> sys.getsizeof(bar())
> 28
> >>> sys.getsizeof(foo())
> 28
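In this case, the Foo and Bar instances both have the same size.
sys.getsizeof counts only the instance itself, which holds a pointer to
its __dict__, not the contents of that dict. They both have a __dict__:
the Foo instance's __dict__ is empty, while the Bar instance's __dict__
has 4 items. Print: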
print(foo().__dict__)
print(bar().__dict__)
to see the difference. But with only 4 items, Bar's items will fit in
the default sized hash table. No resize will be triggered and the sizes
are the same. Run this little snippet of code to see what happens:
import sys
d = {}
for c in "abcdefghijklm":
    print(len(d), sys.getsizeof(d))
    d[c] = None
> Summary questions:
>
> 1 - Why are foo's and bar's class sizes the same? (foo's just a nop)
Foo is a class, it certainly isn't a NOP. Just because you haven't given
it state or behaviour doesn't mean it doesn't have any. It has the
default state and behaviour that all classes start off with.
> 2 - Why are foo() and bar() the same size, even with bar()'s 4 integers?
Because hash tables (dicts) contain empty slots. A resize is triggered
only once the hash table is about two-thirds full.
> 3 - Why's bar()'s size smaller than the sum of the sizes of 4 integers?
Because sys.getsizeof tells you the size of the object, not the objects
referred to by the object. Here is a recipe for a recursive getsizeof:
http://code.activestate.com/recipes/577504
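To give the flavour of the idea, here is a minimal sketch of a recursive getsizeof. This is not the recipe linked above, just an illustration: total_size is a made-up name, it only follows dicts, lists, tuples, sets, frozensets and instance __dict__s, and it ignores things like __slots__.

```python
import sys

def total_size(obj, seen=None):
    """Approximate size in bytes of obj plus the objects it refers to."""
    if seen is None:
        seen = set()
    if id(obj) in seen:          # count each object only once
        return 0
    seen.add(id(obj))

    size = sys.getsizeof(obj)
    if isinstance(obj, dict):
        for key, value in obj.items():
            size += total_size(key, seen) + total_size(value, seen)
    elif isinstance(obj, (list, tuple, set, frozenset)):
        for item in obj:
            size += total_size(item, seen)

    # Follow the per-instance attribute dict, if there is one.
    attrs = getattr(obj, "__dict__", None)
    if isinstance(attrs, dict):
        size += total_size(attrs, seen)
    return size

class Bar(object):
    def __init__(self):
        self.w = 5
        self.x = 6
        self.y = 7
        self.z = 8

b = Bar()
print(sys.getsizeof(b))   # the instance alone
print(total_size(b))      # the instance, its __dict__, the keys and the values
```

The seen set matters: without it, an object referenced twice would be counted twice, and a container that refers to itself would recurse forever.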
--
Steve