[Tutor] Why is an instance smaller than the sum of its components?

Wed Feb 4 02:51:18 CET 2015

On 02/04/2015 12:18 AM, Steven D'Aprano wrote:

>
> Not necessarily. Consider:
>
> class A(object):
>      spam = 23
>      def __init__(self):
>          self.eggs = 42
>
> In this case, the "spam" attribute is on the class, not the instance,
> and so it doesn't matter how many A instances you have, there is only
> one reference to 23 and a single copy of 23.
>
> The "eggs" attribute is on the instance. That means that each instance
> has its own separate reference to 42.
>
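
A minimal sketch of the distinction described in the quote, reusing the
same class A. The vars() calls and the assignment at the end are
additions for illustration only.

```python
class A(object):
    spam = 23              # class attribute: stored once, on the class

    def __init__(self):
        self.eggs = 42     # instance attribute: stored in each instance's __dict__


a, b = A(), A()

# "spam" is not in the instance dictionaries; lookups fall back to the class.
print(vars(b))
print("spam" in vars(A))

# Both instances reach the very same object through the class.
print(a.spam is b.spam)

# Assigning through an instance creates a new instance attribute that
# shadows the class attribute for that one instance only.
a.spam = 99
print(a.spam, b.spam, A.spam)
```

Reading through an instance finds the class attribute, but writing
through an instance never changes the class.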

Hmm.. Here are the first few lines of my class:

class Sender(object):
    """
    Redacted
    """

    SENDER_DB = 'sender.db'

    def __init__(self, phone, balance=0.0):
        self.phone = phone
        self.balance = balance

I gave the (bad) examples that way because I thought what mattered was
how much data was inside. I put SENDER_DB there because it made sense to
put constants at the very top, not because I had any idea it would make
the difference you mentioned (class attributes vs instance attributes).

It's also there because it's a piece of data common to all the methods.
At first, each method opened and closed the database itself. I then
removed that duplicated code and wrote a single method that returns a
connection and a cursor, and the others just call it when they need to
do stuff on the database. I'll ask another question later on how to
refine it.
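
A rough sketch of that arrangement, for readers following along. The
post does not say which database module is used, so sqlite3 is an
assumption, and the method names _connect and save, the table layout,
and the phone number are all hypothetical. The database name is switched
to ":memory:" so that running the sketch leaves no file behind.

```python
import sqlite3


class Sender(object):
    # The post uses 'sender.db'; an in-memory database is assumed here.
    SENDER_DB = ":memory:"

    def __init__(self, phone, balance=0.0):
        self.phone = phone
        self.balance = balance

    def _connect(self):
        """Hypothetical helper: open the database, return (connection, cursor)."""
        connection = sqlite3.connect(self.SENDER_DB)
        return connection, connection.cursor()

    def save(self):
        """Hypothetical method showing how other methods reuse _connect()."""
        connection, cursor = self._connect()
        try:
            cursor.execute(
                "CREATE TABLE IF NOT EXISTS senders (phone TEXT, balance REAL)"
            )
            cursor.execute(
                "INSERT INTO senders VALUES (?, ?)", (self.phone, self.balance)
            )
            connection.commit()
            cursor.execute("SELECT COUNT(*) FROM senders")
            return cursor.fetchone()[0]   # number of rows now stored
        finally:
            connection.close()            # always release the connection


print(Sender("555-0100", 5.0).save())
```

Keeping the open/close logic in one place means a change to how the
database is opened only has to be made once.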

But now that you, Dave, and Peter have pointed this out, I'm thinking of
putting the methods' constants up there too (mainly regular expression
patterns and SQL queries).
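
One way this could look, as a sketch only. The real patterns and queries
are not shown in the post, so PHONE_PATTERN, INSERT_QUERY, the method
has_valid_phone, and the sample phone numbers are hypothetical.

```python
import re


class Sender(object):
    SENDER_DB = 'sender.db'

    # Hypothetical constants, defined once on the class.
    PHONE_PATTERN = re.compile(r"^\+?\d{6,15}$")
    INSERT_QUERY = "INSERT INTO senders (phone, balance) VALUES (?, ?)"

    def __init__(self, phone, balance=0.0):
        self.phone = phone
        self.balance = balance

    def has_valid_phone(self):
        # The compiled pattern is looked up on the class; instances hold no copy.
        return self.PHONE_PATTERN.match(self.phone) is not None


first, second = Sender("+213555000111"), Sender("not a number")
print(first.has_valid_phone(), second.has_valid_phone())
print(first.PHONE_PATTERN is second.PHONE_PATTERN)
print(sorted(vars(first)))    # only the per-instance data lives here
```

The instance dictionaries stay limited to phone and balance, however
many constants are added to the class.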

> Does that mean a separate copy of 42? Maybe, maybe not. In general, yes:
> if eggs was a mutable object like a list, or a dict, say:
>
>          self.eggs = []
>
> then naturally it would need to be a separate list for each instance.
> (If you wanted a single list shared between all instances, put it on the
> class.) But with immutable objects like ints, strings and floats, there
> is an optimization available to the Python compiler: it could reuse the
> same object. There would be a separate reference to that object per
> instance, but only one copy of the object itself.
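
The difference between a per-instance list and a list placed on the
class can be sketched as follows. The class names PerInstance and Shared
are made up for this illustration.

```python
class PerInstance(object):
    def __init__(self):
        self.eggs = []        # a new list is created on every call


class Shared(object):
    eggs = []                 # one list, created once when the class body runs


a, b = PerInstance(), PerInstance()
a.eggs.append("spam")
print(a.eggs, b.eggs)         # b's list is unaffected
print(a.eggs is b.eggs)

c, d = Shared(), Shared()
c.eggs.append("spam")
print(c.eggs, d.eggs)         # both names reach the same list
print(c.eggs is d.eggs)
```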

Okay. Even if Python does optimize that, I think this belongs in the
"good practice" category, so it's better that I do it myself instead of
relying on what the compiler might do. I'm a beginner (this is the first
thing I've written that does something useful) and would like to
reinforce good habits.

> Think of references as being rather like C pointers. References are
> cheap, while objects themselves could be arbitrarily large.
>

That's the analogy I made, but I'm careful with those. I don't want to 
end up like the "English As She Is Spoke" book..

> With current versions of Python, the compiler will intern and re-use
> small integers and strings which look like identifiers ("alpha" is an
> identifier, "hello world!" is not).

> ...

> In general, Python will share stuff if it can, although maybe not
> *everything* it can.
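
A small experiment along these lines, written for Python 3 (where the
function is sys.intern; in Python 2 it is the builtin intern). Whether
two equal values end up being the same object is an implementation
detail, so the sketch prints the identity results without promising what
they will be.

```python
import sys

# Build equal strings at run time so the compiler cannot merge them.
a = "".join(["hello", " ", "world!"])
b = "".join(["hello", " ", "world!"])

print(a == b)    # equal values
print(a is b)    # identity is an implementation detail; do not rely on it

# sys.intern returns one canonical object for equal strings.
print(sys.intern(a) is sys.intern(b))

# Small integers may be cached by the interpreter, but that is also an
# implementation detail, so compare numbers with ==, never with "is".
x = int("7")
y = int("7")
print(x == y, x is y)
```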

That's interesting. I'll try to read up on this without being sidetracked.

> In this case, the Foo and Bar instances both have the same size. They
> both have a __dict__, and the Foo instance's __dict__ is empty, while
> the Bar instance's __dict__ has 4 items. Print:
>
> print(foo().__dict__)
> print(bar().__dict__)
>
> to see the difference. But with only 4 items, Bar's items will fit in
> the default sized hash table. No resize will be triggered and the sizes
> are the same.
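
The original class definitions are not repeated in this message, so the
following is a hypothetical reconstruction: Foo has no attributes and
Bar sets w, x, y and z to arbitrary integers. The sizes printed at the
end depend on the Python version and build.

```python
import sys


class Foo(object):
    pass


class Bar(object):
    def __init__(self):
        # Arbitrary values, chosen only for the illustration.
        self.w = 1
        self.x = 2
        self.y = 3
        self.z = 4


foo, bar = Foo(), Bar()

print(foo.__dict__)
print(bar.__dict__)

# getsizeof reports only the instance object itself, not the objects its
# attributes refer to.
print(sys.getsizeof(foo), sys.getsizeof(bar))
```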

I thought that there was a default size allocated even for an "empty"
class (which is correct), and that if I added w, x, y, z, their sizes
would be *added* to the default size (which is incorrect).

Somehow, I didn't think of the analogy with binary numbers: 8 (1000b)
needs 4 bits, and as you keep incrementing, the value still fits in
4 bits all the way through 15 (1111b).

So that's: default size + data = default size, until it "overflows"
(or until the hash table is 50% full, as you mentioned later).
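
The bit-width analogy can be checked with int.bit_length(). The analogy
is loose, since a dict grows before every slot is used, but the idea of
a fixed capacity that only changes at a threshold is the same.

```python
# Values from 8 through 15 all need 4 bits; 16 is the first to need 5.
for n in (7, 8, 15, 16):
    print(n, bin(n), n.bit_length())

# Bit width for every value from 8 to 16 inclusive.
widths = {n: n.bit_length() for n in range(8, 17)}
print(widths)
```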

> Run this little snippet of code to see what happens:
> import sys
>
> d = {}
> for c in "abcdefghijklm":
>      print(len(d), sys.getsizeof(d))
>      d[c] = None

For the record, here is the output:

(0, 136)
(1, 136)
(2, 136)
(3, 136)
(4, 136)
(5, 136)
(6, 520)
(7, 520)
(8, 520)
(9, 520)
(10, 520)
(11, 520)
(12, 520)

>> Summary questions:
>>
>> 1 - Why are foo's and bar's class sizes the same? (foo's just a nop)
>
> Foo is a class, it certainly isn't a NOP. Just because you haven't given
> it state or behaviour doesn't mean it doesn't have any. It has the
> default state and behaviour that all classes start off with.
>
>> 2 - Why are foo() and bar() the same size, even with bar()'s 4 integers?
>
> Because hash tables (dicts) contain empty slots. Once the hash table
> reaches 50% full, a resize is triggered.
>
>> 3 - Why's bar()'s size smaller than the sum of the sizes of 4 integers?
>
> Because sys.getsizeof tells you the size of the object, not the objects
> referred to by the object. Here is a recipe for a recursive getsizeof:
>
> http://code.activestate.com/recipes/577504
>
>
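
For a rough idea of what a recursive getsizeof does, here is a
simplified sketch. It is not the linked recipe: the function name
total_size and the Bar class are assumptions for illustration, and it
only follows dicts, lists, tuples, sets and instance __dict__
attributes (objects using __slots__ are not handled).

```python
import sys


def total_size(obj, seen=None):
    """Roughly sum getsizeof over obj and everything reachable from it."""
    if seen is None:
        seen = set()
    if id(obj) in seen:          # count shared or cyclic objects only once
        return 0
    seen.add(id(obj))

    size = sys.getsizeof(obj)
    if isinstance(obj, dict):
        for key, value in obj.items():
            size += total_size(key, seen) + total_size(value, seen)
    elif isinstance(obj, (list, tuple, set, frozenset)):
        for item in obj:
            size += total_size(item, seen)
    elif hasattr(obj, "__dict__"):
        size += total_size(vars(obj), seen)
    return size


class Bar(object):
    def __init__(self):
        # Arbitrary values, chosen only for the illustration.
        self.w, self.x, self.y, self.z = 1, 2, 3, 4


bar = Bar()
print(sys.getsizeof(bar), total_size(bar))
```

The second number is larger because it also counts the instance
dictionary, its keys, and the integers it refers to.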

This is cool. Thanks a lot (and Dave, too) for the great explanations.. 
I'll post some code about the database stuff in a new thread.

-- 
~Jugurtha Hadjar,