tp_base, ob_type, and tp_bases
jeff at jmcneil.net
Mon Jan 19 01:46:32 CET 2009
On Jan 18, 5:40 am, Carl Banks <pavlovevide... at gmail.com> wrote:
> On Jan 17, 8:12 am, Jeff McNeil <j... at jmcneil.net> wrote:
> > On Jan 17, 11:09 am, Jeff McNeil <j... at jmcneil.net> wrote:
> > > On Jan 17, 10:50 am, "Martin v. Löwis" <mar... at v.loewis.de> wrote:
> > > > > So, the documentation states that ob_type is a pointer to the type's
> > > > > type, or metatype. Rather, this is a pointer to the new type's
> > > > > metaclass?
> > > > That's actually the same. *Every* ob_type field points to the object's
> > > > type, e.g. for strings, integers, tuples, etc. That includes type
> > > > objects, where ob_type points to the type's type, i.e. its meta-type,
> > > > also called metaclass (as "class" and "type" are really synonyms).
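Here is a small Python 3 sketch of that point. The thread's own examples use Python 2 syntax. From Python, type() returns an object's ob_type, so calling it on a class gives that class's metaclass. The names Meta and Widget are made up for illustration.

```python
class Meta(type):
    """Hypothetical metaclass, used only for illustration."""

class Widget(metaclass=Meta):
    """Hypothetical class whose ob_type will be Meta."""

w = Widget()

# type(x) reports x's ob_type, whatever kind of object x is.
print(type(w))       # the instance's type, Widget
print(type(Widget))  # the class's type, i.e. its metaclass, Meta
print(type(Meta))    # a metaclass is itself an instance of type
print(type(type))    # type is its own metatype
print(type(int))     # built-in types use type as their metaclass
```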
> > > > > Next, we have tp_base. That's defined as "an optional pointer to a
> > > > > base type from which type properties are inherited." The value of
> > > > > tp_base is then added to the tp_bases tuple. This is confusing me. On
> > > > > the surface, it sounds as though they're one in the same?
> > > > (I don't understand the English "one in the same" - interpreting it
> > > > as "as though they should be the same")
> > > > No: tp_bases is a tuple of all base types (remember, there is multiple
> > > > inheritance); tp_base (if set) provides the first base type.
> > > > > I *think* (and dir() of a subclass of type seems to back it up), that
> > > > > tp_base is only at play when the object in question inherits from
> > > > > type?
> > > > No - it is the normal case for single inheritance. You can leave it
> > > > NULL, which means you inherit from object.
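Both fields are visible from Python as __bases__ (tp_bases) and __base__ (tp_base). Below is a minimal Python 3 sketch using the hypothetical classes A, B, C, and D. In this example none of the bases extends the instance layout, so __base__ is simply the first listed base. A class with no explicit base gets object, and object itself has no base.

```python
class A:
    pass

class B:
    pass

class C(A, B):   # multiple inheritance
    pass

class D(A):      # single inheritance
    pass

print(C.__bases__, C.__base__)   # all bases (tp_bases), then tp_base
print(D.__bases__, D.__base__)
print(A.__bases__, A.__base__)   # no explicit base: object
print(object.__base__)           # None: object's tp_base is NULL
```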
> > > > Regards,
> > > > Martin
> > > Thank you! It was tp_base that was confusing me. The tp_bases member
> > > makes sense as Python supports multiple inheritance. It wasn't
> > > immediately clear that tp_base is there for single inheritance
> > > reasons. It's all quite clear now.
> > > Is that an optimization of sorts?
> > Well, maybe not specifically for single inheritance reasons, I just
> > didn't see an immediate reason to keep a separate pointer to the first
> > base type.
> The reason you need a separate tp_base is that it doesn't
> necessarily point to the first base type; rather, it points to the
> first base type that has added any fields or slots to its internal
> layout (in other words, the first type with a tp_basicsize > 8, on 32-
> bit versions). I believe this is mainly for the benefit of Python
> subclasses that define their own slots. The type constructor will
> begin adding slots at an offset of tp_base->tp_basicsize.
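Here is a Python 3 sketch of that rule using the hypothetical classes Mixin, E, and K. Mixin adds nothing to the instance layout. So even though Mixin is listed first, E becomes K's __base__, and K's own slot goes right after E's slots. From Python, __basicsize__ exposes tp_basicsize, and struct.calcsize("P") gives the size of one pointer on the running build.

```python
import struct

PTR = struct.calcsize("P")  # bytes per pointer-sized slot on this build

class Mixin:
    __slots__ = ()          # adds nothing to the instance layout

class E:
    __slots__ = ("a", "b")  # two pointer-sized slots after the header

class K(Mixin, E):
    __slots__ = ("x",)      # one more slot

print(K.__bases__)          # declaration order: Mixin, then E
print(K.__base__)           # E, the base that actually extended the layout
print(E.__basicsize__ - object.__basicsize__ == 2 * PTR)
print(K.__basicsize__ - E.__basicsize__ == PTR)  # "x" follows E's slots
```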
> To see an example, int objects have a tp_basicsize of 12 (there are 4
> extra bytes for the integer). So if you multiply-inherit from int
> and a Python class, int will always be tp_base.
> class A(object): pass
> class B(int,A): pass
> print B.__base__ # will print <type 'int'>
> class C(A,int): pass
> print C.__base__ # will print <type 'int'>
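The same example in Python 3 syntax, as a sketch. In Python 3, int is the old variable-size long type, so the byte counts differ from the 32-bit Python 2 figures above. Even so, int still extends object's layout, and it is still chosen as __base__ wherever it appears in the base list.

```python
class A:
    pass

class B(int, A):
    pass

class C(A, int):
    pass

print(B.__base__)  # <class 'int'>
print(C.__base__)  # <class 'int'>, even though int is listed second
print(int.__basicsize__ > object.__basicsize__)  # True
```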
> A related issue is that you can't multiply inherit from two types that
> have tp_basicsize > 8 unless one of them inherits from the other.
> There can be only one tp_base. For instance:
> class D(int,tuple): pass # will raise TypeError
> class E(object):
>     __slots__ = ['a','b']
> class F(object):
>     __slots__ = ['c','d']
> class G(E,F): pass # will raise TypeError
> class H(E,int): pass # will raise TypeError
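Below is a Python 3 sketch that checks these cases without stopping at the first failure. It builds each class with the three-argument form of type(), which here is equivalent to a class statement with an empty body. try_define and E2 are hypothetical additions. E2 shows that two layout-extending bases are fine when one inherits from the other. The exact TypeError message varies between versions.

```python
class E:
    __slots__ = ["a", "b"]

class F:
    __slots__ = ["c", "d"]

class E2(E):                 # hypothetical: extends E's layout further
    __slots__ = ["x"]

def try_define(name, *bases):
    """Create an empty class with the given bases; report any TypeError."""
    try:
        type(name, bases, {})
    except TypeError as exc:
        return f"{name}: TypeError ({exc})"
    return f"{name}: ok"

results = [
    try_define("D", int, tuple),  # two unrelated layouts: conflict
    try_define("G", E, F),        # same, with two slotted classes
    try_define("H", E, int),      # slotted class plus int: conflict
    try_define("I", E2, E),       # E2 inherits from E: no conflict
]
for line in results:
    print(line)
```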
> Here's a bit more background (and by "a bit" I mean "a lot"):
> In 32-bit Python, objects of types defined in Python are usually only
> 16 bytes long. The layout looks like this.
> instance dict
> weak reference list
> reference count
> type pointer
> The reference count, which is always the thing that the actual
> PyObject* points at, isn't actually the first item in the object's
> layout. The dict and weakref list are stored at a lower address.
> (There's a reason for it.)
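On a standard CPython 3 build, object's fixed header is just the reference count plus the type pointer. (Debug and free-threaded builds use larger headers.) That two-field header is where the "8 bytes on 32-bit" figure above comes from. In this small sketch the class name Plain is made up. The sign and value of the dict and weakref offsets differ between CPython versions, so the sketch only checks that both are present.

```python
import struct

PTR = struct.calcsize("P")  # pointer size: 4 on 32-bit, 8 on 64-bit builds

class Plain:                # an ordinary class: no __slots__
    pass

# object's layout is only the header: reference count + type pointer.
print(object.__basicsize__ == 2 * PTR)

# A nonzero offset means instances carry an instance dict / weakref list.
print(Plain.__dictoffset__ != 0, Plain.__weakrefoffset__ != 0)
```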
> If a Python class defines any __slots__, the type constructor will add
> the slots to the object's layout.
> instance dict (if there is one)
> weak reference list (if there is one)
> reference count
> type pointer
> slot values (one PyObject* each)
> Note that, because you defined __slots__, the object might not have an
> instance dict or weak reference list associated with it. It might,
> though, if one of the base classes had defined an instance dict or
> weakref list. Or, it might not, but then a subclass might have its
> own instance dict or weakref list. That's why these guys are placed
> at a lower address: so that they don't interfere with the layout of
> subclasses. A subclass can add either more slots or a dict.
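Here is a Python 3 sketch of how __slots__ interacts with the instance dict and weakref list. The class names are hypothetical. The slotted class has neither a dict nor a weakref list, and a subclass that omits __slots__ gets both back.

```python
class Slotted:
    __slots__ = ("a", "b")

class Relaxed(Slotted):        # no __slots__: dict and weakref list return
    pass

s, r = Slotted(), Relaxed()

print(Slotted.__dictoffset__, Slotted.__weakrefoffset__)  # 0 0: neither
print(hasattr(s, "__dict__"))                               # False

try:
    s.extra = 1                # no slot and no dict to hold it
except AttributeError:
    print("Slotted rejects new attributes")

r.extra = 1                    # stored in Relaxed's instance dict
print(r.__dict__)              # {'extra': 1}
print(Relaxed.__weakrefoffset__ != 0)  # True
```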
> Objects of types defined in C can have arbitrary layout. For instance,
> an object could have a layout that looks like this:
> reference count
> type pointer
> PyObject* a
> PyObject* b
> long c
> float d
> instance dict
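To make the C case concrete, you can mirror the hypothetical layout above with a ctypes.Structure and inspect its field offsets. This is only a model on the running platform, and the field names are made up. The sketch spells out the type pointer that follows the reference count in every object header. A C extension of this era would instead declare the struct in C. It would set tp_basicsize to sizeof(struct) and, since the object has a dict, tp_dictoffset to offsetof(struct, dict).

```python
import ctypes

class HypotheticalLayout(ctypes.Structure):
    """ctypes model of the example C object layout (names are made up)."""
    _fields_ = [
        ("ob_refcnt", ctypes.c_ssize_t),  # reference count
        ("ob_type", ctypes.c_void_p),     # type pointer
        ("a", ctypes.c_void_p),           # PyObject* a
        ("b", ctypes.c_void_p),           # PyObject* b
        ("c", ctypes.c_long),             # long c
        ("d", ctypes.c_float),            # float d
        ("dict", ctypes.c_void_p),        # PyObject* instance dict
    ]

# Offsets include whatever padding the platform's alignment rules require.
for name, _ctype in HypotheticalLayout._fields_:
    field = getattr(HypotheticalLayout, name)
    print(f"{name:<10} offset={field.offset:>3} size={field.size}")

# Roughly what tp_basicsize would be for such a type.
print("total size:", ctypes.sizeof(HypotheticalLayout))
```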
> A problem with Python slots and C fields is that, if you want to
> inherit from a type with a non-trivial object layout, aside from the
> dict and weakrefs, all of the subtypes have to maintain the same
> layout (see Liskov substitutability for rationale).
> Subtypes can add their own fields or slots if they want, though. So,
> if a Python subtype wants to define its own slots on top of a type
> with a non-trivial object layout, it has to know which base has the
> largest layout so that it doesn't use the same memory for its own
> slots. Hence tp_base.
> Carl Banks
Thank you, that was very well put.
I spent some more time tracking everything down today as well. There
are some useful comments in object.h, and coupled with what you've
said, it all makes sense now. CPython really does have a very clean
code base. I've written a lot of Python and a few C extensions, but
I've never dived deep enough to understand how everything really
clicks. I'm one of those folks that's never happy with what the API
documentation says; I've got to understand all of the nooks and
crannies before I'm happy!