[Python-Dev] Re: Of slots and metaclasses...

Thu, 28 Feb 2002 16:51:45 -0500

[me]
> > A new-style class, with or without __slots__, should be considered
> > no different from a new-style built-in type, except that all of
> > the methods happen to be defined in Python (except maybe for
> > inherited methods).

[Kevin]
> Sure.  Except that I also want to be able to extend existing
> new-style classes/types in C, as well as Python.  Here is how I do
> it now (minus error checking and ref-counting):
> 
> static PyMethodDef PyRow_methods[] = {
>         {"__init__",      (PyCFunction)rowinit,       METH_VARARGS},
>         {"__repr__",      (PyCFunction)rowstrrepr,    METH_NOARGS },
>         {"__getitem__",   (PyCFunction)rowgetitem,    METH_VARARGS},
>         /* etc... */
>         {NULL, NULL, 0}   /* sentinel: terminates the loop below */
> };
> 
>   PyRow_Type = (PyTypeObject*)PyType_Type.tp_call((PyObject*)&PyType_Type, args, NULL);
> 
>   /* Methods must be added _after_ PyRow_Type has been created
>     since the type is an argument to PyDescr_NewMethod */
>   dict = PyRow_Type->tp_dict;
>   meth = PyRow_methods;
>   for (; meth->ml_name != NULL; meth++)
>   {
>       PyObject* method = PyDescr_NewMethod(PyRow_Type, meth);
>       PyDict_SetItemString(dict,meth->ml_name,method);
>   }

Heh?!?!!!  Why can't you declare PyRow_Type as a statically
initialized struct like all extensions and the core do?

[snip]

> Sure.  I was just hoping to have that list of descriptors
> pre-computed and stored in the class (like __mro__).

__mro__ gets used *all the time*; on every method lookup at least.
The list of instance variable descriptors is only interesting to a
small number of highly introspective tools.
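
To make the contrast concrete, here is a rough sketch. It assumes a current
Python 3 interpreter rather than 2.2, and `lookup` and `member_descriptors`
are hypothetical helper names. Attribute lookup walks `__mro__` every time,
while the descriptor list can be computed on demand by the rare tool that
wants it:

```python
import inspect

class Base(object):
    __slots__ = ('a',)

    def hello(self):
        return 'hello from Base'

class Derived(Base):
    __slots__ = ('b',)

def lookup(cls, name):
    """Naive model of class attribute lookup: first hit along __mro__ wins."""
    for klass in cls.__mro__:
        if name in klass.__dict__:
            return klass.__dict__[name]
    raise AttributeError(name)

def member_descriptors(cls):
    """Collect slot-style descriptors on demand instead of caching them."""
    found = {}
    for klass in reversed(cls.__mro__):
        for name, value in klass.__dict__.items():
            if inspect.ismemberdescriptor(value):
                found[name] = value
    return found

method = lookup(Derived, 'hello')            # found in Base.__dict__
names = sorted(member_descriptors(Derived))  # slots from every base
```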

> I suppose the question is why even expose __slots__ if it is so
> worthless?

It's found in the dict when the class is defined.  Why delete it?  The
idea is that you can make it a dict that has other info about the
slots.  It's got a __foo__ name.  I can give it any semantics I damn
well please. :-)
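
A small sketch of the "dict with other info" idea, assuming a current
Python 3 interpreter; the class name and the description strings are
made up for illustration:

```python
class Point(object):
    # Only the keys are used as slot names; the values are free-form
    # extra information (here: hypothetical descriptions).
    __slots__ = {'x': 'horizontal position', 'y': 'vertical position'}

p = Point()
p.x, p.y = 3, 4

# The original dict is still sitting in the class namespace.
info = Point.__dict__['__slots__']
```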

> > If the descriptors don't tell you everything you need, too bad --
> > some types just are like that.
> 
> This has _never_ been a concern of mine --  I don't mind if the C
> implementation chooses to hide things.

Exactly, and I'm telling you to have the same attitude about
slots.  Let me repeat something I just sent someone else about slots:

It seems that unfortunately __slots__ is Python 2.2's most
misunderstood feature...

I see it as a hack that lets me define a special-purpose class whose
instances are (almost) as efficient as I can do using C, but without
having to write a C extension.  (I say "almost", because a C extension
can store simple values as C ints, while __slots__ only lets you store
PyObject pointers.  But still, it's a big savings compared to adding a
__dict__ to every instance, and sometimes the slot value is picked
from a small number of interned or cached ints or strings.)
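
For illustration, a minimal sketch of the difference (Python 3 syntax
assumed; `WithDict` and `WithSlots` are hypothetical names):

```python
class WithDict(object):
    def __init__(self, x, y):
        self.x = x
        self.y = y

class WithSlots(object):
    __slots__ = ('x', 'y')

    def __init__(self, x, y):
        self.x = x
        self.y = y

d = WithDict(1, 2)
s = WithSlots(1, 2)

has_dict = hasattr(d, '__dict__')        # a dict is allocated per instance
slots_has_dict = hasattr(s, '__dict__')  # only the two slot pointers

try:
    s.z = 3                              # not declared in __slots__
except AttributeError:
    rejected = True
else:
    rejected = False
```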

It has different semantics from regular attributes, and I don't try to
hide that: introspection doesn't find slots the same way as it finds
regular instance vars, you can't provide a default via a class
variable, and there are a bunch of "don't do that" things like
modifying __slots__ of an existing class or overriding a slot defined
by a base class.  (There's a whole list of warnings in
http://www.python.org/2.2/descrintro.html!)
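
A sketch of the "no default via a class variable" point. It assumes a
current Python 3 interpreter, which rejects the attempt outright; 2.2 may
behave differently in detail. The class names are hypothetical:

```python
try:
    class Broken(object):
        __slots__ = ('x',)
        x = 0                 # attempted default for the slot
except ValueError as exc:
    error = str(exc)
else:
    error = None

class Point(object):
    __slots__ = ('x',)

p = Point()
try:
    p.x                       # nothing has been stored in the slot yet
except AttributeError:
    unset = True
else:
    unset = False

p.x = 0   # a default has to be assigned explicitly, e.g. in __init__
```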

I think as such, the feature is just right (except for the no-pickling
bug).  It's unfortunate that people have jumped on it as the answer to
all their questions.  I guess that means there's a big demand for more
control over instance variables -- whether that demand is created by a
real need or simply because that's how most other languages do it
remains to be seen...

> > Why do I reject your suggestion of making __slots__ (more) usable
> > for introspection?  Because it would create another split between
> > built-in types and user-defined classes: built-in types don't have
> > __slots__, so any strategy based on __slots__ will only work for
> > user-defined types.  And that's exactly what I'm trying to avoid!
> 
> Well, I'm busy creating C extension types that *do* have slots!
> One of my many current projects is to create a better type to store
> the results of relational database queries.  I want the memory
> efficiency of tuples and the ability to query by name (via
> __getitem__ or __getattr__).  So I basically need to re-invent a
> magic tuple type that adds descriptors for every named field.
> Strangely enough, this is basically what the slots mechanism does.
> I do realize that I could accomplish the same end by sub-classing
> tuple and adding a bunch of descriptors.

Note that there's something already there that you might reuse:
Objects/structseq.c, which is used to create the return values of
localtime(), stat() and a few others in a way that looks both like a
tuple and like a read-only record.  It may not be powerful enough
because I think the assumption is that the set of field names is
static, but you may be able to extend it or copy some good ideas.

(Just don't try to understand what it does to make the tuple shorter
than the record in some cases -- that's for backwards compatibility
because lots of code would break if e.g. stat() returned a longer
tuple than in previous Python versions, but we still want to provide
new fields when using named fields.  This part is not for the weak of
heart, and I didn't write it, and can't guarantee that it's 100%
bugfree.)
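
Seen from Python, the structseq behaviour looks roughly like this (a
sketch run against a current Python 3 interpreter; which extra named-only
fields exist depends on the platform, so none are asserted here):

```python
import os
import time

t = time.localtime(0)
as_tuple = tuple(t)           # behaves like a plain tuple
year_by_index = t[0]
year_by_name = t.tm_year      # same field, accessed as a record

try:
    t.tm_year = 1999          # the record view is read-only
except (AttributeError, TypeError):
    read_only = True
else:
    read_only = False

st = os.stat(os.curdir)
tuple_len = len(st)           # the tuple view keeps its historical length
size_by_index = st[6]
size_by_name = st.st_size     # named access to the same value
```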

[items I rejected]

> > - Alter vars(obj) to return a dict of all attrs
> 
> Ok, I'm a little baffled by this.  Why not?

Currently, the assumption is that vars() returns a dict that can be
modified to modify the underlying object's attributes.  If it were to
return a synthetic dict, that wouldn't work, or it would require more
implementation effort than I care for -- again, since I doubt there is
much demand for this outside a small set of introspection tools.
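
A minimal sketch of that assumption, in Python 3 syntax with hypothetical
class names:

```python
class Plain(object):
    pass

class Slotted(object):
    __slots__ = ('x',)

obj = Plain()
vars(obj)['x'] = 42      # writing to the dict modifies the object itself
via_attr = obj.x

s = Slotted()
s.x = 42
try:
    vars(s)              # there is no real __dict__ to hand out
except TypeError:
    vars_failed = True
else:
    vars_failed = False
```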

> > I'll be the first to admit that some details are broken in 2.2.
> >
> > In particular, the fact that instances of classes with __slots__
> > appear picklable but lose all their slot values is a bug -- these
> > should either not be picklable unless you add a __reduce__ method,
> > or they should be pickled properly.
> 
> My vote is that they should be pickled properly by default.  In my
> mind, slots are a more static type of attribute.  Since they are
> more static, my feeling is that they should be as or more accessible
> than dict attributes.  Descriptors are fine for handling the black
> magic of making them addressable by name, but it just feels wrong to
> hide them from access by other means.  Of course, I am really
> talking about slots defined at the Python level -- not necessarily
> all storage allocated in the 'members' array.

Slots share their descriptor implementation with anything defined by
the tp_members array in a type object.  E.g. file.softspace is a
descriptor of the same type as used by slots.  What they share is that
they refer to "real" data stored in the instance -- either a PyObject*
or some basic C type like int or double.  I don't want to trust that
__slots__ has the right data: even if I made it immutable, someone
could still do C.__dict__['__slots__'] = <whatever>, and I don't want
to go so far as to make __slots__ a property stored in the type
object.  So I can't really tell which descriptors are slots and which
are other things -- and I don't want to, because I believe that would
be breaking through an abstraction.
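
A sketch of both points, assuming current CPython 3. There file.softspace
no longer exists, so `FunctionType.__globals__` stands in as a built-in
attribute defined through the C-level members table, and the class
`__dict__` is a read-only proxy, so the tampering is done with plain
attribute assignment instead:

```python
import types

class Point(object):
    __slots__ = ('x',)

slot_descr = Point.__dict__['x']
builtin_descr = types.FunctionType.__dict__['__globals__']

# Slots and C-level members use one and the same descriptor type.
same_type = type(slot_descr) is type(builtin_descr)

# Nothing stops a caller from replacing the bookkeeping attribute...
Point.__slots__ = ('bogus',)

# ...while the real descriptor, and the storage behind it, is unaffected.
still_there = Point.__dict__['x'] is slot_descr
```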

> Unless attribute access becomes scoped based on the static type of
> the method, then I think it is a bug.  Re-declared slots become
> effectively orphaned and just waste memory.  Coalescing them or
> raising an exception when they are re-declared seem much better
> alternatives.

It's a bug to redeclare a slot.  I don't find it Python's job to make
it an error.
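
What the redeclaration does in practice can be sketched as follows. The
class names are hypothetical, and the `__basicsize__` comparison is a
CPython-specific detail:

```python
class Base(object):
    __slots__ = ('a',)

class Derived(Base):
    __slots__ = ('a',)        # redeclared: accepted without complaint

d = Derived()
d.a = 1                       # goes through Derived's descriptor

try:
    Base.__dict__['a'].__get__(d, Base)   # Base's slot was never filled
except AttributeError:
    base_slot_empty = True
else:
    base_slot_empty = False

distinct = Base.__dict__['a'] is not Derived.__dict__['a']

# The instance layout grew by one pointer that ordinary code never uses.
grew = Derived.__basicsize__ > Base.__basicsize__
```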

> > I think you're mostly right with your proposal "Update standard
> > library to use new reflection API".  Insofar as there are standard
> > support classes that use introspection to provide generic services
> > for classic classes, it would be nice if these could work
> > correctly for new-style classes even if they use slots or are
> > derived from non-trivial built-in types like dict or list.  This
> > is a big job, and I'd love some help.  Adding the right things to
> > the inspect module (without breaking pydoc :-) would probably be a
> > first priority.
> 
> Well, I'm happy to contribute, though my primary concern (other than
> correctness and completeness) is efficiency.  The whole reason I'm
> using slots is to save space when allocating huge numbers of fairly
> small objects.  I believe that there is a big performance difference
> between being able to pickle based on arbitrary descriptors and
> pickling just slots.  Slots are already nicely laid out in rows,
> just waiting to be plucked out and stuffed into a pickle.  Even
> without flattened __slots__ lists, it is a fast and trivial
> operation to iterate over a class and all its bases and extract
> slots.  Doing so over dictionaries is not nearly so trivial.

I think you're overstating the simplicity of pickling slots.  There is
no guarantee that the slots of a derived class are contiguous with the
slots of a base class; a __weakref__ and a __dict__ field may be
placed in between, and another metaclass could add other things.  For
example, you could write a metaclass in C that took the __slots__ idea
one step further and let you declare the types of the slots as basic C
types, so that other structmember keys could be used, e.g. T_INT or
T_FLOAT.

If you want your instances to be pickled *efficiently*, you should
write a custom reduce method in C anyway -- right now, new-style
classes are pickled by a piece of Python code at the end of
copy_reg.py.
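
The shape of such a reduce method, sketched in Python rather than C (so it
shows the protocol only, not the efficiency gain). `Row` is a hypothetical
class, and the copy module is used because it consults the same
`__reduce__` hook as pickle while not caring where the class is importable
from:

```python
import copy

class Row(object):
    __slots__ = ('id', 'name')

    def __init__(self, id, name):
        self.id = id
        self.name = name

    def __reduce__(self):
        # Rebuild by calling Row(id, name); this does not rely on how
        # the slots happen to be laid out in memory.
        return (Row, (self.id, self.name))

original = Row(1, 'spam')
clone = copy.copy(original)
```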

> > Maybe you can formulate it as a set of tentative clarifying
> > patches to PEPs 252, 253, and 254?
> 
> To be honest, I forgot that those PEPs existed!  I've been working
> off of the Python 2.2 source and the tutorials.  I'll read them over
> tonight and see.

I had a feeling you were missing something basic. :-)

> When I say SOMMCP, I really mean the "metaclass protocol" defined by the
> various postulates and theorems in the first few chapters of the book.

As I said, I don't have the whole set in my head, so you'll have to be
more specific in your questions.  (Basically, I don't expect to be
adding much from the book, but I'll be looking to the book for clues
as we find problems with how things are implemented now, e.g. the
automatically derived metaclass issue below.)

> > - I currently don't complain when there are serious order
> >   disagreements.  I haven't decided yet whether to make these an
> >   error (then I'd have to implement an overridable way of defining
> >   "serious") or whether it's more Pythonic to leave this up to the
> >   user.
> 
> Sure -- I noticed this.  Maybe you should store the order-safety in the
> metaclass?  That way, the user can inspect it when they decide it is
> important.

You mean in the class object?  I'm not sure what you mean by "storing
the order-safety".  I currently don't calculate whether there are any
order conflicts: serious_order_disagreements() returns 0 without doing
anything.  Someone who wants it can easily implement the check from
the book though.
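
For reference, an order disagreement of this kind looks like the sketch
below. Note the assumption: it is run on a current Python 3 interpreter,
which refuses to build such a class, unlike the 2.2 behaviour described
here. The class names are hypothetical:

```python
class A(object): pass
class B(object): pass

class X(A, B): pass       # wants A before B
class Y(B, A): pass       # wants B before A

try:
    class Z(X, Y):        # cannot satisfy both orders at once
        pass
except TypeError as exc:
    conflict = str(exc)
else:
    conflict = None
```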

> > - I don't enforce any of their rules about cooperative methods.
> >   This is Pythonic: you can be cooperative but you don't have to
> >   be.  It would also be too incompatible with current practice (I
> >   expect few people will adopt super().)
> 
> I agree with most of that, except that I expect that MANY people
> will start using 'super'.

I doubt it with the current super(Class,self).method(args) notation.
Probably they will once super is a keyword so you can write
super.method(args).

> I've trained an office full of Java
> programmers to program in Python and they are always complaining
> about the lack of super calls.  Also, I've _always_ considered this
> idiom ugly and hackish:
> 
>   class Foo(Bar,Baz):
>     def __init__(self):
>       Bar.__init__(self)
>       Baz.__init__(self)

Strange that you mention Java in the same paragraph as an example
using multiple inheritance. ;-/

Also note that this is pretty much what C++ wants you to do, except it
uses '::' instead of '.' and doesn't require you to pass self (which
is a different issue).

I don't see this as a serious issue, just syntactic sugar.

> It's so much better as:
> 
>   class Foo(Bar,Baz):
>     def __init__(self):
>       # when super becomes a keyword and we write nice cooperative __init__
>       # methods
>       super.__init__(self)

But that's not what you'd be writing -- you'd be writing
super.__init__().
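
With the notation that exists today, the cooperative version can be
sketched like this (a common `Base` is assumed so that the chain has a
place to end, and the `calls` list is only there to record the order):

```python
calls = []

class Base(object):
    def __init__(self):
        calls.append('Base')

class Bar(Base):
    def __init__(self):
        calls.append('Bar')
        super(Bar, self).__init__()

class Baz(Base):
    def __init__(self):
        calls.append('Baz')
        super(Baz, self).__init__()

class Foo(Bar, Baz):
    def __init__(self):
        calls.append('Foo')
        # One call; self is bound by super and not passed explicitly.
        super(Foo, self).__init__()

Foo()
```

Each `__init__` runs exactly once, in `__mro__` order, which the explicit
`Bar.__init__(self)` / `Baz.__init__(self)` idiom does not guarantee when
the bases share an ancestor.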

> > - I don't automatically derive a new metaclass if multiple base
> >   classes have different metaclasses.
> 
> I have my own ideas about this, but like you, don't have enough
> experience with them in practice to do anything about it.

Can you share them?  This might be interesting.

> >   Since I expect that non-trivial metaclasses are
> >   often implemented in C, I'm not so comfortable with automatically
> >   merging multiple metaclasses -- I can't prove to myself that it's
> >   always safe.
> 
> It is always safe when the assumption of monotonicity is not violated.

And that we can't know.
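
A sketch of what "not automatically derived" means for the user, written
in Python 3 metaclass syntax with hypothetical names. The merge has to be
spelled out by hand:

```python
class MetaA(type): pass
class MetaB(type): pass

class A(metaclass=MetaA): pass
class B(metaclass=MetaB): pass

try:
    class C(A, B):            # no metaclass is derived automatically
        pass
except TypeError:
    conflict = True
else:
    conflict = False

class MetaAB(MetaA, MetaB):   # the user merges the metaclasses explicitly
    pass

class D(A, B, metaclass=MetaAB):
    pass
```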

> > - I don't check that a base class doesn't override instance
> >   variables.  As I stated above, I don't think I should, but I'm not
> >   100% sure.
> 
> Do you mean slots or all Python instance attributes in this statement?

I just meant slots, but in a sense it's also true for other ivars: if
you don't know that your base class defines an ivar 'foo', you might
create your own ivar named 'foo' and use it in a way that's
inconsistent with the base class.  Because there are no type checks
and no ivar declarations, that's much harder to avoid in Python than
in more static languages like C++ or Java (I assume those will
complain when you redefine an ivar, even with the same type).

> > >   3) Do you intend to enforce monotonicity for all methods and
> > >      slots?  (Clearly, this is not desirable for instance
> > >      __dict__ attributes.)
> >
> > If I understand the concept of monotonicity, no.  Python
> > traditionally allows you to override methods in ways that are
> > incompatible with the contract of the base class method, and I
> > don't intend to forbid this.
> 
> For Python, monotonicity means that the instance attributes and
> instance methods of a class are a superset of those of all its
> ancestors.  This is not the way that normal __dict__ attributes work
> in Python, so let's talk only about slots when discussing monotonic
> properties.

I'm not sure what you mean by "this is not the way that normal
__dict__ attrs work", unless you are talking about overriding __init__
without calling the base class __init__ (and perhaps the same for
other methods), which of course can mean that a derived class instance
lacks an ivar that a base class instance would have.  This is Pythonic
freedom IMO.

Since it's not true for regular ivars, why worry about it for slots?
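
The case described here, as a minimal sketch with hypothetical class
names:

```python
class Base(object):
    def __init__(self):
        self.foo = 1

class Derived(Base):
    def __init__(self):
        self.bar = 2          # Base.__init__ is deliberately not called

b = Base()
d = Derived()

base_has_foo = hasattr(b, 'foo')
derived_has_foo = hasattr(d, 'foo')   # the derived instance lacks the ivar
```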

> In other words, it means that the metaclass interface
> does not provide a way to delete a slot or a method, only ways to
> add and override them.  Combined with some static type information,
> the assumption of monotonicity will be very helpful when we can
> eventually compile Python.

I don't think we should be guided here by what might be needed by a
compiler.  Without actually trying to build a compiler, we'll probably
miss important requirements that mean we'll have to change the
language anyway, and we'll impose requirements that we think might be
important without a good reason.  (E.g. structured programming was
once thought of as an aid to compiler technology as well as to the human
reader.  Nowadays, optimizers reduce all control flow to labels and
goto statements. :-)

> > It would be good if PyChecker checked for accidental mistakes in
> > this area, and maybe there should be a way to declare that you do
> > want this enforced; I don't know how though.
> 
> I have a pretty good idea how.  It's essentially a proof-based method
> that works by solving metatype constraints.

Isn't that how most of PyChecker works?  At least the proof-based part?

> > There's also the issue that (again, if I remember the concepts right)
> > there are some semantic requirements that would be really hard to
> > check at compile time for Python.
> 
> True for __dict__ instance attributes, not for slots!

Again, you're trying to hijack slots for purposes for which they
weren't created.  Think of slots as an efficiency hack, *not* as a
better way to declare ivars.

> > >   4) Should descriptors work cooperatively?  i.e., allowing a
> > >      'super' call within __get__ and __set__.
> >
> > I don't think so, but I haven't thought through all the
> > consequences (I'm not sure why you're asking this, and whether
> > it's still a relevant question after my responses above).  You can
> > do this for properties though.
> 
>   class Foo(object):
>     __slots__=()
>     a = 1
> 
>   class Bar(Foo):
>     __slots__ = ('a',)
> 
>   bar = Bar()
>   print dir(a)
>   print a

That's a NameError, I suppose you meant 'bar' instead of 'a' in the
last two lines, then it makes sense. :-)

> The resolution rule for descriptors could work cooperatively to find
> Foo's class attribute 'a' instead of giving up with an
> AttributeError.

Once a descriptor is found, that's the end of the line.  When you find
a method, you call it, and it raises an exception, you're not going to
continue looking for a base class method either!

The descriptor type used to implement slots could do this, but
doesn't.  I don't care about this feature.  With a __dict__, there's
some real saving in not storing default values, since it means a
smaller dict, which can save space.  The slot space is always there,
so you might as well initialize it.
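
The example with the names corrected, as a sketch in Python 3 syntax. The
lookup stops at Bar's slot descriptor and never falls back to Foo's class
attribute:

```python
class Foo(object):
    __slots__ = ()
    a = 1

class Bar(Foo):
    __slots__ = ('a',)

bar = Bar()
listed = 'a' in dir(bar)      # the slot descriptor is visible

try:
    bar.a                     # Bar's descriptor is found first; slot empty
except AttributeError:
    fell_through = False
else:
    fell_through = True

bar.a = 1                     # the slot space is there anyway, so fill it
```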

Concluding: don't expect that you can take an arbitrary class, analyze
what ivars it uses, and add a __slots__ variable to speed it up.
There are lots of differences in semantics when you use slots, and I
don't want to hide those.

--Guido van Rossum (home page: http://www.python.org/~guido/)