[Python-checkins] CVS: python/nondist/peps pep-0253.txt,1.3,1.4
Guido van Rossum
gvanrossum@users.sourceforge.net
Wed, 13 Jun 2001 14:48:33 -0700
Update of /cvsroot/python/python/nondist/peps
In directory usw-pr-cvs1:/tmp/cvs-serv30297
Modified Files:
pep-0253.txt
Log Message:
Another intermediate update. I've rewritten the requirements for a
base type to be subtypable. Needs way more work!
Index: pep-0253.txt
===================================================================
RCS file: /cvsroot/python/python/nondist/peps/pep-0253.txt,v
retrieving revision 1.3
retrieving revision 1.4
diff -C2 -r1.3 -r1.4
*** pep-0253.txt 2001/06/11 20:07:37 1.3
--- pep-0253.txt 2001/06/13 21:48:31 1.4
***************
*** 11,22 ****
Abstract
! This PEP proposes ways for creating subtypes of existing built-in
! types, either in C or in Python. The text is currently long and
! rambling; I'll go over it again later to make it shorter.
Traditionally, types in Python have been created statically, by
declaring a global variable of type PyTypeObject and initializing
it with a static initializer. The fields in the type object
! describe all aspects of a Python object that are relevant to the
Python interpreter. A few fields contain dimensional information
(e.g. the basic allocation size of instances), others contain
--- 11,24 ----
Abstract
! This PEP proposes additions to the type object API that will allow
! the creation of subtypes of built-in types, in C and in Python.
+
+ Introduction
+
Traditionally, types in Python have been created statically, by
declaring a global variable of type PyTypeObject and initializing
it with a static initializer. The fields in the type object
! describe all aspects of a Python type that are relevant to the
Python interpreter. A few fields contain dimensional information
(e.g. the basic allocation size of instances), others contain
***************
*** 27,34 ****
exception when the behavior is invoked. Some collections of
functions pointers that are usually defined together are obtained
! indirectly via a pointer to an additional structure containing.
While the details of initializing a PyTypeObject structure haven't
! been documented as such, they are easily glanced from the examples
in the source code, and I am assuming that the reader is
sufficiently familiar with the traditional way of creating new
--- 29,37 ----
exception when the behavior is invoked. Some collections of
function pointers that are usually defined together are obtained
! indirectly via a pointer to an additional structure containing
! more function pointers.
While the details of initializing a PyTypeObject structure haven't
! been documented as such, they are easily gleaned from the examples
in the source code, and I am assuming that the reader is
sufficiently familiar with the traditional way of creating new
***************
*** 36,63 ****
This PEP will introduce the following features:
! - a type, like a class, can be a factory for its instances
! - types can be subtyped in C by specifying a base type pointer
! - types can be subtyped in Python using the class statement
! - multiple inheritance from types (insofar as practical)
! - the standard coercions (int, tuple, str etc.) will be the
! corresponding type objects
! - a standard type hierarchy
This PEP builds on pep-0252, which adds standard introspection to
! types; in particular, types are assumed to have e.g. a __hash__
! method when the type object defines the tp_hash slot. pep-0252 also
! adds a dictionary to type objects which contains all methods. At
! the Python level, this dictionary is read-only; at the C level, it
! is accessible directly (but modifying it is not recommended except
! as part of initialization).
! Metatypes
Inevitably the following discussion will come to mention metatypes
--- 39,83 ----
This PEP will introduce the following features:
+
+ - a type can be a factory function for its instances
+
+ - types can be subtyped in C
! - types can be subtyped in Python with the class statement
! - multiple inheritance from types is supported (insofar as
! practical)
! - the standard coercion functions (int, tuple, str etc.) will be
! redefined to be the corresponding type objects, which serve as
! their own factory functions
! - there will be a standard type hierarchy
! - a class statement can contain a metaclass declaration,
! specifying the metaclass to be used to create the new class
! - a class statement can contain a slots declaration, specifying
! the specific names of the instance variables supported
This PEP builds on pep-0252, which adds standard introspection to
! types; e.g., when the type object defines the tp_hash slot, the
! type object has a __hash__ method. pep-0252 also adds a
! dictionary to type objects which contains all methods. At the
! Python level, this dictionary is read-only for built-in types; at
! the C level, it is accessible directly (but it should not be
! modified except as part of initialization).
!
! For binary compatibility, a flag bit in the tp_flags slot
! indicates the existence of the various new slots in the type
! object introduced below. Types that don't have the
! Py_TPFLAGS_HAVE_CLASS bit set in their tp_flags field are assumed
! to have NULL values for all the subtyping slots. (Warning: the
! current implementation prototype is not yet consistent in its
! checking of this flag bit. This should be fixed before the final
! release.)
! About metatypes
Inevitably the following discussion will come to mention metatypes
***************
*** 76,99 ****
In this example, type(a) is a "regular" type, and type(type(a)) is
a metatype. While as distributed all types have the same metatype
! (which is also its own metatype), this is not a requirement, and
! in fact a useful 3rd party extension (ExtensionClasses by Jim
! Fulton) creates an additional metatype. A related feature is the
! "Don Beaudry hook", which says that if a metatype is callable, its
! instances (which are regular types) can be subclassed (really
! subtyped) using a Python class statement. We will use this rule
! to support subtyping of built-in types, and in fact it greatly
! simplifies the logic of class creation to always simply call the
! metatype. When no base class is specified, a default metatype is
! called -- the default metatype is the "ClassType" object, so the
! class statement will behave as before in the normal case.
!
! Python uses the concept of metatypes or metaclasses in a
! different way than Smalltalk. In Smalltalk-80, there is a
! hierarchy of metaclasses that mirrors the hierarchy of regular
! classes, metaclasses map 1-1 to classes (except for some funny
! business at the root of the hierarchy), and each class statement
! creates both a regular class and its metaclass, putting class
! methods in the metaclass and instance methods in the regular
! class.
Nice though this may be in the context of Smalltalk, it's not
--- 96,120 ----
In this example, type(a) is a "regular" type, and type(type(a)) is
a metatype. While as distributed all types have the same metatype
! (PyType_Type, which is also its own metatype), this is not a
! requirement, and in fact a useful and relevant 3rd party extension
! (ExtensionClasses by Jim Fulton) creates an additional metatype.
!
! A related feature is the "Don Beaudry hook", which says that if a
! metatype is callable, its instances (which are regular types) can
! be subclassed (really subtyped) using a Python class statement.
! I will use this rule to support subtyping of built-in types, and
! in fact it greatly simplifies the logic of class creation to
! always simply call the metatype. When no base class is specified,
! a default metatype is called -- the default metatype is the
! "ClassType" object, so the class statement will behave as before
! in the normal case.
!
! Python uses the concept of metatypes or metaclasses in a different
! way than Smalltalk. In Smalltalk-80, there is a hierarchy of
! metaclasses that mirrors the hierarchy of regular classes,
! metaclasses map 1-1 to classes (except for some funny business at
! the root of the hierarchy), and each class statement creates both
! a regular class and its metaclass, putting class methods in the
! metaclass and instance methods in the regular class.
Nice though this may be in the context of Smalltalk, it's not
***************
*** 107,118 ****
initialize it at will.)
- Instantiation by calling the type object
! Traditionally, for each type there is at least one C function that
! creates instances of the type (e.g. PyInt_FromLong(),
! PyTuple_New() and so on). This function has to take care of
both allocating memory for the object and initializing that
! memory. As of Python 2.0, it also has to interface with the
garbage collection subsystem, if the type chooses to participate
in garbage collection (which is optional, but strongly recommended
--- 128,154 ----
initialize it at will.)
+ Metatypes determine various *policies* for types, e.g. what
+ happens when a type is called, how dynamic types are (whether a
+ type's __dict__ can be modified after it is created), what the
+ method resolution order is, how instance attributes are looked
+ up, and so on.
+
+ I'll argue that left-to-right depth-first is not the best
+ solution when you want to get the most use from multiple
+ inheritance.
+
+ I'll argue that with multiple inheritance, the metatype of the
+ subtype must be a descendant of the metatypes of all base types.
+
+ I'll come back to metatypes later.
! Making a type a factory for its instances
!
! Traditionally, for each type there is at least one C factory
! function that creates instances of the type (PyTuple_New(),
! PyInt_FromLong() and so on). These factory functions take care of
both allocating memory for the object and initializing that
! memory. As of Python 2.0, they also have to interface with the
garbage collection subsystem, if the type chooses to participate
in garbage collection (which is optional, but strongly recommended
***************
*** 120,202 ****
references to other objects, and hence may participate in
reference cycles).
-
- If we're going to implement subtyping, we must separate allocation
- and initialization: typically, the most derived subtype is in
- charge of allocation (and hence deallocation!), but in most cases
- each base type's initializer (constructor) must still be called,
- from the "most base" type to the most derived type.
-
- But let's first get the interface for instantiation right. If we
- call an object, the tp_call slot if its type gets invoked. Thus,
- if we call a type, this invokes the tp_call slot of the type's
- type: in other words, the tp_call slot of the metatype.
- Traditionally this has been a NULL pointer, meaning that types
- can't be called. Now we're adding a tp_call slot to the metatype,
- which makes all types "callable" in a trivial sense. But
- obviously the metatype's tp_call implementation doesn't know how
- to initialize the instances of individual types. So the type
- defines a new slot, tp_new, which is invoked by the metatype's
- tp_call slot. If the tp_new slot is NULL, the metatype's tp_call
- issues a nice error message: the type isn't callable.
-
- This mechanism gives the maximum freedom to the type: a type's
- tp_new doesn't necessarily have to return a new object, or even an
- object that is an instance of the type (although the latter should
- be rare).
-
- HIRO
-
- The deallocation mechanism chosen should match the allocation
- mechanism: an allocation policy should prescribe both the
- allocation and deallocation mechanism. And again, planning ahead
- for subtyping would be nice. But the available mechanisms are
- different. The deallocation function has always been part of the
- type structure, as tp_dealloc, which combines the
- "uninitialization" with deallocation. This was good enough for
- the traditional situation, where it matched the combined
- allocation and initialization of the creation function. But now
- imagine a type whose creation function uses a special free list
- for allocation. It's deallocation function puts the object's
- memory back on the same free list. But when allocation and
- creation are separate, the object may have been allocated from the
- regular heap, and it would be wrong (in some cases disastrous) if
- it were placed on the free list by the deallocation function.
! A solution would be for the tp_construct function to somehow mark
! whether the object was allocated from the special free list, so
! that the tp_dealloc function can choose the right deallocation
! method (assuming that the only two alternatives are a special free
! list or the regular heap). A variant that doesn't require space
! for an allocation flag bit would be to have two type objects,
! identical in the contents of all their slots except for their
! deallocation slot. But this requires that all type-checking code
! (e.g. the PyDict_Check()) recognizes both types. We'll come back
! to this solution in the context of subtyping. Another alternative
! is to require the metatype's tp_call to leave the allocation to
! the tp_construct method, by passing in a NULL pointer. But this
! doesn't work once we allow subtyping.
!
! Eventually, when we add any form of subtyping, we'll have to
! separate deallocation from uninitialization. The way to do this
! is to add a separate slot to the type object that does the
! uninitialization without the deallocation. Fortunately, there is
! already such a slot: tp_clear, currently used by the garbage
! collection subsystem. A simple rule makes this slot reusable as
! an uninitialization: for types that support separate allocation
! and initialization, tp_clear must be defined (even if the object
! doesn't support garbage collection) and it must DECREF all
! contained objects and FREE all other memory areas the object owns.
! It must also be reentrant: it must be possible to clear an already
! cleared object. The easiest way to do this is to replace all
! pointers DECREFed or FREEd with NULL pointers.
! Subtyping in C
The simplest form of subtyping is subtyping in C. It is the
simplest form because we can require the C code to be aware of the
various problems, and it's acceptable for C code that doesn't
! follow the rules to dump core; while for Python subtyping we would
! need to catch all errors before they become core dumps.
The idea behind subtyping is very similar to that of single
--- 156,200 ----
references to other objects, and hence may participate in
reference cycles).
! In this proposal, type objects can be factory functions for their
! instances, making the types directly callable from Python. This
! mimics the way classes are instantiated. Of course, the C APIs
! for creating instances of various built-in types will remain valid
! and probably the most common; and not all types will become their
! own factory functions.
!
! The type object has a new slot, tp_new, which can act as a factory
! for instances of the type. Types are made callable by providing a
! tp_call slot in PyType_Type (the metatype); the slot
! implementation function looks for the tp_new slot of the type that
! is being called.
!
! If the type's tp_new slot is NULL, an exception is raised.
! Otherwise, the tp_new slot is called. The signature for the
! tp_new slot is
!
! PyObject *tp_new(PyTypeObject *type,
! PyObject *args,
! PyObject *kwds)
!
! where 'type' is the type whose tp_new slot is called, and 'args'
! and 'kwds' are the sequential and keyword arguments to the call,
! passed unchanged from tp_call. (The 'type' argument is used in
! combination with inheritance, see below.)
!
! There are no constraints on the object type that is returned,
! although by convention it should be an instance of the given
! type. It is not necessary that a new object is returned; a
! reference to an existing object is fine too. The return value
! should always be a new reference, owned by the caller.
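
To illustrate the point that a factory need not return a fresh
object, here is a minimal Python sketch. It assumes that tp_new is
reachable from Python as __new__, which is how later Python releases
expose it; this draft does not specify that. The Interned class and
its _cache dictionary are hypothetical names used only for this
example.

```python
class Interned:
    """Hypothetical type whose factory may hand back an existing instance."""

    _cache = {}

    def __new__(cls, key):
        # Reuse the object already created for this key, if any.
        try:
            return cls._cache[key]
        except KeyError:
            obj = super().__new__(cls)
            obj.key = key
            cls._cache[key] = obj
            return obj


a = Interned("spam")
b = Interned("spam")   # same key: the existing object comes back
c = Interned("eggs")   # new key: a new object is created
```

Calling the type twice with the same key yields the same object,
so the caller cannot assume that a call always creates something new.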
! Requirements for a type to allow subtyping
The simplest form of subtyping is subtyping in C. It is the
simplest form because we can require the C code to be aware of the
various problems, and it's acceptable for C code that doesn't
! follow the rules to dump core. For added simplicity, it is
! limited to single inheritance.
The idea behind subtyping is very similar to that of single
***************
*** 207,236 ****
the type object, leaving others the same.
! Not every type can serve as a base type. The base type must
! support separation of allocation and initialization by having a
! tp_construct slot that can be called with a preallocated object,
! and it must support uninitialization without deallocation by
! having a tp_clear slot as described above. The derived type must
! also export the structure declaration for its instances through a
! header file, as it is needed in order to derive a subtype. The
! type object for the base type must also be exported.
If the base type has a type-checking macro (e.g. PyDict_Check()),
! this macro may be changed to recognize subtypes. This can be done
! by using the new PyObject_TypeCheck(object, type) macro, which
! calls a function that follows the base class links. There are
! arguments for and against changing the type-checking macro in this
! way. The argument for the change should be clear: it allows
! subtypes to be used in places where the base type is required,
! which is often the prime attraction of subtyping (as opposed to
! sharing implementation). An argument against changing the
! type-checking macro could be that the type check is used
! frequently and a function call would slow things down too much
! (hard to believe); or one could fear that a subtype might break an
! invariant assumed by the support functions of the base type.
! Sometimes it would be wise to change the base type to remove this
! reliance; other times, it would be better to require that derived
! types (implemented in C) maintain the invariants.
The derived type begins by declaring a type structure which
contains the base type's structure. For example, here's the type
--- 205,372 ----
the type object, leaving others the same.
! Most issues have to do with construction and destruction of
! instances of derived types.
+ Creation of a new object is separated into allocation and
+ initialization: allocation allocates the memory, and
+ initialization fills it with appropriate initial values. The
+ separation is needed for the convenience of subtypes.
+ Instantiation of a subtype goes as follows:
+
+ 1. allocate memory for the whole (subtype) instance
+ 2. initialize the base type
+ 3. initialize the subtype's instance variables
+
+ If allocation and initialization were done by the same function,
+ you would need a way to tell the base type's constructor to
+ allocate additional memory for the subtype's instance variables,
+ and there would be no way to change the allocation method for a
+ subtype (without giving up on calling the base type to initialize
+ its part of the instance structure).
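
The three instantiation steps can be sketched in Python, assuming
the __new__ and __init__ methods of later Python releases as rough
stand-ins for the allocation and initialization slots (the
correspondence is not exact, and the class names are hypothetical):

```python
class Base:
    def __init__(self):
        # Step 2: the base type fills in its own part of the instance.
        self.base_ready = True


class Derived(Base):
    def __init__(self, extra):
        Base.__init__(self)
        # Step 3: the subtype initializes its own instance variables.
        self.extra = extra


# Step 1: allocate the whole (subtype) instance without initializing it.
obj = Derived.__new__(Derived)
state_after_allocation = dict(vars(obj))

# Steps 2 and 3 run when the initializer is called explicitly.
obj.__init__(42)
state_after_init = dict(vars(obj))
```

Because allocation is a separate call, the subtype decides how much
memory is obtained and from where, while the base type still gets to
initialize its own fields.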
+
+ Similar reasoning applies to destruction: if a subtype changes
+ the instance allocator (e.g. to use a different heap), it must
+ also change the instance deallocator; but it must still call on
+ the base type's destructor to DECREF the base type's instance
+ variables.
+
+ In this proposal, I assign stricter meanings to two existing
+ slots for deallocation and deinitialization, and I add two new
+ slots for allocation and initialization.
+
+ The tp_clear slot gets the new task of deinitializing an object so
+ that all that remains to be done is free its memory. Originally,
+ all it had to do was clear object references. The difference is
+ subtle: the list and dictionary objects contain references to an
+ additional heap-allocated piece of memory that isn't freed by
+ tp_clear in Python 2.1, but which must be freed by tp_clear under
+ this proposal. It should be safe to call tp_clear repeatedly on
+ the same object. If an object contains no references to other
+ objects or heap-allocated memory, the tp_clear slot may be NULL.
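
The "safe to call repeatedly" requirement amounts to a check-then-reset
pattern. The following is a toy Python model of that pattern only, not
of the C slot itself; the Buffer class, its clear() method and the
release callback are all hypothetical names.

```python
class Buffer:
    """Toy stand-in for an object that owns one extra resource."""

    def __init__(self, release):
        self._release = release   # called once when the resource is given up
        self.items = [1, 2, 3]    # plays the role of the owned memory block

    def clear(self):
        # Check before releasing and reset afterwards, so that a second
        # call finds nothing left to release.
        if self.items is not None:
            self._release(self.items)
            self.items = None


released = []
buf = Buffer(released.append)
buf.clear()
buf.clear()   # second call is a no-op
```

In C the same effect would be obtained by testing each pointer for
NULL before releasing it and storing NULL afterwards.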
+
+ The only additional requirement for the tp_dealloc slot is that it
+ should do the right thing whether or not tp_clear has been called.
+
+ The new slots are tp_alloc for allocation and tp_init for
+ initialization. Their signatures:
+
+ PyObject *tp_alloc(PyTypeObject *type,
+ PyObject *args,
+ PyObject *kwds)
+
+ int tp_init(PyObject *self,
+ PyObject *args,
+ PyObject *kwds)
+
+ The arguments for tp_alloc are the same as for tp_new, described
+ above. The arguments for tp_init are the same except that the
+ first argument is replaced with the instance to be initialized.
+ Its return value is 0 for success or -1 for failure.
+
+ It is possible that tp_init is called more than once or not at
+ all. The implementation should allow this usage. The object may
+ be non-functional until tp_init is called, and a second call to
+ tp_init may raise an exception, but it should not be possible to
+ cause a core dump or memory leakage this way.
+
+ Because tp_init is in a sense optional, tp_alloc is required to do
+ *some* initialization of the object. It is required to initialize
+ ob_refcnt to 1 and ob_type to its type argument. To be safe, it
+ should probably zero out the rest of the object.
+
+ The constructor arguments are passed to tp_alloc so that for
+ variable-size objects (like tuples and strings) it knows to
+ allocate the right amount of memory.
+
+ For immutable types, tp_alloc may have to do the full
+ initialization; otherwise, different calls to tp_init might cause
+ an immutable object to be modified, which is considered a grave
+ offense in Python (unlike in Fortran :-).
+
+ Not every type can serve as a base type. The assumption is made
+ that if a type has a non-NULL value in its tp_init slot, it is
+ ready to be subclassed; otherwise, it is not, and using it as a
+ base class will raise an exception.
+
+ In order to be usefully subtyped in C, a type must also export the
+ structure declaration for its instances through a header file, as
+ it is needed in order to derive a subtype. The type object for
+ the base type must also be exported.
+
If the base type has a type-checking macro (e.g. PyDict_Check()),
! this macro probably should be changed to recognize subtypes. This
! can be done by using the new PyObject_TypeCheck(object, type)
! macro, which calls a function that follows the base class links.
!
! (An argument against changing the type-checking macro could be
! that the type check is used frequently and a function call would
! slow things down too much, but I find this hard to believe. One
! could also fear that a subtype might break an invariant assumed by
! the support functions of the base type. Usually it is best to
! change the base type to remove this reliance, at least to the
! point of raising an exception rather than dumping core when the
! invariant is broken.)
!
! Here are the interactions between tp_alloc, tp_clear, tp_dealloc
! and subtypes; all assuming that the base type defines tp_init
! (otherwise it cannot be subtyped anyway):
!
! - If the base type's allocation scheme doesn't use the standard
! heap, it should not define tp_alloc. This is a signal for the
! subclass to provide its own tp_alloc *and* tp_dealloc
! implementation (probably using the standard heap).
!
! - If the base type's tp_dealloc does anything besides calling
! PyObject_DEL() (typically, calling Py_XDECREF() on contained
! objects or freeing dependent memory blocks), it should define a
! tp_clear that does the same without calling PyObject_DEL(), and
! which checks for zero pointers before and zeros the pointers
! afterwards, so that calling tp_clear more than once or calling
! tp_dealloc after tp_clear will not attempt to DECREF or free the
! same object/memory twice. (It should also be allowed to
! continue using the object after tp_clear -- tp_clear should
! simply reset the object to its pristine state.)
!
! - If the derived type overrides tp_alloc, it should also override
! tp_dealloc, and tp_dealloc should call the derived type's
! tp_clear if non-NULL (or its own tp_clear).
!
! - If the derived type overrides tp_clear, it should call the base
! type's tp_clear if non-NULL.
!
! - If the base type defines tp_init as well as tp_new, its tp_new
! should be inheritable: it should call the tp_alloc and the
! tp_init of the type passed in as its first argument.
!
! - If the base type defines tp_init as well as tp_alloc, its
! tp_alloc should be inheritable: it should look in the
! tp_basicsize slot of the type passed in for the amount of memory
! to allocate, and it should initialize all allocated bytes to
! zero.
!
! - For types whose tp_itemsize is nonzero, the allocation size used
! in tp_alloc should be tp_basicsize + n*tp_itemsize, rounded up
! to the next integral multiple of sizeof(PyObject *), where n is
! the number of items determined by the arguments to tp_alloc.
!
! - Things are further complicated by the garbage collection API.
! This affects tp_basicsize, and the actions to be taken by
! tp_alloc. tp_alloc should look at the Py_TPFLAGS_GC flag bit in
! the tp_flags field of the type passed in, and not assume that
! this is the same as the corresponding bit in the base type. (In
! part, the GC API is at fault; Neil Schemenauer has a patch that
! fixes the API, but it is currently backwards incompatible.)
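
The size rule for variable-size types given in the list above can be
worked through with a small Python sketch. The function name
alloc_size and its align parameter are assumptions made for this
example; the pointer size is taken from the struct module rather than
from any interpreter internals.

```python
import struct

# sizeof(void *) on the running platform, standing in for sizeof(PyObject *).
POINTER_SIZE = struct.calcsize("P")


def alloc_size(basicsize, itemsize, n, align=POINTER_SIZE):
    """Return basicsize + n*itemsize rounded up to a multiple of align."""
    raw = basicsize + n * itemsize
    return (raw + align - 1) // align * align


# With 8-byte alignment: 24 + 5*1 = 29 bytes, rounded up to 32.
example = alloc_size(24, 1, 5, align=8)
```

A size that is already a multiple of the alignment is returned
unchanged, and a zero itemsize reduces the rule to plain tp_basicsize.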
!
! Note: the rules here are very complicated -- probably too
! complicated. It may be better to give up on subtyping immutable
! types, types with custom allocators, and types with variable size
! allocation (such as int, string and tuple) -- then the rules can
! be much simplified because you can assume allocation on the
! standard heap, no requirement beyond zeroing memory in tp_alloc,
! and no variable length allocation.
!
+ Creating a subtype of a built-in type in C
+
The derived type begins by declaring a type structure which
contains the base type's structure. For example, here's the type
***************
*** 400,403 ****
--- 536,586 ----
This document has been placed in the public domain.
+
+
+ Junk text (to be reused somewhere above)
+
+ The deallocation mechanism chosen should match the allocation
+ mechanism: an allocation policy should prescribe both the
+ allocation and deallocation mechanism. And again, planning ahead
+ for subtyping would be nice. But the available mechanisms are
+ different. The deallocation function has always been part of the
+ type structure, as tp_dealloc, which combines the
+ "uninitialization" with deallocation. This was good enough for
+ the traditional situation, where it matched the combined
+ allocation and initialization of the creation function. But now
+ imagine a type whose creation function uses a special free list
+ for allocation. Its deallocation function puts the object's
+ memory back on the same free list. But when allocation and
+ creation are separate, the object may have been allocated from the
+ regular heap, and it would be wrong (in some cases disastrous) if
+ it were placed on the free list by the deallocation function.
+
+ A solution would be for the tp_construct function to somehow mark
+ whether the object was allocated from the special free list, so
+ that the tp_dealloc function can choose the right deallocation
+ method (assuming that the only two alternatives are a special free
+ list or the regular heap). A variant that doesn't require space
+ for an allocation flag bit would be to have two type objects,
+ identical in the contents of all their slots except for their
+ deallocation slot. But this requires that all type-checking code
+ (e.g. the PyDict_Check()) recognizes both types. We'll come back
+ to this solution in the context of subtyping. Another alternative
+ is to require the metatype's tp_call to leave the allocation to
+ the tp_construct method, by passing in a NULL pointer. But this
+ doesn't work once we allow subtyping.
+
+ Eventually, when we add any form of subtyping, we'll have to
+ separate deallocation from uninitialization. The way to do this
+ is to add a separate slot to the type object that does the
+ uninitialization without the deallocation. Fortunately, there is
+ already such a slot: tp_clear, currently used by the garbage
+ collection subsystem. A simple rule makes this slot reusable as
+ an uninitialization: for types that support separate allocation
+ and initialization, tp_clear must be defined (even if the object
+ doesn't support garbage collection) and it must DECREF all
+ contained objects and FREE all other memory areas the object owns.
+ It must also be reentrant: it must be possible to clear an already
+ cleared object. The easiest way to do this is to replace all
+ pointers DECREFed or FREEd with NULL pointers.