
For the benefit of those who may be unfamiliar with the ways to add new functionality to Python, I will try to summarize briefly. More information can be found in the documentation and in the books that have been written about Python.

There are two (arguably three) ways to add a new kind of object to Python: writing an extension type and defining a class. The fact that there are two distinct ways to add new objects is often called the type-class dichotomy. It is a goal of Py3K to somehow eliminate this distinction. A third way to add new behavior, which I'll explain below, is to make the type an "extension class." Making the type a "subtype" of this fancy type gives a possible direction for unifying types and classes.

Types
===============================

"Types" are more fundamental to the language and must be added using compiled code. (All of the types I've seen are in straight C, since you don't really buy anything by using C++ when Python itself is written in C.) You can investigate the type of an object from within Python by using the built-in function type:
type(a) # prints the "type" of object a
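As a small illustration, the following sketch applies type to a few built-in objects. It assumes a modern Python 3 interpreter, so the printed form differs from what older interpreters showed, and the variable names are only for illustration.

```python
# Inspect the type of a few built-in objects.
samples = [1, 2.5, 3 + 4j, [1, 2], (1, 2), {"a": 1}]

for a in samples:
    # type(a) returns the type object; printing it shows the type's name.
    print(repr(a), type(a))

# Two objects of the same kind share a single type object.
assert type(1) is type(2)
```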
There are many types defined in the Python core, such as integers, floats, complex numbers, lists, tuples, and dictionaries. Python also allows you to make new types. These must be written in C (maybe C++, but again I don't think the extra complexity buys you anything since Python is in C).

A new type is basically a PyTypeObject filled with function pointers and tables of function pointers that handle the various operations one might perform on the new type. This PyTypeObject is coupled with a C structure containing the "data" for the new type. This data structure lists PyObject_HEAD as its first member, followed by whatever other data is necessary. Making a new type is thus a matter of creating these two C structures and filling in the PyTypeObject table with function pointers to handle the various operations (getting and setting attributes; treating the object as an abstract number, sequence, or mapping; or printing the object).

Python has an abstract object interface at the C level, so that a type with a "number" interface (operations) can be used like a number, a type with a "sequence" interface can be indexed like a sequence, and a type with a "mapping" interface can be indexed like a dictionary.

Classes
======================

A Python class is, at the C level, just another "type." There are actually two "types" associated with a Python class: an instance type and a class type. An instance of a class is of the instance type, so every instance of any class has the same "type." What this means at the C level is that there is one more layer of indirection for each "operation" in Python when the type is "class". The Python interpreter goes through the "class type" to see what to do and finds the appropriate C function in that PyTypeObject's method table.
This C function does a dictionary lookup using the special method names and executes the Python function associated with that name for the particular instance (which may call back into a compiled extension module to do the actual work). This level of indirection gives a great deal of dynamic flexibility, since classes can be subclassed and attributes can be added dynamically. The cost is a performance hit, which is usually noticeable only inside Python-level loops.

So in reality there is no "type"-"class" dichotomy. Everything is a type. It's just that classes are dynamic types which allow you to define Python functions to implement the "method table." The reason for the apparent dichotomy is that classes are so useful, and people like and use them so much, that the other, static types seem quite rigid in comparison.

Extension Classes
==================================

This is another fancy, dynamic "type," not distributed in the Python core but developed by Digital Creations (the Zope people) in order to let C programmers "subclass" types. I'm not an expert on these, as I've never really used them, but as far as I can tell they bring the idea of "dynamic types" to the C programmer. This is accomplished by making all such types subtypes of the extension class "type". One way to understand the result is to look at what the type command tells you about your new "extension class": it will tell you that it's of type "extension class." So dynamic typing is again implemented with another layer of indirection, where the fixed special C functions of the extension class "type" call out to your particular set of registered C functions. The difference is that the indirection is all handled in C.

So those are the choices for implementing new behavior in Python.

Currently, Numerical Python is implemented as a new "type" which defines all of these interfaces.
The "mapping" interface handles "extended slicing," the "sequence" interface allows the array to return something when len() is called, for example, and the "number" interface implements the operators. Actually, two new types are defined: a "ufunc" type and an "array" type. All of the operators are implemented as instances of the "ufunc" type. The "ufunc" essentially encapsulates the "casting and broadcasting" rules associated with elementwise operations. The ufunc is not well understood by most non-developers I've talked to, since most people don't instantiate their own ufuncs (which must be instantiated in C).

The code works and is fast, but it can be hard to extend, and there are pieces that are poorly documented and hard to understand. For example, nobody has reworked the "extended slicing" syntax to enable arbitrary-index slicing, even though many people would like that feature (actually, I've heard that John Bernard did finally write some code to do that, but I've never seen it and it's not there now).

As mentioned before, David Ascher made the necessary changes to make Numerical Python of type "extension class," which, among other things, allowed the type to be "subclassed" from within Python. I thought this was a nice solution, and we'd have to hear from him as to what went wrong. The only trouble I had with it is that the C-API changed slightly, in that arrays were no longer of type Array_Type, and code that depended on that would break (the same is true of any redesign making Python arrays a class). We'd have to hear from him as to what other problems he saw. It still doesn't solve the problem of maintainability of the C code base, but it definitely gave a more flexible result to the Python user. Perhaps retrofitting the ExtensionClass solution with an enhanced C-API would be a better solution. We really need David's input on that suggestion...

The idea I've put forward is to make the objects "classes," but I would support the "extension class" solution as well.
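To make the "classes" route a little more concrete, here is a minimal sketch of a Python class that supplies the number, sequence, and mapping operations through special method names. TinyArray and everything in it are hypothetical and are not part of Numerical Python. The comments use the interface names from this message rather than CPython's exact slot layout, and the sketch assumes a modern Python 3 interpreter.

```python
class TinyArray:
    """Hypothetical 1-D array-like class, for illustration only."""

    def __init__(self, values):
        self.values = list(values)

    # "Number" interface: a + b
    def __add__(self, other):
        if isinstance(other, TinyArray):
            if len(other) != len(self):
                raise ValueError("length mismatch")
            return TinyArray(x + y for x, y in zip(self.values, other.values))
        # Otherwise treat other as a scalar applied to every element.
        return TinyArray(x + other for x in self.values)

    # "Sequence" interface: len(a)
    def __len__(self):
        return len(self.values)

    # "Mapping" interface: a[index], where index is an int or a slice
    def __getitem__(self, index):
        if isinstance(index, slice):
            return TinyArray(self.values[index])
        return self.values[index]

    def __repr__(self):
        return "TinyArray(%r)" % (self.values,)


a = TinyArray([1, 2, 3, 4, 5, 6])
b = a + 10        # dispatches to TinyArray.__add__
print(len(a))     # dispatches to TinyArray.__len__
print(a[1:6:2])   # the stepped slice reaches __getitem__ as a slice object
print(b)
```

Each operation here is found by its special method name and run as ordinary Python code, which is the extra layer of indirection described above for classes.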
Regardless of how it is implemented, we still need to design the appropriate "objects" (arraytype, NDArray, Ufunc) and how they interact with each other, as well as a suitable C-API so that they work together seamlessly.

I hope this helps some readers who are less familiar with extending Python.

DISCLAIMER: I am not the world's expert on these issues, but I do have some experience, so take what lessons you may.

Best wishes,

Travis Oliphant