[Numpy-discussion] Explanation of some terms

Wed Aug 23 11:38:26 EDT 2000

For the benefit of those who may be unfamiliar with the ways to add new
functionality to Python, I will try to summarize them briefly.  More
information can be found in the documentation and in the books that have
been written about Python.

There are two (arguably three) ways to add a new kind of object to Python:
writing an extension type and defining a class.  The fact that there are
two distinct ways to add new objects is often called the type-class
dichotomy.  It is a goal of Py3K to somehow eliminate this distinction.
A third way to add new behavior, which I'll also explain, is to make the
type an "extension class."  Making every type a "subtype" of this fancy
type gives a possible direction for unifying types and classes.

Types
===============================

"Types" are more fundamental to the language and must be added using
compiled code (all of the types I've seen are in straight C, since you
don't really buy anything by using C++ when Python itself is written in
C).  You can investigate the type of an object from within Python by using
the built-in function type:

>>> type(a)  # prints the "type" of object a

There are many types defined in the Python core such as integers, floats, 
complex, lists, tuples, dictionaries, etc.
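
As a quick illustration of type applied to the core types listed above,
here is a small sketch run on a current Python interpreter.  The printed
form of a type object has changed between Python versions, so only the
type names are shown:

```python
# Inspect the type of a few core objects with the built-in type().
samples = [1, 1.5, 2 + 3j, [1, 2], (1, 2), {"a": 1}]

# type(obj) returns the type object; __name__ is its short name.
type_names = [type(obj).__name__ for obj in samples]

for obj, name in zip(samples, type_names):
    print(repr(obj), "is of type", name)
```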

Python allows you to make new types.  These must be made in C (maybe C++,
but again I don't think the extra complexity buys you anything since
Python is in C).  A new type is a PyTypeObject, basically a structure
filled with function pointers and arrays of function pointers that handle
the various operations one might do on the new type.  This PyTypeObject is
coupled with a C structure containing the "data" for the new type.  This
data structure lists PyObject_HEAD as its first member and then whatever
other data is necessary.  Making a new type is thus a matter of creating
these two C structures and filling in the PyTypeObject table with function
pointers to handle the various operations (getting and setting attributes,
treating the type as an abstract number, sequence, or mapping, or printing
the object).

Python has an abstract object interface on the C level.  It is used so
that a type that has a "number" interface (operations) can be used like a
number, a type that has a "sequence" interface can be indexed like a
sequence, and a type that has a "mapping" interface can be indexed like a
dictionary.
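
The same idea can be seen from the Python side.  The following is only a
sketch: the class Triple is made up for illustration, and its special
methods stand in for the C function pointers that a real extension type
would supply.  Because the object provides the relevant interfaces, the +
operator, len(), and indexing all work on it:

```python
class Triple:
    """Hypothetical toy object; the special methods play the role of C slots."""

    def __init__(self, a, b, c):
        self.items = (a, b, c)

    def __add__(self, other):        # "number" interface: handles t + u
        return Triple(*(x + y for x, y in zip(self.items, other.items)))

    def __len__(self):               # "sequence" interface: handles len(t)
        return len(self.items)

    def __getitem__(self, index):    # indexing: handles t[i]
        return self.items[index]


t = Triple(1, 2, 3) + Triple(10, 20, 30)
summed = [t[0], t[1], t[2]]
print(len(t), summed)
```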

Classes
======================

A Python class is, at the C level, just another "type."  There are actually
two "types" associated with a Python class:  an instance type and a class
type.  An instance of a class is of the instance type.  So every instance
of any class has the same "type."

What this means on a C level is that there is one more layer of
indirection for each "operation" in Python when the type is "class".  The
Python interpreter goes through the "class type" to see what to do and
finds the appropriate C function in that PyTypeObject's method table.
This C function does a dictionary lookup using the special method names
and executes the Python function associated with that name for the  
particular instance (which may call back into a compiled extension module
to do the actual work).   This level of indirection gives a great deal of
dynamic flexibility since classes can be subclassed and attributes can be
added dynamically, but there will be a performance hit which won't be
noticeable except inside Python iteration loops. 
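
To make the special-method lookup and the dynamic flexibility concrete,
here is a small sketch run on a current Python interpreter.  The names
Meters, Feet, and scaled are made up for illustration, and the internals
of a current interpreter differ in detail from the description above, but
the behavior visible from Python is the same: the operator ends up calling
a Python function stored under a special name, and that function can be
inherited or added after the fact.

```python
class Meters:
    """Hypothetical example class."""

    def __init__(self, value):
        self.value = value

    def __add__(self, other):
        # Called when the + operator is applied to a Meters instance.
        return Meters(self.value + other.value)


# The special method is stored under its name in the class dictionary.
found_in_class_dict = "__add__" in Meters.__dict__

total = Meters(2) + Meters(3)


# Behavior can be added dynamically, after the class has been defined.
def scaled(self, factor):
    return Meters(self.value * factor)


Meters.__mul__ = scaled
doubled = total * 2


# Subclasses inherit the special methods.
class Feet(Meters):
    pass


feet_total = Feet(1) + Feet(2)
print(total.value, doubled.value, feet_total.value)
```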

So in reality there is no "type"-"class" dichotomy.  Everything is a
type.  It's just that classes are dynamic types which allow you to define
Python functions to implement the "method table."

The reason for the apparent dichotomy is that classes are so useful, and
people like them and use them so much, that the other, static types seem
quite rigid in comparison.

Extension Classes
==================================

This is another fancy, dynamic "type" not distributed in the Python core
but developed by Digital Creations (the Zope people) in order to let C
programmers "subclass" types.  I'm not an expert on these as I've never
really used them, but as far as I can tell they bring the idea of
"dynamic types" to the C programmer.  This is accomplished by making all
types just subtypes of the extension class "type."  One way to
understand the result is by understanding what the type function
tells you about your new "extension class."  It will tell you
that it's of type "extension class."  So, dynamic typing is again
implemented with another layer of indirection, where the fixed special C
functions of the extension class "type" call out to your particular set of
registered C functions.  The difference is that the indirection is
all handled in C.

So those are the choices for implementing new behavior in Python.  

Currently, Numerical Python is implemented as a new "type" which defines
all of these interfaces.  The "mapping" interface handles "extended
slicing," the "sequence" interface lets the array return something when
len() is called, for example, and the "number" interface implements the
operators.
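
To see what "extended slicing" hands to the indexing code, here is a
sketch using a made-up class, ShowIndex, that simply returns whatever
index object the interpreter passes in.  It is not part of Numerical
Python; it only shows the index objects that an array type has to
interpret.

```python
class ShowIndex:
    """Hypothetical probe object that reports the index it receives."""

    def __init__(self, length):
        self.length = length

    def __getitem__(self, index):
        # Handles probe[...]: return the index object unchanged.
        return index

    def __len__(self):
        # Handles len(probe).
        return self.length


probe = ShowIndex(4)

# A multi-dimensional slice arrives as a tuple of slice objects.
two_slices = probe[1:3, ::2]

# Ellipsis and plain integers can be mixed in as well.
with_ellipsis = probe[..., 0]

print(two_slices)
print(with_ellipsis)
print(len(probe))
```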

Actually, two new types are defined:  a "ufunc" type and an "array" type.
All of the operators are implemented as instances of the "ufunc" type.
The "ufunc" essentially encapsulates the "casting and broadcasting" rules
associated with elementwise operations.  The ufunc is not well understood
by most non-developers I've talked to, since most people don't instantiate
their own ufuncs (which must be instantiated in C).
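
As a rough picture of what a ufunc encapsulates, here is a pure-Python toy.
It is only a sketch under heavy simplifications: ToyUfunc is a made-up
class, it works on flat lists rather than N-dimensional arrays, its
"broadcasting" only stretches a scalar to match a list, and its "casting"
only promotes integers to floats.  The real ufuncs are written in C and
follow much more general rules.

```python
import operator


class ToyUfunc:
    """Hypothetical stand-in for a ufunc wrapping a two-argument function."""

    def __init__(self, func):
        self.func = func

    def __call__(self, x, y):
        x_is_seq = isinstance(x, list)
        y_is_seq = isinstance(y, list)
        if not x_is_seq and not y_is_seq:
            return self.func(x, y)
        # Broadcasting (simplified): stretch a scalar to the other's length.
        if not x_is_seq:
            x = [x] * len(y)
        if not y_is_seq:
            y = [y] * len(x)
        if len(x) != len(y):
            raise ValueError("lengths do not match")
        # Casting (simplified): if any element is a float, convert them all.
        if any(isinstance(v, float) for v in x + y):
            x = [float(v) for v in x]
            y = [float(v) for v in y]
        # Elementwise application of the wrapped function.
        return [self.func(a, b) for a, b in zip(x, y)]


add = ToyUfunc(operator.add)
broadcast_result = add([1, 2, 3], 10)
cast_result = add([1, 2], 2.0)
print(broadcast_result, cast_result)
```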

The code works and is fast, but it can be hard to extend and there are
pieces that are poorly documented and hard to understand.   For example,
nobody has reworked the "extended slicing" syntax to enable
arbitrary-index slicing, even though many people would like that feature
(actually I've heard that John Bernard did finally write some code to
do that, but I've never seen it and it's not there now).

As mentioned before, David Ascher made the changes necessary to make
Numerical Python arrays of type "extension class," which, among other
things, allowed the type to be "subclassed" from within Python.  I thought
this was a nice solution, and we'd have to hear from him as to what went
wrong.  The only trouble I had with it was that the C-API changed
slightly:  arrays were no longer of type Array_Type, so code that depended
on that would break (the same is true of any redesign that makes Python
arrays a class).  We'd have to hear from him as to what other problems he
saw.  It still doesn't solve the problem of maintainability of the C code
base, but it definitely gave a more flexible result to the Python user.

Perhaps retrofitting the ExtensionClass solution with an enhanced C-API
would be a better solution.  We really need David's input on that
suggestion...

The idea I've put forward is to make the objects "classes," but I would
support the "extension class" solution as well.  Regardless of how it is
implemented, we still need to design the appropriate "objects"
(arraytype, NDArray, Ufunc) and how they interact with each other, as well
as a suitable C-API so that they work together seamlessly.

I hope this helps some readers who are less familiar with extending
Python.  DISCLAIMER:  I am not the world's expert on these issues but I do
have some experience, so take what lessons you may.

Best wishes,

Travis Oliphant