
Greetings to all interested in Numerical Python. My purpose in writing this somewhat long post is to inform interested parties of where NumPy is going and how far it has gone. I'm doing this to coordinate interest and to summarize some of the recent conversations I've had with other interested people.

There is a significant handful of people who are very interested in where Numerical Python is going. All of these people are very bright and have distinct desires for the future of Numerical Python which come from quite diverse experience. This intelligence and diversity bring tremendous strength (both current and potential) to the community and have made Numerical Python an extremely useful tool. Of course, these benefits do not come cheaply: there is quite a bit of disagreement about how things should be done --- mostly because people use Numerical Python for different things. Fortunately, this disagreement is not insurmountable, provided people are willing to compromise a little syntactic sugar here and there.

Numerical Python users have been enjoying the flexibility and power of the underlying programming language for several years. The price we pay for using a language that is not wholly dedicated to numerical pursuits is that we must cooperate with other users of the core language whose interests are entirely different from our own. Since numerical programming is rarely "strictly numerical," what we gain is access to the work they do in improving Python's stock of library tools. When I was introduced to the Numerical Python system, some of the results of this compromise were a little annoying to me --- somewhat like the whitespace rule. What I found, however, was that my annoyance gave way to elation as I realized that the non-numeric objects and toolkits were extremely beneficial to me in my numeric work: regular expressions, serving graphs from a website, writing translators for various files and formats, etc.

With that introduction, I'll give a brief history of Numerical Python (please forgive me if I have neglected important contributors). Numerical Python started from the work of Jim Hugunin (which he used as part of his oral examination at MIT). He posted an announcement of his proposal in August of 1995, based on the Matrix Object previously presented by Jim Fulton. Early discussions of the work can be found at http://www.python.org/pipermail/matrix-sig/ which makes very interesting reading, since many of the topics people still talk about were hashed out even back then. Konrad Hinsen, Paul Dubois, David Ascher, and Jim Fulton were all early contributors. Jim Fulton's work and connections to Guido van Rossum enabled many of the early changes (extended slicing, complex numbers, ellipses) to get into Python itself. Guido was also part of the early discussion. Konrad Hinsen contributed a significant amount of code to the current version of Numeric Python as well.

Jim Hugunin released version 0.2 in December of 1995 and followed the release-early, release-often model for several months to get Numerical Python into a working state. It is obvious that he spent many hours writing code (time which NumPy2 contributors have not been able to duplicate). One thing that stalled Numerical Python's development was that Jim Hugunin left the project to concentrate on JPython. Paul Dubois picked up the task of project administrator and has done an admirable job, including securing resources to get the current documentation written.
David Ascher wrote the bulk of that important resource.

Personally, I started using Numerical Python after scouring the Net for something to replace MATLAB, which had become burdensome for me under the weight of large data volumes and inefficient memory handling. I started using Numerical Python in the spring of 1998 (a relative late-comer), but I have used it actively ever since. I started releasing packages at that time to increase the number of toolboxes available to the Numerical Python programmer, as I was quite happy with the language itself (after I got over the initial annoyances). I've released many pieces of code since then which I personally use quite regularly. Most of these can be found at http://oliphant.netpedia.net Naturally, my contributions have been in areas where I had a personal need, but they have enabled me to understand the Numerical Python source code well enough to feel confident in modifying it.

With that bit of history, let's get into why NumPy2. Guido van Rossum has expressed willingness to include multidimensional arrays in the Python core. The source of this willingness appears to be a general respect for the community of users who use Python for numerical programming (although he himself is not one of those users). There is already a useful one-dimensional array object distributed with Python which, however, does not support any operations; some of its features were borrowed for the current Numerical Python.

Last year, I suggested that the PIL and Numeric Python work more closely together (since an image is conceptually just a 2-D (or 3-D for color) Numeric Python array). /F from pythonware responded by saying that until Numeric Python was a part of Python itself he saw no reason to modify the PIL. I took the bait and, after pondering why 4 years had elapsed without Numerical Python getting into Python itself, I contacted Guido and Paul to start the ball rolling. Guido's response was that those familiar with the code said it was too ugly and unwieldy to put into Python: the code is just too hard to modify and understand. Evidently, since only a handful of the hundreds of people who use it submit bug patches or feature enhancements, this must be true. Those who do understand how it works have a hard time finding time to make needed changes --- the intrinsic cooperation problem with volunteer time that is not funded (or contributed to) by those who make use of the results.

Guido was kind enough to provide me with some design documents for an implementation of multidimensional arrays that he had worked out. Thinking I would be in graduate school longer than I am going to be, I set about trying to clean up Numerical Python with the intent of getting it into the Python core. As part of this effort I conducted a survey of current Numerical Python users to find out their interests. The survey and its results are available at the sourceforge site for Numerical Python. Basically, the results indicate that most people agree on some important features (like arbitrary indexing into arrays) but disagree on some details (copy vs. reference and automatic casting rules being the most memorable). While the results were useful, a simple comment made by one of the survey participants made a significant impression on me: "the C-code is too inflexible and hard to change." This is essentially the problem that Paul Dubois had identified and which was keeping Numerical Python out of the Python core.
At the same time, I had been doing some work implementing a sparse matrix package for Python by wrapping some compiled C and Fortran code in a Python class I'd constructed. The results were very encouraging and made me realize that the same technique could be used to make Numerical Python much more flexible and easier to extend while retaining its significant speed benefits. I decided to make a new implementation of Numerical Python in which the underlying objects (the array and ufunc objects) are not extension types but true Python classes. This would allow significant benefits in flexibility and modifiability, with a small memory-overhead loss and an indeterminate speed change (it will likely be faster under some usages and slightly slower in others). I also wanted to add more types (unsigned types, boolean, and potentially others).

While making this change, I realized that another way out of the "type-class" dichotomy (along with ExtensionClasses) is to not make new types at all. If all types were really ExtensionClasses, and all new types had to be as well, this could effectively solve the problem from the Python user's perspective. A noble effort at making Numerical Python an Extension Class was undertaken by David Ascher last year. His work became the ill-fated Numerical Python 12. I rather liked his work, but there were some very hard-to-trace bugs in the implementation, and the C code was still hard to modify. Another problem (which must be dealt with in the new implementation as well) is the significant amount of code that has been written to the old C-API.

This finally brings us to the state of Numerical Python. I've been working on this implementation on and off for six months (mostly off), but have worked out many of the design details. Since my time is limited for the next 3 months, I wanted to let others know of the status to encourage involvement. We have a window here to get this next version of Numeric into Python 2.1, but the window will probably close sometime in January, so there is some urgency. In the next installment, I will outline the design of Numerical Python 2 and some of its goals.

-Travis Oliphant

As a followup to my previous post, here is a discussion and overview of the current NumPy2 design as I have it in my head and partially implemented in the numpy2 module on the CVS tree at numpy's sourceforge site. The design of NumPy2 is quite simple and tries to balance speed with flexibility and modifiability. An outline of the design follows (the names can change; they are only references at the moment).

Three classes replace the current C structures:

    ArrayType --- replaces PyArray_Descr    # not implemented yet
    Ufunc     --- replaces PyUfuncObject
    NDArray   --- replaces PyArrayObject

The purpose of each class is to encapsulate interfaces to allow code re-use for similar operations.

NDArray    # This is implemented except for the operations (ufuncs).
========================

The most concrete class is the NDArray class --- it just needs coding to make it happen. The other classes still need some design work to efficiently handle mixed-type operations and additions of new types to the system.

The NDArray base class gives an N-dimensional array interpretation to a Python buffer (a segment of memory, a memory-mapped file, a PIL image, etc.). It provides this interpretation with the following special attributes:

    self.rank       --- the dimension of the array (hard-coded but
                        changeable limit of 10)
    self._data      --- a buffer object pointing to the data
    self._structure --- a buffer object pointing to an array of INTEGERs
                        which holds the dimensions and strides information
                        (INTEGER is a platform-dependent type #defined in
                        compiled code)
    self._descr     --- a Python class describing the type

To interact seamlessly with the C-API and be recognized as "an array," all subclasses must either export an __array__ method which creates a suitable NDArray or not interfere with these provided attributes. Note that the same data segment can be viewed in several different ways.

The NDArray will have default implementations for the numeric operations that will resemble the current implementation. But it will be easy to subclass the array to handle these operations as you'd like, without losing the ability to use the data in that array in extension modules which assume array inputs.

Two other attributes are worth mentioning:

    self.CONTIGUOUS  # This can be determined from the _structure
                     # information, but it is useful to keep a flag around
                     # indicating the status.  It tells you whether you can
                     # walk through the entire array one element at a time
                     # with a single for loop.
    self.FORTRANVIEW # This indicates how the array will view its shape
                     # when asked and indexed (it does not change the
                     # _structure information).  An array of "shape"
                     # (10,3,5,7) when FORTRANVIEW is 0 will be an array of
                     # "shape" (7,5,3,10) when FORTRANVIEW is 1.

ArrayType    # This is not implemented yet.
===============================

This class replaces the PyArray_Descr structure in current Numerical Python. As a result, it must contain the following information:

    self.name    --- some kind of object to identify it (a string)
    self.elsize  --- size of an item of this kind
    self.cast    --- a dictionary of compiled functions, with at least one
                     entry, called to cast this type to at least one other
                     type
    self.getitem --- (compiled) function
    self.setitem --- (compiled) function
    self.zeros   --- needed for the zeros command to include arrays of
                     Python objects; it points to the representation of
                     zero for this type

None of the above is implemented yet. What is implemented is a module _arraytypes which exposes the PyArray_Descr structure to Python so that it can be used.
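To make the outline concrete, here is a minimal Python sketch of the class split described above. This is not the code on the CVS tree: the constructor signatures, and the pair of tuples standing in for the _structure buffer, are assumptions made purely for illustration.

    class ArrayType:
        # Stands in for PyArray_Descr: describes one element type.
        def __init__(self, name, elsize, getitem, setitem, zero):
            self.name = name          # e.g. 'Float64'
            self.elsize = elsize      # bytes per element
            self.getitem = getitem    # (compiled) read function
            self.setitem = setitem    # (compiled) write function
            self.zeros = zero         # the representation of zero
            self.cast = {}            # maps target ArrayType -> cast function

    class NDArray:
        # N-dimensional interpretation of a Python buffer.
        def __init__(self, data, shape, strides, descr):
            self._data = data                 # buffer holding the bytes
            self._descr = descr               # an ArrayType instance
            self.rank = len(shape)            # number of dimensions
            # The real _structure is a buffer of platform INTEGERs; a pair
            # of tuples stands in for it in this sketch.
            self._structure = (tuple(shape), tuple(strides))

        def __array__(self):
            # Subclasses return a plain NDArray so that extension modules
            # can always recover the raw array.
            return self

The point of the split is that everything type-specific lives in the ArrayType object, so the NDArray machinery never needs to change when a new element type is added.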
The idea of adding new types to Numerical Python without having to change all of the code is appealing to me, however.

Ufunc    # This is partly implemented.
==========================================

There are two ideas here. I've partly implemented the first one, which I'll explain. The second was presented to me by Paul Barrett.

Ufuncs are encapsulations of the N-D looping construct and the broadcasting rules of Numerical Python. For C code, the N-D looping construct is limited to the fixed but arbitrary 10 dimensions given above, but it can be arbitrary if a Python function is called at each iteration. I explain Ufuncs in a piece called Ufuncexplain.txt which is on the CVS tree. Here is a quote that explains the broadcasting rules:

    1) If input arrays do not have the same rank, the array with lower
       rank will be prepended with ones until the ranks agree.
    2) If an input array has length one along a dimension, then
       "duplicate" the elements along that dimension so that the input
       shapes agree.

    Example:  A is an array of shape (10,)
              B is an array of shape (3,10)

    A * B will return an array of shape (3,10):
      - A is interpreted as shape (1,10)
      - the columns of A are "broadcast" across the rows of B

    Thus the output is (3,10):  [ A*B[0]  A*B[1]  A*B[2] ]

Note that A is not actually extended to a (3,10) array; it merely behaves as if it had been. The element-wise math operators are implemented using Ufuncs. (A small Python sketch of these shape rules follows.)
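The two rules above determine the output shape on their own, so they can be sketched in a few lines of Python. This is a hypothetical illustration, not the looping code itself (which runs in C):

    def broadcast_shape(ashape, bshape):
        a, b = list(ashape), list(bshape)
        # Rule 1: prepend ones to the lower-rank shape until ranks agree.
        while len(a) < len(b):
            a.insert(0, 1)
        while len(b) < len(a):
            b.insert(0, 1)
        out = []
        for i in range(len(a)):
            # Rule 2: a length-one dimension "duplicates" to match the other.
            if a[i] == b[i] or a[i] == 1 or b[i] == 1:
                out.append(max(a[i], b[i]))
            else:
                raise ValueError, "shapes cannot be broadcast together"
        return tuple(out)

    # The example from the quote: (10,) combined with (3,10) gives (3,10).
    print broadcast_shape((10,), (3, 10))     # (3, 10)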
SpecialFuncs is a Python package at http://oliphant.netpedia.net that implements a whole range of special functions using the Ufunc formalism. I also include in that package a general arraymap function which can turn any Python function into a broadcasting "ufunc-mimic." This code does not have the rank-10 limit on the number of dimensions of the inputs, but it might be slower than the current implementation.

My current implementation assumes that the Ufunc instantiator will provide two functions: a select function and a compute function (either of these can be in C or Python). The two work together: the select function determines the types of the outputs based on the input types, while the compute function takes the inputs and outputs (and their types) and computes the output. This is done either on an entire block of memory (optimized ufuncs) or one element at a time (unoptimized ufuncs --- Python-coded ufuncs, for example). This allows for efficient coding and the possibility of mixed-type arithmetic, at the cost of a more complicated creation process. It may also be hard to add new types and have them function as you'd like without modifying other, already-defined ufuncs. But I know this idea will work and I can see my way through it; I've already implemented an "addition" function using this method.

Another idea that has been presented is to instantiate a Ufunc with only one function that is entered into a "dispatch table," or dictionary of functions keyed by the ArrayType class, much like the Multimethod approach that has been discussed. I like this idea, but I do not yet see the details (I haven't thought about it much) and do not know if we can actually make it work --- frankly, I think we can, and that it will result in a better system. What hasn't been thought through is exactly what is entered into the "function" dictionary and when it is called. Some have suggested that it be called "immediately" upon the ufunc call, but this would eliminate the benefits of the encapsulated broadcasting rules. An alternative would be to call it after the "broadcasting rule encapsulation" has been done --- in other words, to use it as a replacement for the current array of functions in the Ufunc implementation. With the appropriate mix of C modules and Python code, I think this could be done quite elegantly.

The other issue that has to be worked out (again) is that this table will obviously not be filled out in every case: for a function with 10 inputs and 10 outputs and 16 types, we are talking about 16^20 possible entries, so the table will be sparse. What to do when there is no entry for a particular combination will require some thought. I think we could allow multiple behaviors depending on how some attribute of the ufunc (casting, exception raising, etc.) is set. Many people might fear this would inhibit code re-use, but I have not seen convincing examples.

So, that is a brief overview of the state of things. It doesn't try to cover everything, but it should give you enough perspective to understand the code that is on the CVS tree under the module numpy2. I have been using C for the compiled code because it is easy to interface to, it has the widest platform support, and Python itself is written in C. Anybody with specific questions (including offers to help) should feel free to contact me or post to this list.

Thanks,

Travis Oliphant
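As a sketch of the dispatch-table idea discussed in the post above: the following hypothetical Python keys the table on the builtin types for illustration (the real design would key on ArrayType instances), and raising on a missing entry is only one of the fallback policies mentioned; casting to a common type would be another.

    class Ufunc:
        # Hypothetical dispatch-table ufunc (scalar case only; the
        # broadcasting encapsulation would wrap around this lookup).
        def __init__(self, name):
            self.name = name
            self.table = {}       # maps (type_a, type_b) -> function

        def register(self, types, func):
            self.table[types] = func

        def __call__(self, a, b):
            func = self.table.get((type(a), type(b)))
            if func is None:
                # The table is sparse: raising here is one policy; an
                # attribute on the ufunc could select casting instead.
                raise TypeError, "no entry for this type combination"
            return func(a, b)

    add = Ufunc("add")
    add.register((type(0), type(0)), lambda x, y: x + y)      # int, int
    add.register((type(0.0), type(0.0)), lambda x, y: x + y)  # float, float
    print add(2, 3)        # 5, found via the (int, int) entry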

quick easy question here. it seems like the current development efforts are going into numpy2, and i can see why this is important for making the python 2.1 cutoff date in january. in the meantime, is there a planned release of the current numpy that will take advantage of python 2.0 features? (i'm mainly looking at the augmented assignment operators: array1 += array2.) i would assume there is, but i haven't heard any talk about this. the python 2.0 beta is coming up in one week (pending further delays). is there a timeframe for a new numpy that is ready to take advantage of 2.0?

For the benefit of those who may be unfamiliar with the ways to add new functionality, I will try to summarize them briefly. More information can be found in the documentation and in the books that have been written about Python.

There are two (arguably three) ways to add a new object to Python: using an extension type, and defining a class. The fact that there are two distinct ways to add new objects is often called the type-class dichotomy. It is a goal of Py3K to somehow eliminate this distinction. A third way to add new behavior, which I'll also explain, is to make the type an "extension class"; making the type a "subtype" of this fancy type gives a possible direction for unifying types and classes.

Types
===============================

"Types" are more fundamental to the language and must be added using compiled code (all of the types I've seen are in straight C, since you don't really buy anything by using C++ when Python itself is written in C). You can investigate the type of an object from within Python by using the built-in function type:
type(a) # prints the "type" of object a
There are many types defined in the Python core, such as integers, floats, complex numbers, lists, tuples, and dictionaries. Python allows you to make new types. These must be made in C (maybe C++, but again I don't think the extra complexity buys you anything since Python is in C). A new type is a PyTypeObject, basically filled with function pointers and arrays of function pointers to handle the various operations one might perform on the new type. This PyTypeObject is coupled with a C structure containing the "data" for the new type; the data structure lists PyObject_HEAD as its first member, followed by whatever other data is necessary. Making a new type is thus a matter of creating these two C structures and filling in the type object's table with function pointers to handle various operations (getting and setting attributes; treating the type as an abstract number, sequence, or mapping; printing the object). Python has an abstract object interface at the C level, so that a type with a "number" interface can be used like a number, a type with a "sequence" interface can be indexed like a sequence, and a type with a "mapping" interface can be indexed like a dictionary.

Classes
======================

A Python class is, at the C level, just another "type." There are actually two "types" associated with a Python class: an instance type and a class type. An instance of a class is of the instance type, so every instance of every class has the same "type." What this means at the C level is that there is one more layer of indirection for each "operation" in Python when the type is "class": the interpreter goes through the "class type" to see what to do and finds the appropriate C function in that PyTypeObject method table. This C function does a dictionary lookup using the special method names and executes the Python function associated with that name for the particular instance (which may call back into a compiled extension module to do the actual work). This level of indirection gives a great deal of dynamic flexibility, since classes can be subclassed and attributes can be added dynamically, but there is a performance hit --- one that won't be noticeable except inside Python iteration loops.

So in reality there is no "type"-"class" dichotomy: everything is a type. It's just that classes are dynamic types which allow you to define Python functions to implement the "method table." The reason people speak of a dichotomy is that classes are so useful, and people like and use them so much, that the other, static types seem quite rigid in comparison.
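A short experiment at the interactive prompt (under the classic class model described here) makes this concrete; the class names are of course arbitrary:

    class Foo: pass
    class Bar: pass

    f = Foo()
    b = Bar()

    print type(f)          # <type 'instance'> -- the same for every class
    print type(b)          # <type 'instance'>
    print f.__class__      # __main__.Foo -- the "dynamic type" lives here
    print type([])         # <type 'list'>  -- a static, C-level type

The class, not the type, is what varies from instance to instance --- exactly the extra layer of indirection described above.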
Extension Classes
==================================

This is another fancy, dynamic "type," not distributed in the Python core but developed by Digital Creations (the Zope people) to let C programmers "subclass" types. I'm not an expert on these, as I've never really used them, but as far as I can tell they bring the idea of "dynamic types" to the C programmer. This is accomplished by making all types subtypes of the extension class "type." One way to understand the result is to see what the type command tells you about your new "extension class": it will tell you that it's of type "extension class." So dynamic typing is again implemented with another layer of indirection, where the fixed special C functions of the extension class "type" call out to your particular set of registered C functions. The difference is that the indirection is all handled in C.

So those are the choices for implementing new behavior in Python. Currently, Numerical Python is implemented as a new "type" which defines all of these interfaces: the mapping interface handles "extended slicing," the "sequence" interface allows the array to return something when len() is called, and the "number" interface implements the operators. Actually, two new types are defined: a "ufunc" type and an "array" type. All of the operators are implemented as instances of the "ufunc" type. The "ufunc" essentially encapsulates the "casting and broadcasting" rules associated with elementwise operations. The ufunc is not well understood by most non-developers I've talked to, since most people don't instantiate their own ufuncs (which must be instantiated in C).

The code works and is fast, but it can be hard to extend, and there are pieces that are poorly documented and hard to understand. For example, nobody has reworked the "extended slicing" syntax to enable arbitrary-index slicing, despite many people wanting that feature (actually, I've heard that John Bernard did finally write some code to do that, but I've never seen it and it's not there now).

As mentioned before, David Ascher made the necessary changes to make Numerical Python of type "extension class," which among other things allowed the type to be "subclassed" from within Python. I thought this was a nice solution, and we'd have to hear from him as to what went wrong. The only trouble I had with it is that the C-API changed slightly, in that arrays were no longer of type Array_Type, and code that depended on that would break (the same is true of any redesign making Python arrays a class). We'd have to hear from him as to what other problems he saw. It still doesn't solve the problem of maintainability of the C code base, but it definitely gave a more flexible result to the Python user. Perhaps retrofitting the ExtensionClass solution with an enhanced C-API would be a better solution. We really need David's input on that suggestion...

The idea I've put forward is to make the objects "classes," but I would support the "extension class" solution as well. Regardless of how it is implemented, we still need to design the appropriate "objects" (ArrayType, NDArray, Ufunc) and how they interact with each other, as well as a suitable C-API, so that they work together seamlessly.

I hope this helps some readers who are less familiar with extending Python.

DISCLAIMER: I am not the world's expert on these issues, but I do have some experience, so take what lessons you may.

Best wishes,

Travis Oliphant
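As an illustration of the three interfaces mentioned in the post above, here is a toy class (not the Numeric array) whose special methods play, at the Python level, the roles the C-level interface tables play for a type. The ToyVector name and its methods are invented for this example:

    import types

    class ToyVector:
        def __init__(self, data):
            self.data = list(data)

        # "number" interface: the arithmetic operators
        def __add__(self, other):
            return ToyVector(map(lambda x, y: x + y, self.data, other.data))

        # "sequence" interface: len() and simple indexing
        def __len__(self):
            return len(self.data)

        def __getitem__(self, index):
            # At the C level, extended slicing arrives through the mapping
            # interface; here a slice object stands in for that case.
            if type(index) is types.SliceType:
                return ToyVector(self.data[index.start:index.stop])
            return self.data[index]

    v = ToyVector([1, 2, 3])
    w = ToyVector([10, 20, 30])
    print len(v)           # 3             (sequence interface)
    print (v + w).data     # [11, 22, 33]  (number interface)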
