
Greetings to all interested in Numerical Python,

My purpose in writing this somewhat long post is to inform interested parties as to where NumPy is going and how far it has gone. I'm doing this in order to coordinate interest and to summarize some of the recent conversations I've had with other interested people. There is a significant handful of people who are very interested in where Numerical Python is going. All of these people are very bright and have distinct desires for the future of Numerical Python which come from quite diverse experience. This intelligence and diversity bring tremendous strength (both current and potential) to the community and have made Numerical Python an extremely useful tool.

Of course, these benefits do not come cheaply: there is quite a bit of disagreement about how things should be done --- mostly due to the fact that people use Numerical Python for different things. Fortunately, this disagreement is not insurmountable provided people are willing to compromise a little syntactic sugar here and there.

Numerical Python users have been enjoying the flexibility and power of the underlying programming language for several years. The price we must pay for using a language that is not wholly dedicated to numerical pursuits is that we must cooperate with other users of the core language who have interests entirely different from our own. Since numerical programming is rarely "strictly numerical," what we gain is access to the work they do in improving Python's stock of library tools. When I was introduced to the Numerical Python system, some of the results of this compromise were a little annoying to me --- somewhat like the whitespace rule. What I found, however, was that my annoyance gave way to elation as I realized that the non-numeric objects and toolkits were extremely beneficial to me in my numeric work: regular expressions, serving graphs from a website, writing translators for various files and formats, etc.

With that introduction, I'll give a brief history of Numerical Python (please forgive me if I have neglected important contributors). Numerical Python started from the work of Jim Hugunin (which he used as part of his oral examination at MIT). He posted an announcement of his proposal in August of 1995, based on the Matrix Object previously presented by Jim Fulton. Early discussions of the work can be found at http://www.python.org/pipermail/matrix-sig/ which makes for very interesting reading, since many of the topics people still talk about were hashed out even back then. Konrad Hinsen, Paul Dubois, David Ascher, and Jim Fulton were all early contributors. Jim Fulton's work and connections to Guido van Rossum enabled many of the early changes (extended slicing, complex numbers, ellipses) to get into Python itself. Guido was also part of the early discussion. Konrad Hinsen contributed a significant amount of code to the current version of Numeric Python as well.

Jim Hugunin released version 0.2 in December of 1995 and followed the "release early, release often" model for several months to get Numerical Python into a working state. It is obvious that he spent many hours writing code (time which NumPy2 contributors have not been able to duplicate). One thing that led to some stall in Numerical Python's development is that Jim Hugunin left the project to concentrate on JPython. Paul Dubois picked up the task of project administrator and has done an admirable job, including securing resources to get the current documentation written. David Ascher wrote the bulk of that important resource.

Personally, I started using Numerical Python after scouring the Net for something to replace MATLAB, which had become burdensome for me under the weight of large data volumes and inefficient memory handling. I started using Numerical Python in the spring of 1998 (a relative latecomer), but I have used it actively ever since.
I started releasing packages at that time to increase the number of toolboxes available to the Numerical Python programmer, as I was quite happy with the language itself (after I got over the initial annoyances). I've released many pieces of code since then which I personally use quite regularly. Most of these can be found at http://oliphant.netpedia.net. Naturally my contributions have been in areas where I had a personal need, but they have enabled me to understand the Numerical Python source code well enough to feel confident in modifying it.

With that bit of history, let's get into why NumPy2:

Guido van Rossum has expressed willingness to include multidimensional arrays in the Python core. The source of this willingness appears to be a general respect for the community of users who use Python for numerical programming (although he himself is not one of those users). There is already a useful one-dimensional array object distributed with Python which, however, does not support any numerical operations. Some of its features were borrowed for the current Numerical Python.

Last year, I suggested that the PIL and Numeric Python work more closely together (since an image is conceptually just a 2-D (or 3-D for color) Numeric Python array). /F from pythonware responded by saying that until Numeric Python was a part of Python itself he saw no reason to modify the PIL. I took the bait and, after pondering why four years had elapsed without Numerical Python getting into Python itself, I contacted Guido and Paul to start the ball rolling. Guido's response was that those familiar with the code said it was too ugly and unwieldy to put into Python. The code is just too hard to modify and understand. Evidently, since only a handful of the hundreds of people who use it submit bug patches or feature enhancements, this must be true.
Those who do understand how it works have a hard time finding time to make needed changes --- the intrinsic cooperation problem with volunteer time that is not funded (or contributed to) by those who make use of the results.

Guido was kind enough to provide me with some design documents for an implementation of multidimensional arrays that he had worked out. Thinking I would be in graduate school for longer than I am going to be, I set about trying to clean up Numerical Python with the intent of getting it into the Python core.

As part of this effort I conducted a survey of current Numerical Python users to find out their interests. The survey and its results are available at the SourceForge site for Numerical Python. Basically the results indicate that most people agree on some important features (like arbitrary indexing into arrays), but disagree on some details (copy vs. reference and automatic casting rules being the most memorable). While the results were useful, a simple comment made by one of the survey participants made a significant impression on me: "the C-code is too inflexible and hard to change." This is essentially the problem that Paul Dubois had identified and which was keeping Numerical Python out of the Python core.

At the same time I had been doing some work on implementing a sparse matrix package for Python by wrapping some compiled C and Fortran code in a Python class I'd constructed. The results were very encouraging and made me realize that the same technique could be used to make Numerical Python much more flexible and easier to extend while retaining its significant speed benefits. I decided to make a new implementation of Numerical Python where the underlying objects (the array and ufunc objects) are not extension types but true Python classes.
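
To make the idea concrete, here is a rough sketch of what "true Python classes" for the array and ufunc objects could look like. This is only an illustration under stated assumptions, not the actual NumPy2 code: the names SketchArray, SketchUfunc, and LabeledArray are hypothetical, and the standard library array buffer merely stands in for storage that compiled C or Fortran routines would operate on.

```python
from array import array
import math


class SketchArray:
    """Hypothetical 1-D array written as an ordinary Python class.

    The stdlib array buffer stands in for storage that a real
    implementation would pass to compiled C or Fortran routines.
    """

    def __init__(self, values, typecode="d"):
        self.data = array(typecode, values)

    def __len__(self):
        return len(self.data)

    def __iter__(self):
        return iter(self.data)

    def __getitem__(self, index):
        return self.data[index]

    def __add__(self, other):
        # Elementwise addition; in practice a compiled loop would do this work.
        if len(other) != len(self):
            raise ValueError("length mismatch")
        summed = [x + y for x, y in zip(self, other)]
        return type(self)(summed, self.data.typecode)

    def tolist(self):
        return self.data.tolist()


class SketchUfunc:
    """Hypothetical ufunc: a Python object applying a scalar function elementwise."""

    def __init__(self, func):
        self.func = func

    def __call__(self, arr):
        return type(arr)([self.func(x) for x in arr], arr.data.typecode)


class LabeledArray(SketchArray):
    """Because SketchArray is a class, it can be subclassed directly."""

    label = "unlabeled"


a = SketchArray([1.0, 4.0, 9.0])
b = LabeledArray([1.0, 1.0, 1.0])
sqrt = SketchUfunc(math.sqrt)

print((a + a).tolist())       # [2.0, 8.0, 18.0]
print(sqrt(a).tolist())       # [1.0, 2.0, 3.0]
print(type(b + a).__name__)   # LabeledArray
```

The point of the sketch is only that behavior such as operators, elementwise functions, and subclassing lives in Python code that is easy to read and change, while the heavy numerical work can still be delegated to compiled routines.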
Such a design would allow significant benefits in terms of flexibility and modifiability, with a small memory-overhead loss and an indeterminate speed change (it will likely be faster under some usages and slightly slower in others). I also wanted to add more types (unsigned types, boolean, and potentially others).

While making this change, I realized that another way out of the "type-class" dichotomy (along with ExtensionClasses) is to not make new types at all. If all types were really ExtensionClasses and all new types had to be as well, this could effectively solve the problem from the Python user's perspective too.

A noble effort at making Numerical Python an Extension Class was undertaken by David Ascher last year. His work became the ill-fated Numerical Python 12. I rather liked his work, but there were some very hard-to-trace bugs in the implementation, and the C-code was still hard to modify. Another problem (one that must be dealt with in the new implementation as well) is the significant amount of code that has been written to the old C-API.

This finally brings us to the state of Numerical Python. I've been working on this implementation on and off for six months (mostly off), but have worked out many of the design details. Since my time is limited for the next three months, I wanted to let others know of the status to encourage involvement. We have a window here to get this next version of Numeric into Python 2.1, but the window will probably close sometime in January, so there is some urgency.

In the next installment, I will outline the design of Numerical Python 2 and some of its goals.

-Travis Oliphant