[Numpy-discussion] History and Why NumPy2 (long)

Aug. 22, 2000

      Greetings to all interested in Numerical Python,

My purpose in writing this somewhat long post is to inform interested
parties as to where NumPy is going and how far it has gone.  I'm doing
this in order to coordinate interest and try to summarize some of the
recent conversations I've had with other interested people.

A significant handful of people are very interested in where
Numerical Python is going.  All of these people are very bright and have
distinct desires for the future of Numerical Python which come from quite
diverse experience.  This intelligence and diversity brings tremendous
strength (both current and potential) to the community and has made
Numerical Python an extremely useful tool.  Of course, these benefits do
not come cheaply:  there is quite a bit of disagreement about how things
should be done --- mostly due to the fact that people use Numerical Python
for different things.  Fortunately, this disagreement is not
insurmountable, provided people are willing to compromise a little syntactic
sugar here and there.  

Numerical Python users have been enjoying the flexibility and power of the
underlying programming language for several years.  The price we must pay
for using a language that is not wholly dedicated to Numerical pursuits
is that we must cooperate with other users of the core language whose
interests are entirely different from our own.  Since Numerical
programming is rarely "strictly numerical," what we gain is access to
the work they do in improving Python's stock of library tools.  

When I was introduced to the Numerical Python system, some of the results
of this compromise were a little annoying to me --- somewhat like the
whitespace rule.  What I found, however, was that my annoyance gave way to
elation as I realized that the non-numeric objects and toolkits were
extremely beneficial to me in my numeric work:  regular expressions,
serving graphs from a website, writing translators for various files and
formats, etc.  

With that introduction, I'll give a brief history of Numerical Python
(please forgive me if I have neglected important contributors).

Numerical Python started from the work of Jim Hugunin (which he used as
part of his Oral Examination at MIT).  He posted an announcement of his
proposal in August of 1995 based on the Matrix Object previously presented
by Jim Fulton. Early discussions of the work can be found at 
http://www.python.org/pipermail/matrix-sig/ which makes for very
interesting reading since many of the topics people still talk about were
hashed out even
back then.  Konrad Hinsen, Paul Dubois, David Ascher, and Jim Fulton were
all early contributors.  Jim Fulton's work and connections to Guido van
Rossum enabled many of the early changes (extended slicing, complex
numbers, ellipses) to get into Python itself.  Guido was also part of the
early discussion.    Konrad Hinsen contributed a significant amount of
code to the current version of Numeric Python as well.  
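
For readers who have not met these language features, here is a small
sketch of what they look like from the Python side.  It is my own
illustration, written for a current Python interpreter, and is not code
from the matrix-sig discussions; ShowIndex is a made-up helper class whose
only job is to hand back whatever index it receives.

```python
class ShowIndex:
    """Hypothetical helper: returns the index object Python passes in."""

    def __getitem__(self, index):
        return index


probe = ShowIndex()

# Extended slicing: a start:stop:step slice, the Ellipsis object, and a
# plain integer arrive together as one tuple.
extended = probe[1:10:2, ..., 0]

# Complex numbers are a built-in type with literal syntax.
z = (1 + 2j) * (3 - 1j)
```

Here extended is a tuple holding a slice object, Ellipsis, and 0, which is
what a multidimensional array needs in order to interpret an index such as
a[1:10:2, ..., 0].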

Jim Hugunin released version 0.2 in December of 1995 and followed the
"release early, release often" model for several months to get Numerical
Python into a working state.  It is obvious that he spent many hours
writing code (time which NumPy2 contributors have not been able to
duplicate). 

One thing that stalled Numerical Python's development somewhat was that
Jim Hugunin left the project to concentrate on JPython.  Paul Dubois
picked up the task of project administrator and has done an admirable job,
including securing resources to get the current documentation written.
David Ascher wrote the bulk of that important resource.

Personally, I started using Numerical Python after scouring the Net for
something to replace MATLAB, which had become burdensome for me under the
weight of large data volumes and inefficient memory handling.  I started
using Numerical Python in the Spring of 1998 (a relative latecomer) but
I have used it actively ever since.  I started releasing packages at that
time to increase the number of toolboxes available to the Numerical Python
programmer as I was quite happy with the language itself (after I got over
the initial annoyances).  I've released many pieces of code since then
which I personally use quite regularly.  Most of these can be found at
http://oliphant.netpedia.net.  Naturally, my contributions have been in
areas where I had a personal need, but they have enabled me to understand
the Numerical Python source code enough to feel confident in modifying it.  

With that bit of history let's get into why NumPy2:

Guido van Rossum has expressed willingness to include multidimensional
arrays in the Python core.  The source of this willingness appears
to be a general respect for the community of users who use Python for 
Numerical programming (although he himself is not one of those
users).  There is already a useful one-dimensional array object
distributed with Python which, however, does not support any numerical
operations.  Some of its features were borrowed for the current Numerical
Python.
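
To make the limitation concrete, here is a minimal sketch using the
standard array module, written for a current Python interpreter.  The
variable names are only for illustration.  The arithmetic operators on
these arrays behave as sequence operators, so anything elementwise has to
be spelled out by hand.

```python
from array import array

a = array('d', [1.0, 2.0, 3.0])
b = array('d', [10.0, 20.0, 30.0])

# '+' concatenates and '*' repeats, just as for lists.
concatenated = a + b
repeated = a * 2

# Elementwise addition must be written explicitly.
elementwise_sum = array('d', [x + y for x, y in zip(a, b)])
```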

Last year, I suggested that the PIL and Numeric Python work more closely
together (since an image is conceptually just a 2-D (or 3-D for color)
Numeric Python array).  /F from pythonware responded by saying that
until Numeric Python was a part of Python itself he saw no reason to
modify the PIL.  I took the bait and after pondering why 4 years
had elapsed without Numerical Python getting into Python itself, I
contacted Guido and Paul to start the ball rolling. 

Guido's response was that those familiar with the code said it was too
ugly and unwieldy to put into Python.  The code is just too hard to modify
and understand.  Evidently this must be true, since only a handful of the
hundreds of people who use it submit bug patches or feature enhancements.
Those who do understand how it works have a hard time
finding time to make needed changes --- the intrinsic cooperation problem
with volunteer time that is not funded (or contributed to) by those who
make use of the results.

Guido was kind enough to provide me with some design documents for an
implementation of multidimensional arrays that he had worked out.
Thinking I would be in graduate school for longer than I am going to be, I
set about trying to clean up Numerical Python with the intent of getting
it into the Python core.  As part of this effort I conducted a survey of
current Numerical Python users to find out their interests.  The survey
and its results are available at the SourceForge site for Numerical
Python.  Basically the results indicate that most people agree on some
important features (like arbitrary indexing into arrays), but disagree on
some details (copy vs. reference and automatic casting rules being the
most memorable). 
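
For anyone unsure what the copy vs. reference question means in practice,
the sketch below shows the two behaviors side by side.  It uses only
standard-library objects from a current Python interpreter as stand-ins
and says nothing about how Numerical Python itself is implemented.

```python
from array import array

data = array('i', [0, 1, 2, 3, 4, 5])

# Copy semantics: slicing an array.array gives an independent object,
# so writing to the slice leaves the original untouched.
copied = data[1:4]
copied[0] = 100

# Reference semantics: slicing a memoryview shares the original buffer,
# so writing to the slice changes the original.
view = memoryview(data)[1:4]
view[0] = -1
```

After this runs, data[1] is -1 because of the write through view, while the
write to copied had no effect on data.  The disagreement in the survey is
over which of these two behaviors a slice of an array should have.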

While the results were useful, a simple comment made by one of the
survey participants made a significant impression on me:  "the C-code is
too inflexible and hard to change."  This is essentially the problem that
Paul Dubois had identified and which was keeping Numerical Python out of
the Python core.  At the same time, I had been working on
implementing a sparse matrix package for Python by wrapping some compiled
C and Fortran code into a Python class I'd constructed.  The results were
very encouraging and made me realize that the same technique could be used
to make Numerical Python much more flexible and easier to extend while
retaining its significant speed benefits.  

I decided to make a new implementation of Numerical Python where the
underlying objects (the array and ufunc objects) are not extension types
but true Python classes.  This would allow significant benefits in terms
of flexibility and modifiability with a small memory-overhead loss and an
indeterminate speed change (it will likely be faster under some usages and
slightly slower in others).  I also wanted to add more types (unsigned
types, boolean, and potentially others).
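
As a rough sketch of the idea, and not the actual NumPy2 design, the
following shows an array written as an ordinary Python class whose storage
lives in a compiled object.  The names SketchArray and SummableArray are
hypothetical, and the standard array module stands in for the compiled
storage layer.  It is written for a current Python interpreter.

```python
import operator
from array import array


class SketchArray:
    """Hypothetical 1-D array: Python class on top of compiled storage."""

    def __init__(self, values, typecode='d'):
        self.data = array(typecode, values)

    def __len__(self):
        return len(self.data)

    def _binary(self, other, op):
        # Apply op elementwise against another array or a scalar.
        if isinstance(other, SketchArray):
            if len(other) != len(self):
                raise ValueError("arrays must have the same length")
            result = map(op, self.data, other.data)
        else:
            result = (op(x, other) for x in self.data)
        # self.__class__ keeps results in the caller's (sub)class.
        return self.__class__(result, self.data.typecode)

    def __add__(self, other):
        return self._binary(other, operator.add)

    def __mul__(self, other):
        return self._binary(other, operator.mul)

    def tolist(self):
        return self.data.tolist()


class SummableArray(SketchArray):
    """Extending the array is ordinary subclassing, no C required."""

    def total(self):
        return sum(self.data)


x = SummableArray([1.0, 2.0, 3.0])
y = x + x * 2
```

The flexibility comes from the fact that a user can add behavior, such as
total() here, without touching any compiled code.  The cost is the extra
per-object overhead of a Python instance.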

While making this change, I realized that another way out of the
"type-class" dichotomy (along with ExtensionClasses) is to not make new
types at all.  If all types were really ExtensionClasses and all new types
had to be as well, this could effectively solve the problem from the
Python user perspective as well.  A noble effort at making Numerical
Python an Extension Class was undertaken by David Ascher last year.  His
work became the ill-fated Numerical Python 12.  I rather liked his work,
but there were some very hard to trace bugs in the implementation, and the
C-code was still hard to modify.  Another problem (one that the new
implementation must deal with as well) is the significant amount of code
that has been written to the old C-API.

This finally brings us to the state of Numerical Python.  I've been
working on this implementation on and off for six months (mostly off), but
have worked out many of the design details.  Since my time is
limited for the next 3 months, I wanted to let others know of
the status to encourage involvement.  We have a window here to get this
next version of Numeric into Python 2.1, but the window will
probably close sometime in January, so there is some urgency.

In the next installment, I will outline the design of Numerical Python 2
and some of its goals.  

-Travis Oliphant