[Numpy-discussion] the direction and pace of development

Konrad Hinsen hinsen at cnrs-orleans.fr
Wed Jan 21 13:27:00 EST 2004


On 21.01.2004, at 19:44, Joe Harrington wrote:

> This is a necessarily long post about the path to an open-source
> replacement for IDL and Matlab.  While I have tried to be fair to

You raise many good points here. Some comments:

> those who have contributed much more than I have, I have also tried to
> be direct about what I see as some fairly fundamental problems in the
> way we're going about this.  I've given it some section titles so you

I'd say the fundamental problem is that "we" don't exist as a coherent 
group. There are a few developer groups (e.g. at STScI and Enthought)
who write code primarily for their own needs and then make it available.
The rest of us are what one could call "power users": very interested 
in the code, knowledgeable about its use, but not contributing to its 
development other than through testing and feedback.

> THE PROBLEM
>
> We are not following the open-source development model.  Rather, we

True. But is it perhaps because that model is not so well adapted to 
our situation? If you look at Linux (the OpenSource reference), it 
started out very differently. It was a fun project, done by hobby 
programmers who shared an idea of fun (kernel hacking). Linux was not 
goal-oriented in the beginning. No deadlines, no usability criteria,
but lots of technical challenges.

Our situation is very different. We are scientists and engineers who 
want code to get our projects done. We have clear goals, and very 
limited means, plus we are mostly someone's employees and thus not free
to do as we would like. On the other hand, our project doesn't provide
the challenges that attract the kind of people who made Linux big. You
don't get into the news by working on NumPy, you don't work against 
Microsoft, etc. Computational science and engineering just isn't the 
same as kernel hacking.

I develop two scientific Python libraries myself, more specialized and 
thus with a smaller market share, but the situation is otherwise 
similar. And I work much like the Numarray people do: I write the code 
that I need, and I invest minimal effort in distribution and marketing. 
To get the same code developed in the Linux fashion, there would have
to be many more developers. But they just don't exist. I know of three 
people worldwide whose competence in both Python/C and in the 
application domain is good enough that they could work on the code 
base. This is not enough to build a networked development community. 
The potential NumPy community is certainly much bigger, but I am not 
sure it is big enough. Working on NumPy/Numarray requires a
combination of uncommon skills, plus availability. I am not
saying it can't be done, but it sure isn't obvious that it can be.

> Release it in a way that as many people as possible will get it,
> install it, use it for real work, and contribute to it.  Make the main
> focus of the core development team the evaluation and inclusion of
> contributions from others.  Develop a common vision for the program,

This requires yet other skills, and thus different people. It
takes people who are good at reading others' code and communicating
with its authors about it.
Some people are good programmers, some are good scientists, some are 
good communicators. How many are all of that - *and* available?

> I know that Perry's group at STScI and the fine folks at Enthought
> will say they have to work on what they are being paid to work on.
> Both groups should consider the long term cost, in dollars, of
> spending those development dollars 100% on coding, rather than 50% on
> coding and 50% on outreach and intake.  Linus himself has written only

You are probably right. But does your employer think long-term? Mine 
doesn't.

> applications, yet in much less than 7 years Linux became a viable
> operating system, something much bigger than what we are attempting

Exactly. We could be too small to follow the Linux way.

> 1. We should identify the remaining open interface questions.  Not,
>    "why is numeric faster than numarray", but "what should the syntax
>    of creating an array be, and of doing different basic operations".

Yes, a very good point. Focus on the goal, not on the legacy code. 
However, a technical detail that should not be forgotten here: NumPy 
and Numarray have a C API as well, which is critical for many add-ons 
and applications. A C API is more closely tied to the implementation 
than a Python API. It might thus be difficult to settle on an API and 
then work on efficient implementations.

> 2. We should identify what we need out of the core plotting
>    capability.  Again, not "chaco vs. pyxis", but the list of
>    requirements (as an astronomer, I very much like Perry's list).

100% agreement. For plotting, defining the interface should be easier 
(no C stuff).

Konrad.
