[Numpy-discussion] Numpy governance update

John Hunter jdh2358 at gmail.com
Thu Feb 16 23:20:00 EST 2012


On Thu, Feb 16, 2012 at 7:26 PM, Alan G Isaac <alan.isaac at gmail.com> wrote:

> On 2/16/2012 7:22 PM, Matthew Brett wrote:
> > This has not been an encouraging episode in striving for consensus.
>
> I disagree.
> Failure to reach consensus does not imply lack of striving.
>
>
Hey Alan, thanks for your thoughtful and nuanced views.  I agree with
everything you've said, but have a few additional points.

At the risk of wading into a thread that has grown far too long, and
echoing Eric's comments that the idea of governance is murky at best
when there is no provision for enforceability, I have a few comments.
Full disclosure: Travis has asked me and I have agreed to serve on
a board for "numfocus", the not-for-profit arm of his efforts to
promote numpy and related tools.  Although I have no special numpy
developer chops, as the original author of matplotlib, which is one of
the leading "numpy clients", he asked me to join his organization as a
"community representative".  I support his efforts, and so agreed to
join the numfocus board.

My first and most important point is that the subtext of many postings here
about the fear of undue and inappropriate influence of Continuum under
Travis' leadership is far overblown.  Travis created numpy -- it is
his baby.  Undeniably, he created it by standing on the shoulders of
giants: Jim Hugunin, Paul Dubois, Perry Greenfield and his team, and
many others.  But the idea that we need to guard against the
possibility that his corporate interests will compromise his interests
in "what is best for numpy" is academic at best.

As someone who has created a significant project in the realm of
"scientific computing in Python", I can tell you that it is something
I take quite a bit of pride in and it is very important to me that the
project thrives as it was intended to: as a free, open-source,
best-practice way of doing science.  I know Travis well enough to know
he feels the same way -- numpy doing well is *at least* as important to
him as his company doing well.  All of his recent actions to start a
company and foundation which focuses resources on numpy and related
tools reinforce that view.  If he had a different temperament, he
wouldn't have devoted five to ten years of his life to Numeric, scipy
and numpy.  He is a BDFL for a reason: he has earned our trust.

And he has proven his ability to lead when *almost everyone* was
against him.  At the height of the Numeric/numarray split, and I was
deeply involved in this as the mpl author because we had a "numerix"
compatibility layer to allow users to use one or the other, Travis
proposed writing numpy to solve both camps' problems.  I really can't
remember a single individual who supported him.  What I remember is
the cacophony of voices who thought this was a bad idea, because of the
"third fork" problem.  But Travis forged ahead, on his own, wrote
numpy, and re-united the Numeric and numarray camps.  And
all the while he maintained his friendship with the numarray
developers (Perry Greenfield who led the numarray development effort
has also been invited by Travis to the numfocus board, as have Fernando
Perez and Jarrod Millman).  Although MPL at the time agreed to support
a third version in its numerix compatibility layer for numpy, I can
thankfully say we have since dropped support for the compatibility
layer entirely as we all use numpy now.  This to me is the distilled
essence of leadership, against the voices of the masses, and it bears
remembering.

I have two more points I want to make: one is on democracy, and one is
on corporate control.  On corporate control: there have been a number
of posts in this thread about the worries and dangers that Continuum
poses as the corporate sponsor of numpy development, about how this
may cause numpy to shift from a model of a loosely connected,
decentralized cadre of volunteers to a centrally controlled steering
committee of programmers who are controlled by corporate headquarters
and who make all their decisions around the water cooler unobserved by
the community of users.

I want to make a connection to something that happened in the history
of matplotlib development, something that is not strictly analogous
but I think close enough to be informative.  Sometime around 2005,
Perry Greenfield, who heads the development team of the Space
Telescope Science Institute (STScI) that is charged with processing
the Hubble image pipeline, emailed me that he was considering using
matplotlib as their primary image visualization tool.  I can't tell
you how excited I was at the time.  The idea of having institutional
sponsorship from someone as prestigious and resourceful as STScI was
hugely motivating.  I worked feverishly for months to add stuff they
needed: better rendering, better image support, mathtext and lots
more.  But more importantly, Perry was offering to bring institutional
support to my project: well qualified full-time employees who
dedicated a significant part of their time to matplotlib
development. He had done this before with numarray development, and
the contributions of his team are enormous.  Many mpl features owe
their support to institutional sponsorship: Perry's group deserves the
biggest props, but Ted Drain's group at the JPL and corporate sponsors
as well are on the list.

What I want you to think about are the parallels between Perry and his
team joining matplotlib's development effort and Continuum's stated
desire to contribute to numpy development.  Yes, STScI is a
not-for-profit entity operated for NASA, and Continuum is a
for-profit entity with a not-for-profit arm (numfocus).  But the
differences are not so great in my experience.  Both for-profits and
not-for-profits experience institutional pressures to get code out on
a deadline.  In fact, perhaps our "finest hour" in matplotlib
development came as a result of one of our not-for-profit clients'
deadlines.  My favorite story is when the Jet Propulsion Laboratory at
Caltech emailed me about the inadequacy of our ellipse approximations,
and gave us the constraint that the Mars Rover was scheduled to land
in the next few months.  Talk about a hard deadline!  Michael
Droettboom, under Perry's direction, implemented an
"8-cubic-spline-approximation-to-curves-in-the-viewport" solution that
I honestly think gives matplotlib the *best* approximation to such
curves anywhere.  Period.  Institutional deadlines to get working code
into the codebase, whether from a for-profit or not-for-profit entity,
are usually a good thing. It may not be perfect going in, but it is
usually better for being there.

That is one example from matplotlib's history that illustrates the
benefit of institutional sponsors in a project.  In this example, the
organizational goal -- getting the Rover to land without crashing -- is
one we can all relate to and support.  And the resolution to the story,
in which a heroically talented developer (Michael D) steps up to
solve the problem, is one we can all aspire to.  But the essential
ingredients of the story are not so different from what we face here:
an organization needs to solve a problem on a deadline; another
organization, possibly related, has the resources to get the job done;
all efforts are contributed to the public domain.

Now that brings me to my final and perhaps most controversial point.
I don't believe democracy is the right solution for most open source
problems.  As exhibit A, I reference the birth of numpy itself that I
discussed above.  Numpy would have never happened if community input
were considered.  I'm pretty sure that all of us that were there at
the time can attest to this.

Democracy is something that many of us have grown up by default to
consider as the right solution to many, if not most or all, problems of
governance.  I believe it is a solution to a specific problem of
governance.  I do not believe democracy is a panacea or an ideal
solution for most problems: rather it is the right solution for problems
in which the consequences of failure are too high.  In a state (by which I mean
a government with a power to subject its people to its will by force
of arms) where the consequences of failure to submit include the
death, dismemberment, or imprisonment of dissenters, democracy is a
safeguard against the excesses of the powerful.  Generally, there is
no reason to believe that the simple majority of people polled is the
"best" or "right" answer, but there is also no reason to believe that
those who hold power will rule beneficently.  The democratic ability
of the people to check the rule of the few and powerful is
essential to ensure the survival of the minority.

In open source software development, we face none of these problems.
Our power to fork is precisely the power the minority in a tyrannical
democracy lacks: no one will kill us for going off the reservation.  We
are free to use the product or not, to modify it or not, to enhance it
or not.

The power to fork is not abstract: it is essential.  matplotlib, and
chaco, both rely *heavily* on agg, the Antigrain C++ rendering
library.  At some point many years ago, Maxim, the author of Agg,
decided to change the license of Agg (circa version 2.5) to GPL rather
than BSD.  Obviously, this was a non-starter for projects like mpl,
scipy and chaco which assumed BSD licensing terms.  Unfortunately,
Maxim had a new employer which appeared to us to be dictating the
terms and our best arguments fell on deaf ears.  No matter: mpl and
Enthought chaco have continued to ship agg 2.4, pre-GPL, and I think
that less than 1% of our users have even noticed.  Yes, we forked the
project, and yes, no one has noticed.  To me this is the ultimate
reason why governance of open source, free projects does not need to
be democratic.  As painful as a fork may be, it is the ultimate
antidote to a leader who may not have your interests in mind.  It is
an antidote that we citizens in a state government may not have.

It is true that numpy exists in a privileged position in a way that
matplotlib or scipy does not.  Numpy is the core.  Yes, Continuum is
different than STScI because Travis is both the lead of Numpy and the
lead of the company sponsoring numpy.  These are important
differences.  In the worst cases, we might imagine that these
differences will negatively impact numpy and associated tools.  But
these worst case scenarios that we imagine will most likely simply
distract us from what is going on: Travis, one of the most prolific
and valuable contributors to the scientific python community, has
decided to refocus his efforts to do more.  And that is a very happy
moment for all of us.