[Numpy-discussion] DARPA funding for Blaze and passing the NumPy torch

Ralf Gommers ralf.gommers at gmail.com
Mon Dec 17 02:07:03 EST 2012


On Mon, Dec 17, 2012 at 7:07 AM, Travis Oliphant <travis at continuum.io> wrote:

> Hello all,
>
> There is a lot happening in my life right now and I am spread quite thin
> among the various projects that I take an interest in.     In particular, I
> am thrilled to publicly announce on this list that Continuum Analytics has
> received DARPA funding (to the tune of at least $3 million) for Blaze,
> Numba, and Bokeh which we are writing to take NumPy, SciPy, and
> visualization into the domain of very large data sets.    This is part of
> the XDATA program, and I will be taking an active role in it.    You can
> read more about Blaze here:  http://blaze.pydata.org.   You can read more
> about XDATA here:  http://www.darpa.mil/Our_Work/I2O/Programs/XDATA.aspx
>

Hi Travis, that is fantastic news, congratulations! I can't wait to see
what you guys will come up with in the near future.

Also thank you for the rest of this thoughtful post; it'll take me some
time to digest but I enjoyed the reflection on the past.

Best,
Ralf



>
> I personally think Blaze is the future of array-oriented computing in
> Python.   I will be putting efforts and resources next year behind making
> that case.   How it interacts with future incarnations of NumPy, Pandas, or
> other projects is an interesting and open question.  I have no doubt the
> future will be a rich ecosystem of interoperating array-oriented
> data-structures.     I invite anyone interested in Blaze to participate in
> the discussions and development at
> https://groups.google.com/a/continuum.io/forum/#!forum/blaze-dev or watch
> the project on our public GitHub repo:
> https://github.com/ContinuumIO/blaze.  Blaze is being incubated under the
> ContinuumIO GitHub project for now, but eventually I hope it will receive
> its own GitHub project page later next year.   Development of Blaze is
> early but we are moving rapidly with it (and have deliverable deadlines ---
> thus while we will welcome input and pull requests we won't have a ton of
> time to respond to simple queries until at least May or June).    There is
> more that we are working on behind the scenes with respect to Blaze that
> will be coming out next year as well but isn't quite ready to show yet.
>
> As I look at the coming months and years, my time for direct involvement
> in NumPy development is therefore only going to get smaller.  As a result
> it is not appropriate that I remain as "head steward" of the NumPy project
> (a term I prefer to BFD12 or anything else).   I'm sure that it is apparent
> that while I've tried to help personally where I can this year on the NumPy
> project, my role has been more one of coordination, seeking funding, and
> providing expert advice on certain sections of code.    I fundamentally
> agree with Fernando Perez that the responsibility of care-taking open
> source projects is one of stewardship --- something akin to public service.
>    I have tried to emulate that belief this year --- even while not always
> succeeding.
>
> It is time for me to make official what is already becoming apparent to
> observers of this community, namely, that I am stepping down as someone who
> might be considered "head steward" for the NumPy project and officially
> leaving the development of the project in the hands of others in the
> community.   I don't think the project actually needs a new "head steward"
> --- especially from a development perspective.     Instead I see a lot of
> strong developers offering key opinions for the project as well as a great
> set of new developers offering pull requests.
>
> My strong suggestion is that development discussions of the project
> continue on this list with consensus among the active participants being
> the goal for development.  I don't think 100% consensus is a rigid
> requirement --- but certainly a super-majority should be the goal, and
> serious changes should not be made without a clear consensus.     I would
> pay special attention to under-represented people (users with intense usage
> of NumPy but small voices on this list).   There are many of them.    If
> you push me for specifics then at this point in NumPy's history, I would
> say that if Chuck, Nathaniel, and Ralf agree on a course of action, it will
> likely be a good thing for the project.   I suspect that even if only 2 of
> the 3 agree at one time it might still be a good thing (but I would expect
> more detail and discussion).    There are others whose opinion should be
> sought as well:  Ondrej Certik, Perry Greenfield, Robert Kern, David
> Cournapeau, Francesc Alted, and Mark Wiebe to name a few.    For some
> questions, I might even seek input from people like Konrad Hinsen and Paul
> Dubois --- if they have time to give it.   I will still be willing to offer
> my view from time to time and if I am asked.
>
> Greg Wilson (of Software Carpentry fame) asked me recently what letter I
> would have written to myself 5 years ago.   What would I tell myself to do
> given the knowledge I have now?     I've thought about that for a bit, and
> I have some answers.   I don't know if these will help anyone, but I offer
> them as hopefully instructive:
>
>         1) Do not promise to not break the ABI of NumPy --- and in fact
> emphasize that it will be broken at least once in the 1.X series.    NumPy
> was designed to add new data-types --- but not without breaking the ABI.
>  NumPy has needed more data-types and still needs even more.   While it's
> not beautifully simple to add new data-types, it can be done.   But, it is
> impossible to add them without breaking the ABI in some fashion.   The
> desire to add new data-types *and* keep ABI compatibility has led to
> significant pain.   I think the ABI non-breakage goal has been amplified by
> the poor state of package management in Python.   The fact that it's
> painful for someone to update their downstream packages when an upstream
> ABI breaks (on Windows and Mac in particular) has put a lot of unfortunate
> pressure on this community.    Pressure that was not envisioned or
> understood when I was writing NumPy.
>
> (As an aside:  This is one reason Continuum has invested resources in
> building the conda tool and a completely free set of binary packages called
> Anaconda CE which is becoming more and more usable thanks to the efforts of
> Bryan Van de Ven and Ilan Schnell and our testing team at Continuum.   The
> conda tool:  http://docs.continuum.io/conda/index.html is open source and
> BSD licensed and the next release will provide the ability to build
> packages, build indexes on package repositories and interface with pip.
>  Expect a blog-post in the near future about how cool conda is!).
>
>         2) Don't create array-scalars.  Instead, make the data-type object
> a meta-type object whose instances are the items returned from NumPy
> arrays.   There is no need for a separate array-scalar object and in fact
> it's confusing to the type-system.    I understand that now.  I did not
> understand that 5 years ago.
>
>         3) Special-case small arrays to avoid the memory indirection and
> look at PDL so that generalized ufuncs are supported from the beginning.
>
>         4) Define missing-value data-types and labels on the dimensions
> and arrays
>
>         5) Define a standard "dictionary of NumPy arrays" interface as the
> basic "structure of arrays" concept to go with the "array of structures"
> that structured arrays provide.
>
>         6) Start work on SQL interface to NumPy arrays *now*
>
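
An illustration of points 2 and 5 above, as a minimal sketch with
present-day NumPy. The field names `x` and `n` and the variables `aos`,
`soa`, and `item` are made up for the example, and the plain dict is only
one assumed reading of a "dictionary of NumPy arrays"; it is not an
interface that NumPy defines.

```python
import numpy as np

# "Array of structures": one structured array whose fields are interleaved
# record by record in a single buffer.
aos = np.zeros(3, dtype=[("x", np.float64), ("n", np.int32)])
aos["x"] = [1.0, 2.0, 3.0]
aos["n"] = [10, 20, 30]

# "Structure of arrays": a dict mapping each field name to its own
# contiguous array (hypothetical layout, chosen for illustration only).
soa = {name: np.ascontiguousarray(aos[name]) for name in aos.dtype.names}

# Point 2: indexing an array yields an array scalar. Its type (np.float64)
# is a separate object from the dtype instance that describes the array.
item = aos["x"][0]
print(type(item), aos["x"].dtype)
print(aos["x"].strides, soa["x"].strides)
```

In the structured array the "x" view steps through memory one whole record
at a time, while each dict entry is a separate contiguous array.
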
> Additional comments I would make to someone today:
>
>         1) Most of NumPy should be written in Python with Numba used as
> the compiler (particularly as soon as Numba gets the ability to create
> Python extension modules which is in the next release).
>         2) There are still many, many optimizations that can be made in
> NumPy run-time (especially in the face of modern hardware).
>
> I will continue to be available to answer questions and I may chime in
> here and there on pull requests.    However, most of my time for NumPy will
> be on administrative aspects of the project where I will continue to take
> an active interest.    To help make sure that this happens in a transparent
> way,  I would like to propose that "administrative" support of the project
> be left to the NumFOCUS board of which I am currently 1 of 9 members.   The
> other board members are currently:  Ralf Gommers, Anthony Scopatz, Andy
> Terrel, Prabhu Ramachandran, Fernando Perez, Emmanuelle Gouillart, Jarrod
> Millman, and Perry Greenfield.      While NumFOCUS basically seeks to
> promote and fund the entire scientific Python stack,   I think it can also
> play a role in helping to administer some of the core projects which the
> board members themselves have a personal interest in.
>
> By administrative support, I mean decisions like "what should be done with
> any NumPy IP or web-domains" or "what kind of commercially-related ads or
> otherwise should go on the NumPy home page", or "what should be done with
> the NumPy github account", etc.  --- basically anything that requires an
> executive decision that is not directly development related.    I don't
> expect there to be many of these decisions.  But, when they show up, I
> would like them to be made in as transparent and public a way as
> possible.  In practice, the way I see this working is that there are
> members of the NumPy community who are (like me) particularly interested in
> admin-related questions and serve on a NumPy team in the NumFOCUS
> organization.     I just know I'll be attending NumFOCUS board meetings,
> and I would like to help move administrative decisions forward with NumPy
> as part of the time I spend thinking about NumFOCUS.
>
> If people on this list would like to play an active role in those admin
> discussions, then I would heartily welcome them into NumFOCUS membership
> where they would work with interested members of the NumFOCUS board (like
> me and Ralf) to help direct that organization.    I would really love to
> have someone from this list volunteer to serve on the NumPy team as part of
> the NumFOCUS project.   I am certainly going to be interested in the
> opinions of people who are active participants on this list and on GitHub
> pages for NumPy on anything admin related to NumPy, and I expect Ralf would
> also be very interested in those views.
>
> One admin discussion that I will bring up in another email (as this one is
> already too long) is about making 2 or 3 lists for NumPy such as
> numpy-admin at numpy.org,  numpy-dev at numpy.org, and numpy-users at numpy.org.
>
> Just because I'll be spending more time on Blaze, Numba, Bokeh, and the
> PyData ecosystem does not mean that I won't be around for NumPy.    I will
> continue to promote NumPy.   My involvement with Continuum connects me to
> NumPy as Continuum continues to offer commercial support contracts for
> NumPy (and SciPy and other open source projects).   Continuum will also
> continue to maintain its Github NumPy project which will contain pull
> requests from our company that we are working to get into the mainline
> branch.      Continuum will also continue to provide resources for
> release-management of NumPy (we have been funding Ondrej in this role for
> the past 6 months --- though I would like to see this happen through
> NumFOCUS in the future even if Continuum provides much of the money).    We
> also offer optimized versions of NumPy in our commercial Anaconda
> distribution (Anaconda CE is free and open source).
>
> Also, I will still be available for questions and help (I'm not
> disappearing --- just making it clear that I'm stepping back into an
> occasional NumPy developer role).   It has been extremely gratifying to see
> the number of pull-requests, GitHub-conversations, and code contributions
> increase this year.   Even though the 1.7 release has taken a long time to
> stabilize, there have been a lot of people participating in the discussion
> and in helping to track down the problems, figure out what to do, and fix
> them.    It even makes it possible for people to think about 1.7 as a
> long-term release.
>
> I will continue to hope that the spirit of openness, tolerance, respect,
> and gratitude continues to permeate this mailing list, and that we continue
> to seek to resolve any differences with trust and mutual respect.    I know
> I have offended people in the past with quick remarks and actions made
> sometimes in haste without fully realizing how they might be taken.   But,
> I also know that like many of you I have always done the very best I could
> for moving Python for scientific computing forward in the best way I know
> how.
>
> Thank you for the great memories.   If you will forgive a little
> sentiment:  My daughter who is in college now was 3 years old when I began
> working with this community and went down a road that would lead to my
> involvement with SciPy and NumPy.   I have marked the building of my family
> and the passage of time with where the Python for Scientific Computing
> Community was at.   Like many of you, I have given a great deal of
> attention and time to building this community.   That sacrifice and time
> have led me to love what we have created.    I know that I leave this
> segment of the community with the tools in better hands than mine.   I am
> hopeful that NumPy will continue to be a useful array library for the
> Python community for many years to come even as we all continue to build
> new tools for the future.
>
> Very best regards,
>
> -Travis