[Numpy-discussion] Notes from the numpy dev meeting at scipy 2015

Francesc Alted faltet at gmail.com
Wed Aug 26 07:14:19 EDT 2015


Hi,

Thanks Nathaniel and others for sparking this discussion as I think it is
very timely.

2015-08-25 12:03 GMT+02:00 Nathaniel Smith <njs at pobox.com>:

>   Let's focus on evolving numpy as far as we can without major
>   break-the-world changes (no "numpy 2.0", at least in the foreseeable
>   future).
>
>   And, as a target for that evolution, let's change our focus from
>   numpy as "NumPy is the library that gives you the np.ndarray object
>   (plus some attached infrastructure)", to "NumPy provides the
>   standard framework for working with arrays and array-like objects in
>   Python"
>

Sorry to disagree here, but in my opinion NumPy *already* provides the
standard framework for working with arrays and array-like objects in Python
as its huge popularity shows.  If what you mean is that there are too many
efforts trying to provide other, specialized data containers (things like
DataFrame in pandas, DataArray/Dataset in xarray or carray/ctable in bcolz
just to mention a few), then let me say that I am of the opinion that there
can't be a silver bullet for tackling all the problems that the PyData
community is facing.

The libraries using specialized data containers (pandas, xarray, bcolz...)
may have more or less machinery on top of them, so conversion to NumPy does
not necessarily happen internally (many times we don't want conversions,
for efficiency).  But it is the capability of producing NumPy arrays out of
them (or parts of them) that makes these specialized containers so much
more useful to users, because they can use NumPy to fill the gaps, or just
use NumPy as an intermediate container that acts as input for other
libraries.
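
As a rough illustration of that capability, here is a minimal sketch of a
container that only produces a NumPy array on request, through the
`__array__` hook that `np.asarray` looks for.  The `ChunkedColumn` class is
hypothetical and is not how pandas, xarray or bcolz are actually
implemented; it only shows the general idea.

```python
import numpy as np


class ChunkedColumn:
    """Hypothetical container that stores a 1-D column as a list of chunks."""

    def __init__(self, chunks):
        # Each chunk is kept as its own small NumPy array.
        self.chunks = [np.asarray(c) for c in chunks]

    def __len__(self):
        return sum(len(c) for c in self.chunks)

    def __array__(self, dtype=None, copy=None):
        # NumPy calls this hook when asked to turn the object into an ndarray.
        out = np.concatenate(self.chunks)
        return out if dtype is None else out.astype(dtype)


col = ChunkedColumn([[1, 2, 3], [4, 5], [6]])
arr = np.asarray(col)   # the conversion happens only when requested
total = np.sum(col)     # NumPy functions can consume the container directly
```

Internally the container stays free to use whatever layout suits it; the
NumPy array only exists at the boundary, where the user or another library
asks for it.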

On the subject of why I don't think a universal data container is feasible
for PyData, you just have to look at how many data structures Python
provides in the language itself (tuples, lists, dicts, sets...), and how
many more are added in the standard library (like those in the collections
module).  Every data container is designed to do a couple of things (maybe
three) well, but for other use cases it is the responsibility of the user
to choose the most appropriate one depending on her needs.  In the same
vein, I also think that it makes little sense to try to come up with a
standard solution that is going to satisfy everyone's needs.  IMHO, and
despite all efforts, neither NumPy, NumPy 2.0, DyND, bcolz nor any other is
going to offer the universal data container.

Instead of that, let me summarize what users/developers like me need from
NumPy to continue creating more specialized data containers:

1) Keep NumPy simple. NumPy is truly the cornerstone of PyData right now,
and it will be for the foreseeable future, so please keep it usable and
*minimal*.  Before adding any new feature, the increase in complexity
should be carefully weighed.

2) Make NumPy more flexible. Any rewrite that allows arrays or dtypes to be
subclassed and extended more easily will be a huge win.  *But* if in order
to allow flexibility you have to make NumPy much more complex, then point
1) should prevail.

3) Make NumPy a sustainable project. Historically, NumPy has depended on
the heroic efforts of individuals to make it what it is now: *an industry
standard*.  But individual efforts, while laudable, are not enough, so
please, please, please continue the effort of constituting a governance
team that ensures the future of NumPy (and with it, the whole PyData
community).
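
To make point 2) a bit more concrete, this is a small sketch of what
subclassing `np.ndarray` takes today just to carry one extra attribute.
The `LabeledArray` class and its `label` attribute are made up for the
example; the `__new__` plus `__array_finalize__` pattern is the one
described in NumPy's own subclassing guide.

```python
import numpy as np


class LabeledArray(np.ndarray):
    """Hypothetical ndarray subclass carrying a free-form `label` attribute."""

    def __new__(cls, data, label=None):
        # ndarray subclasses are built in __new__, as a view of existing data.
        obj = np.asarray(data).view(cls)
        obj.label = label
        return obj

    def __array_finalize__(self, obj):
        # Called when a new instance is created from a view or a slice;
        # copy the label over from the array it was derived from.
        if obj is None:
            return
        self.label = getattr(obj, "label", None)


a = LabeledArray([1.0, 2.0, 3.0], label="temperature")
b = a[1:]   # slicing goes through __array_finalize__, so the label survives
```

Even this small case requires knowing about two special methods and the
view mechanism, which is the kind of friction that point 2) asks to reduce.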

Finally, the question of whether NumPy 2.0 or projects like DyND should be
chosen instead for implementing new features is still legitimate, and while
I have my own opinions (favourable to DyND), I still expect (such is the
price of technical debt) that far into the future we will find NumPy as we
know it, allowing more innovation to happen in the Python data space.

Again, thanks to all those brave people who are allowing others to build on
NumPy's shoulders.

--
Francesc Alted