Hi,

Thanks Nathaniel and others for sparking this discussion as I think it is very timely.

2015-08-25 12:03 GMT+02:00 Nathaniel Smith <njs@pobox.com>:
  Let's focus on evolving numpy as far as we can without major
  break-the-world changes (no "numpy 2.0", at least in the foreseeable
  future).

  And, as a target for that evolution, let's change our focus from
  numpy as "NumPy is the library that gives you the np.ndarray object
  (plus some attached infrastructure)", to "NumPy provides the
  standard framework for working with arrays and array-like objects in
  Python"

Sorry to disagree here, but in my opinion NumPy *already* provides the standard framework for working with arrays and array-like objects in Python, as its huge popularity shows.  If what you mean is that there are too many efforts trying to provide other, specialized data containers (things like DataFrame in pandas, DataArray/Dataset in xarray, or carray/ctable in bcolz, just to mention a few), then let me say that I am of the opinion that there can't be a silver bullet for tackling all the problems the PyData community is facing.

The libraries using specialized data containers (pandas, xarray, bcolz...) may have more or less machinery on top of them, so conversion to NumPy does not necessarily happen internally (many times we don't want conversions, for efficiency).  But it is the capability of producing NumPy arrays out of them (or out of parts of them) that makes these specialized containers incredibly more useful: users can rely on NumPy to fill the missing gaps, or just use NumPy as an intermediate container that acts as input for other libraries.
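To illustrate the point, here is a minimal sketch of how a specialized container can hand its data over to NumPy on demand.  The `ChunkedColumn` class below is hypothetical (it is not from any of the libraries mentioned); it only shows the general mechanism, the `__array__` protocol, that `np.asarray()` uses to consume array-like objects:

```python
import numpy as np

# A toy specialized container (hypothetical) that stores its data in
# chunks, but can still produce a plain ndarray when NumPy asks for one.
class ChunkedColumn:
    def __init__(self, chunks):
        self.chunks = chunks  # list of Python lists, kept in pieces

    def __array__(self, dtype=None):
        # Flatten the chunks into one plain ndarray on demand.
        flat = [x for chunk in self.chunks for x in chunk]
        return np.asarray(flat, dtype=dtype)

col = ChunkedColumn([[1, 2], [3, 4, 5]])
arr = np.asarray(col)   # container -> ndarray, NumPy fills the gap
print(arr.sum())        # 15 -- now any NumPy-consuming library can use it
```

The container keeps whatever internal layout suits it best, and only materializes a NumPy array at the boundary where other libraries expect one.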

On the subject of why I don't think a universal data container is feasible for PyData, you just have to look at how many data structures Python provides in the language itself (tuples, lists, dicts, sets...), and how many more are added in the standard library (like those in the collections sub-package).  Every data container is designed to do a couple of things (maybe three) well, and for other use cases it is the responsibility of the user to choose the most appropriate one for her needs.  In the same vein, I think it makes little sense to try to come up with a standard solution that is going to satisfy everyone's needs.  IMHO, and despite all efforts, neither NumPy, NumPy 2.0, DyND, bcolz nor any other project is going to offer the universal data container.

Instead of that, let me summarize what users/developers like me need from NumPy in order to continue creating more specialized data containers:

1) Keep NumPy simple. NumPy is truly the cornerstone of PyData right now, and it will be for the foreseeable future, so please keep it usable and *minimal*.  Before adding any new feature, the increase in complexity should be carefully weighed.

2) Make NumPy more flexible. Any rewrite that allows arrays or dtypes to be subclassed and extended more easily will be a huge win.  *But* if in order to allow flexibility you have to make NumPy much more complex, then point 1) should prevail.
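As a concrete example of what "subclassed and extended" means today, here is a minimal sketch using NumPy's existing subclassing hook, `__array_finalize__`.  The `UnitArray` class is purely illustrative (real unit handling needs far more machinery); it only shows how much ceremony is already required to attach one extra attribute to an array:

```python
import numpy as np

# Minimal sketch: an ndarray subclass that carries a unit label.
# (Illustrative only; not a real units implementation.)
class UnitArray(np.ndarray):
    def __new__(cls, data, unit=""):
        obj = np.asarray(data).view(cls)  # view the data as our subclass
        obj.unit = unit
        return obj

    def __array_finalize__(self, obj):
        # Called for views and slices too, so the attribute survives them.
        self.unit = getattr(obj, "unit", "")

a = UnitArray([1.0, 2.0, 3.0], unit="m")
print(a[1:].unit)   # slices keep the attribute: prints "m"
```

A rewrite that made this kind of extension (and custom dtypes) less fragile, without bloating the core, is exactly the flexibility point 2) asks for.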

3) Make NumPy a sustainable project. Historically, NumPy has depended on the heroic efforts of individuals to make it what it is now: *an industry standard*.  But individual efforts, while laudable, are not enough, so please, please, please continue the effort of constituting a governance team that ensures the future of NumPy (and with it, the whole PyData community).

Finally, the question of whether NumPy 2.0 or projects like DyND should be chosen for implementing new features is still legitimate, and while I have my own opinions (favourable to DyND), I can still see, even in a distant future (such is the price of technical debt), NumPy as we know it remaining in place while more innovation happens in the Python data space.

Again, thanks to all those brave people who are allowing others to build on NumPy's shoulders.

--
Francesc Alted