[Numpy-discussion] Fixing issue of future opaqueness of ndarray this summer
travis at continuum.io
Sat May 12 20:30:23 EDT 2012
> I think the long-term generality is a lot bigger than that:
> - Compressed arrays
> - Interfaces to HDF files
> - Distributed-memory arrays
> - Blocked arrays
> - Semi-sparse and sparse (diagonal, but also triangular, symmetric,
> repeating, ...)
> - Lazy evaluation: "generating_multiply(mydata, zero_mask)"
> While what Mark F. and I care about is computational efficiency for
> current arrays, this generality is almost unavoidable.
> In fact, judging from ideas Travis has posted to this list earlier and
> from continuum.io, I assume this wider scope is something you and
> Travis must necessarily have thought a lot about.
> Anyway, I agree with Mark F. that the right design is probably a new,
> low-level (very small!) C library with no Python dependencies that just
> provides some APIs to try to standardize "how to communicate array
> data" at a more basic level than NumPy (and with a much smaller and
> different scope than the various "distill NumPy to a C core" efforts
> that have been talked about over the past years, something I have zero
> interest in).
> If NumPy devs are interested in this discussion on a detailed level,
> please say so; Mark F. and I might move to Skype (or even meet in
> person) to get higher bandwidth than the mailing list, and if more
> people should be invited, it would be good to know.
I, for one, am very interested in this discussion. This is very much along the lines of what I have been thinking. To me, it is much more important to solidify the concepts of the "interface", and what is essential about it, than to create yet another library. I think that, in addition to your general notion of an N-d block-transfer API, you would also need 1-d, 2-d, and maybe 3-d specializations that take an additional "axis" argument to denote which sub-region is being described. But this is probably enough.
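One possible reading of the N-d block-transfer idea with 1-d and 2-d specializations is sketched below in plain Python (stdlib only, so nothing here depends on NumPy). The descriptor `StridedView` and the functions `read_line` and `read_plane` are hypothetical names, not part of NumPy or any existing library; strides are counted in elements rather than bytes for brevity, and the "axis" argument is interpreted as the axis along which a 1-d run is read.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class StridedView:
    """Hypothetical minimal descriptor for N-d data: a flat buffer plus
    shape and strides. Strides are counted in elements, not bytes."""
    data: Sequence[float]
    shape: Sequence[int]
    strides: Sequence[int]

    def offset(self, index: Sequence[int]) -> int:
        """Flat buffer position of the element at multi-index `index`."""
        return sum(i * s for i, s in zip(index, self.strides))


def read_line(view: StridedView, start: Sequence[int], axis: int) -> List[float]:
    """Hypothetical 1-d specialization: copy out the run of elements that
    lies along `axis`, beginning at multi-index `start`."""
    base = view.offset(start)
    step = view.strides[axis]
    count = view.shape[axis] - start[axis]
    return [view.data[base + k * step] for k in range(count)]


def read_plane(view: StridedView, start: Sequence[int],
               axes: Tuple[int, int]) -> List[List[float]]:
    """Hypothetical 2-d specialization: the sub-block spanned by `axes`,
    assembled from 1-d reads along the second of the two axes."""
    outer, inner = axes
    rows = []
    for k in range(view.shape[outer] - start[outer]):
        index = list(start)
        index[outer] += k
        rows.append(read_line(view, index, inner))
    return rows


# A 2x3 C-ordered array [[0, 1, 2], [3, 4, 5]] ...
a = StridedView(data=[0, 1, 2, 3, 4, 5], shape=(2, 3), strides=(3, 1))
# ... and its transpose, described over the same buffer without copying.
a_t = StridedView(data=a.data, shape=(3, 2), strides=(1, 3))

print(read_line(a, (0, 1), axis=0))        # [1, 4]  (column 1)
print(read_line(a_t, (0, 0), axis=1))      # [0, 3]  (row 0 of the transpose)
print(read_plane(a, (0, 0), axes=(0, 1)))  # [[0, 1, 2], [3, 4, 5]]
```

The consumer only ever asks the descriptor for runs along an axis, so the same calls work whether the underlying data is contiguous or transposed; a fuller version could, in principle, produce those runs from compressed or on-disk storage instead.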
I am not sure what the specific relationship is between your thoughts and the email thread Mark referenced, but I do know that there is a deep connection to the *concept* of ufuncs, which are currently the core abstraction for iterating over low-level calculations. You want the ability to create more powerful iteration constructs (like broadcasting, generalized ufuncs, and windowed kernel functions) while only having to define the kernel's calculation once.
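The "define the kernel once" idea can be illustrated with a small pure-Python sketch, again independent of NumPy: the kernel `mean2` is written once on scalars, and two separate drivers supply the iteration, one elementwise with NumPy-style broadcasting and one windowed. All names here (`mean2`, `broadcast_apply`, `window_apply`) are hypothetical, nested Python lists stand in for arrays, and only the broadcasting rule itself is borrowed from NumPy.

```python
from itertools import product  # noqa: F401  (kept for drivers that want flat index loops)
from typing import Any, Callable, List, Sequence, Tuple


def mean2(a: float, b: float) -> float:
    """The kernel: defined once, on scalars only."""
    return (a + b) / 2


def _shape(a: Any) -> Tuple[int, ...]:
    """Shape of a nested-list array; scalars have shape ()."""
    shape = []
    while isinstance(a, list):
        shape.append(len(a))
        a = a[0] if a else None
    return tuple(shape)


def _get(a: Any, idx: Sequence[int]) -> Any:
    """Element of a nested-list array at multi-index `idx`."""
    for i in idx:
        a = a[i]
    return a


def _build(shape: Tuple[int, ...], f: Callable, prefix: Tuple[int, ...] = ()) -> Any:
    """Nested list of `shape` whose entry at index idx is f(idx)."""
    if not shape:
        return f(prefix)
    return [_build(shape[1:], f, prefix + (i,)) for i in range(shape[0])]


def broadcast_apply(kernel: Callable, *arrays: Any) -> Any:
    """Elementwise driver using NumPy's broadcasting rule: align trailing
    axes, and let size-1 (or missing) axes stretch to match."""
    shapes = [_shape(a) for a in arrays]
    ndim = max(len(s) for s in shapes)
    padded = [(1,) * (ndim - len(s)) + s for s in shapes]
    out_shape = []
    for dims in zip(*padded):
        sizes = {d for d in dims if d != 1}
        if len(sizes) > 1:
            raise ValueError("shapes do not broadcast: %r" % (shapes,))
        out_shape.append(sizes.pop() if sizes else 1)

    def element(idx: Tuple[int, ...]) -> Any:
        args = []
        for a, s, p in zip(arrays, shapes, padded):
            # Pin size-1 axes to 0, then drop the padded leading axes.
            local = [0 if n == 1 else i for i, n in zip(idx, p)]
            args.append(_get(a, local[ndim - len(s):]))
        return kernel(*args)

    return _build(tuple(out_shape), element)


def window_apply(kernel: Callable, xs: List[float], width: int) -> List[float]:
    """Windowed driver: call the kernel on each run of `width` neighbours
    (`width` must match the kernel's number of arguments)."""
    return [kernel(*xs[i:i + width]) for i in range(len(xs) - width + 1)]


# Same kernel, two iteration constructs.
print(broadcast_apply(mean2, [[0, 2], [4, 6]], [2, 4]))  # [[1.0, 3.0], [3.0, 5.0]]
print(window_apply(mean2, [1, 3, 5, 9], width=2))        # [2.0, 4.0, 7.0]
```

Adding another iteration construct (a reduction, say) then means writing one more driver rather than rewriting every kernel.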
Couple a more generalized ufunc notion with an improved low-level interface concept, and you could have a system, independent of NumPy, for doing anything. NumPy would then be just one of many array concepts that could coexist and share development resources.
Your thoughts are definitely the future. We are currently building such a thing, and we would like it to be open source. We are preparing a proposal to DARPA under their XDATA program to help fund this. Email me off-list if you would like to be part of this proposal. You don't have to be a U.S. citizen to participate.