[Numpy-discussion] Fixing issue of future opaqueness of ndarray this summer

Travis Oliphant travis at continuum.io
Sat May 12 20:30:23 EDT 2012


> 
> I think the long-term generality is a lot bigger than that:
> 
>  - Compressed arrays
>  - Interfaces to HDF files
>  - Distributed-memory arrays
>  - Blocked arrays
>  - Semi-sparse and sparse (diagonal, but also triangular, symmetric, 
> repeating, ...)
>  - Lazy evaluation: "generating_multiply(mydata, zero_mask)"
> 
> While what Mark F. and I care about is computational efficiency for 
> current arrays, this generality is almost unavoidable.
> 
> In fact, judging from ideas Travis has posted to this list earlier and from
> continuum.io, I assume this wider scope is something you and Travis must 
> necessarily have thought a lot about.
> 
> Anyway, I agree with Mark F. that the right design is probably a new, 
> low-level, (very small!) C library with no Python dependencies that just 
> provides some APIs to try to standardize this "how to communicate array 
> data" at a more basic level than NumPy (and with a much smaller and different 
> scope than the various "distill NumPy to a C core" efforts that have been 
> discussed over the past years, something I have zero interest in).
> 
> If NumPy devs are interested in this discussion on a detailed level, 
> please say so; Mark F. and I might move to Skype (or even meet in person) 
> to get higher bandwidth than the mailing list, and if more people should be
> invited, it would be good to know.
> 

I, for one, am very interested in this discussion. This is very much along the lines of what I have been thinking. To me it is much more important to solidify the concept of the "interface", and what is essential about it, than to create yet another library. I think that, in addition to your general notion of an N-d block-transfer API, you would also need 1-d, 2-d, and maybe 3-d specializations that take an additional "axis" argument to denote which sub-region is being described. But this is probably enough.
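To make the block-transfer idea concrete, here is a rough Python illustration (not a proposal for the actual C API, and not any existing library). A provider exposes only `shape` and `read_block(start, stop)`, which returns the half-open sub-region `[start, stop)` as a flat row-major list; a 1-d specialization, `read_line`, takes the extra `axis` argument. All names here (`BlockSource`, `read_block`, `read_line`, `StridedSource`, `RepeatingSource`) are hypothetical.

```python
from __future__ import annotations

from itertools import product
from typing import Protocol, Sequence


class BlockSource(Protocol):
    """Hypothetical minimal interface: anything that can hand out rectangular blocks."""

    shape: tuple[int, ...]

    def read_block(self, start: Sequence[int], stop: Sequence[int]) -> list:
        """Return the sub-region [start, stop) flattened in row-major (C) order."""
        ...


class StridedSource:
    """Ordinary in-memory array: a flat buffer plus shape and element strides."""

    def __init__(self, buffer, shape, strides):
        self.buffer = buffer
        self.shape = tuple(shape)
        self.strides = tuple(strides)

    def read_block(self, start, stop):
        ranges = [range(lo, hi) for lo, hi in zip(start, stop)]
        # product() varies the last index fastest, i.e. row-major order.
        return [self.buffer[sum(i * s for i, s in zip(idx, self.strides))]
                for idx in product(*ranges)]


class RepeatingSource:
    """'Virtual' array: element at index idx is pattern[sum(idx) % len(pattern)].

    Only the pattern is stored, standing in for a non-memory-backed provider.
    """

    def __init__(self, shape, pattern):
        self.shape = tuple(shape)
        self.pattern = list(pattern)

    def read_block(self, start, stop):
        ranges = [range(lo, hi) for lo, hi in zip(start, stop)]
        return [self.pattern[sum(idx) % len(self.pattern)] for idx in product(*ranges)]


def read_line(src: BlockSource, axis, fixed, lo, hi):
    """1-d specialization: elements lo..hi-1 along `axis`, all other indices fixed.

    `fixed` gives an index for every axis; its entry at `axis` is ignored.
    """
    start = list(fixed)
    stop = [i + 1 for i in fixed]
    start[axis], stop[axis] = lo, hi
    return src.read_block(start, stop)
```

The consumer never learns whether the data lives in a strided buffer or is generated on demand; a compressed, HDF-backed, or distributed provider could implement the same single method.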

I am not sure what the specific relationship is between your thoughts and the email thread Mark referenced, but I do know that there is a deep connection to the *concept* of ufuncs which are currently the core abstraction for iterating over low-level calculations.    You want the ability to create more powerful iteration constructs (like broadcasting and generalized ufuncs and windowed kernel funcs) while only having to define a single calculation of the kernel.    
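A rough sketch of the "define the kernel once" idea, in plain Python and assuming flat row-major lists with explicit shapes: the same kind of single-output kernel is driven by a NumPy-style broadcasting loop and by a 1-d sliding-window loop. The helper names (`broadcast_shape`, `binary_elementwise`, `windowed`) are hypothetical and are not NumPy's actual ufunc machinery.

```python
from itertools import product


def broadcast_shape(a, b):
    """NumPy-style broadcasting: align trailing axes; sizes must match or be 1."""
    n = max(len(a), len(b))
    a = (1,) * (n - len(a)) + tuple(a)
    b = (1,) * (n - len(b)) + tuple(b)
    out = []
    for x, y in zip(a, b):
        if x != y and x != 1 and y != 1:
            raise ValueError(f"shapes {a} and {b} are not broadcastable")
        out.append(y if x == 1 else x)
    return tuple(out)


def _flat_index(idx, shape):
    """Row-major offset of output index `idx` into an operand of `shape`.

    Leading output axes the operand lacks are dropped; size-1 axes are broadcast.
    """
    offset = 0
    for i, n in zip(idx[len(idx) - len(shape):], shape):
        offset = offset * n + (0 if n == 1 else i)
    return offset


def binary_elementwise(kernel, a, a_shape, b, b_shape):
    """Broadcast loop: `kernel` is written once, for a single pair of scalars."""
    out_shape = broadcast_shape(a_shape, b_shape)
    out = [kernel(a[_flat_index(idx, a_shape)], b[_flat_index(idx, b_shape)])
           for idx in product(*(range(n) for n in out_shape))]
    return out, out_shape


def windowed(kernel, x, width):
    """Windowed loop: `kernel` computes one output from one window of `x`."""
    return [kernel(x[i:i + width]) for i in range(len(x) - width + 1)]
```

For example, `binary_elementwise(lambda p, q: p + q, [1, 2, 3], (3,), [10, 20], (2, 1))` yields `[11, 12, 13, 21, 22, 23]` with shape `(2, 3)`; the kernel itself knows nothing about shapes or iteration order.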

Couple a more generalized ufunc notion with an improved low-level interface concept, and you could have a general-purpose system, independent of NumPy, in which NumPy would be just one of many array concepts that co-exist and share development resources.
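A hypothetical illustration of that separation, in plain Python with no NumPy dependency: one map-reduce driver streams any 2-d provider that implements `read_block(start, stop)` (returning the half-open sub-region as a flat row-major list) and applies a scalar kernel to each element. `FunctionSource`, `ListSource`, and `map_reduce_blocks` are made-up names for this sketch, not an existing API.

```python
from itertools import product


class FunctionSource:
    """Hypothetical lazy provider: each element is computed on demand from its index."""

    def __init__(self, shape, fn):
        self.shape = tuple(shape)
        self.fn = fn

    def read_block(self, start, stop):
        ranges = [range(lo, hi) for lo, hi in zip(start, stop)]
        return [self.fn(*idx) for idx in product(*ranges)]


class ListSource:
    """Hypothetical provider backed by nested Python lists (2-d only, for brevity)."""

    def __init__(self, rows):
        self.rows = rows
        self.shape = (len(rows), len(rows[0]))

    def read_block(self, start, stop):
        return [v for row in self.rows[start[0]:stop[0]] for v in row[start[1]:stop[1]]]


def map_reduce_blocks(src, kernel, combine, init, rows_per_block):
    """Stream a 2-d provider in row blocks, apply `kernel`, fold with `combine`."""
    acc = init
    nrows, ncols = src.shape
    for r in range(0, nrows, rows_per_block):
        block = src.read_block((r, 0), (min(r + rows_per_block, nrows), ncols))
        for v in block:
            acc = combine(acc, kernel(v))
    return acc
```

Both providers give the same answer, e.g. 506 for the sum of squares of 0..11 with `rows_per_block=2`; the driver and kernel are shared, and only the provider differs.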

Your thoughts are definitely the future. We are currently building such a thing, and we would like it to be open source. We are preparing a proposal to DARPA under their XDATA program to help fund this. Email me off-list if you would like to be part of this proposal. You don't have to be a U.S. citizen to participate.

Thanks,

-Travis




> Dag
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion



