Seeking feedback: design doc for `namedarray`, a lightweight array data structure with named dimensions
👋🏽 folks, [there has been growing interest in a lightweight array structure](https://github.com/pydata/xarray/issues/3981) that's in the same vein as [xarray's Variable](https://docs.xarray.dev/en/stable/generated/xarray.Variable.html). We've put together a design doc for `namedarray`, and we could use your feedback/input.

## what is `namedarray`?

In essence, `namedarray` aims to be a lighter version of xarray's Variable, shedding some of the heavier dependencies (e.g. pandas) while still retaining the goodness of named dimensions.

## what makes it special?

* **Array Protocol Compatibility**: we are planning to make it compatible with existing array protocols and the new [Python array API standard](https://data-apis.org/array-api/latest/).
* **Duck-Array Objects**: designed to wrap around multiple duck-array objects, like NumPy, Dask, Sparse, Pint, CuPy, and PyTorch.

## why are we doing this?

The goal is to bridge the gap between power and simplicity, providing a lightweight alternative for scientific computing tasks that don't require the full firepower of Xarray (`DataArray` and `Dataset`).

## share your thoughts

We've put together a design doc that goes into the nitty-gritty of `namedarray`. Your insights could be invaluable in making this initiative a success. Please give it a read and share your thoughts [here](https://github.com/pydata/xarray/discussions/8080).

* **Design Doc**: [namedarray Design Document](https://github.com/pydata/xarray/blob/main/design_notes/named_array_design_d...)

Cross-posting from [Scientific Python Discourse](https://discuss.scientific-python.org/t/seeking-feedback-design-doc-for-name...)
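
To make the idea of "named dimensions around a duck array" concrete, here is a minimal sketch in plain NumPy. It is not the `namedarray` API from the design doc: the class name `NamedDims` and its `sizes`, `axis`, `reduce`, and `transpose` members are hypothetical, chosen only to show how dimension names can stand in for positional axes.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any, Callable

import numpy as np


@dataclass(frozen=True, eq=False)
class NamedDims:
    """Hypothetical wrapper pairing an array-like object with dimension names."""

    dims: tuple[str, ...]
    data: Any  # assumed to expose .shape, e.g. a NumPy array

    def __post_init__(self) -> None:
        if len(self.dims) != len(self.data.shape):
            raise ValueError("need exactly one dimension name per axis")
        if len(set(self.dims)) != len(self.dims):
            raise ValueError("dimension names must be unique")

    @property
    def sizes(self) -> dict[str, int]:
        # Map each dimension name to the length of its axis.
        return dict(zip(self.dims, self.data.shape))

    def axis(self, dim: str) -> int:
        # Translate a dimension name into a positional axis.
        return self.dims.index(dim)

    def reduce(self, func: Callable[..., Any], dim: str) -> NamedDims:
        # Apply a reduction such as np.mean along the named dimension.
        remaining = tuple(d for d in self.dims if d != dim)
        return NamedDims(remaining, func(self.data, axis=self.axis(dim)))

    def transpose(self, *dims: str) -> NamedDims:
        # Reorder axes by name; every existing name must appear exactly once.
        if sorted(dims) != sorted(self.dims):
            raise ValueError("transpose needs a permutation of the dimensions")
        order = [self.axis(d) for d in dims]
        return NamedDims(tuple(dims), np.transpose(self.data, order))


temps = NamedDims(("time", "station"), np.arange(6.0).reshape(2, 3))
mean_over_time = temps.reduce(np.mean, dim="time")
flipped = temps.transpose("station", "time")
print(temps.sizes, mean_over_time.dims, flipped.data.shape)
```

Because callers refer to `"time"` rather than `axis=0`, the same call keeps working after a transpose. The sketch only exercises NumPy; whether `func(..., axis=...)` and `np.transpose` behave the same way for other duck arrays depends on each library.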
Some historical discussions on a namedarray on the scikit-learn side: https://github.com/scikit-learn/enhancement_proposals/pull/25

Might be useful to y'all.

On Fri, Oct 20, 2023 at 8:49 AM Dom Grigonis <dom.grigonis@gmail.com> wrote:
LArray might also be useful to look at. I think there was a time when it didn't use pandas, but it does have it as a dependency now. https://github.com/larray-project/larray

I think this would be a really useful endeavor. The CDF data model is extremely useful, and adopting even a piece of it would bring great benefits. I find it particularly useful to always have access to my coordinates, especially when debugging a problem with my data.

One thing that can get messy is designing code to accept these objects as inputs and outputs. Do you explicitly pass each data array along with its coordinate variables, or do you just let the coordinates "come along for the ride"? If they come along implicitly, should functions require additional parameters for the names of the coordinates, or should the function require that the dimensions have particular names?

I've been using XArray since back when it was called "xray", and I still don't have concrete answers to these questions. Hopefully, wider adoption will help inspire better design principles.

Cheers!
Ben Root

On Fri, Dec 1, 2023 at 12:51 PM Dom Grigonis <dom.grigonis@gmail.com> wrote:
participants (4)

* Adrin
* Anderson Banihirwe
* Benjamin Root
* Dom Grigonis