Seeking feedback: design doc for `namedarray`, a lightweight array data structure with named dimensions
:wave:t5: folks, [there has been growing interest in a lightweight array structure](https://github.com/pydata/xarray/issues/3981) that's in the same vein as [xarray's Variable](https://docs.xarray.dev/en/stable/generated/xarray.Variable.html). we've put together a design doc for `namedarray`, and we could use your feedback/input.

## what is `namedarray`?

in essence, `namedarray` aims to be a lighter version of xarray's Variable, shedding some of the heavier dependencies (e.g. Pandas) but still retaining the goodness of named dimensions.

## what makes it special?

* **Array Protocol Compatibility**: we are planning to make it compatible with existing array protocols and the new [Python array API standard](https://data-apis.org/array-api/latest/).
* **Duck-Array Objects**: designed to wrap around multiple duck-array objects, like NumPy, Dask, Sparse, Pint, CuPy, and PyTorch.

## why are we doing this?

the goal is to bridge the gap between power and simplicity, providing a lightweight alternative for scientific computing tasks that don't require the full firepower of Xarray (`DataArray` and `Dataset`).

## share your thoughts

we've put together a design doc that goes into the nitty-gritty of `namedarray`. your insights could be invaluable in making this initiative a success. please give it a read and share your thoughts [here](https://github.com/pydata/xarray/discussions/8080)

* **Design Doc**: [namedarray Design Document](https://github.com/pydata/xarray/blob/main/design_notes/named_array_design_doc.md)

cross posting from [Scientific Python Discourse](https://discuss.scientific-python.org/t/seeking-feedback-design-doc-for-namedarray-a-lightweight-array-data-structure-with-named-dimensions/841)
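To make the "named dimensions around a duck array" idea concrete, here is a minimal sketch. It is not the API proposed in the design doc: the class name `NamedArraySketch` and its `sizes` and `mean` members are hypothetical, chosen only for illustration, and the wrapped object is assumed to expose `.ndim`, `.shape`, and a NumPy-style `.mean(axis=...)`.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any

import numpy as np


@dataclass(frozen=True)
class NamedArraySketch:
    """Hypothetical illustration: a duck array plus one name per axis."""

    dims: tuple[str, ...]
    data: Any  # assumed to expose .ndim, .shape and .mean(axis=...)

    def __post_init__(self) -> None:
        # Every axis of the wrapped array needs exactly one name.
        if len(self.dims) != self.data.ndim:
            raise ValueError(
                f"got {len(self.dims)} dimension names for a {self.data.ndim}-d array"
            )

    @property
    def sizes(self) -> dict[str, int]:
        # Map each dimension name to the length of its axis.
        return dict(zip(self.dims, self.data.shape))

    def mean(self, dim: str) -> NamedArraySketch:
        # Translate the dimension name into a positional axis, then delegate.
        axis = self.dims.index(dim)
        remaining = tuple(d for d in self.dims if d != dim)
        return NamedArraySketch(remaining, self.data.mean(axis=axis))


temps = NamedArraySketch(("time", "station"), np.arange(6.0).reshape(2, 3))
print(temps.sizes)
print(temps.mean("time").dims)
```

The only dependency is the wrapped array itself, so reducing over `"time"` works without the caller needing to know which positional axis that name refers to.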
I would make use of it if it also supported pure-NumPy indices. A pure-NumPy n-dimensional array with indices is what I have been after for a while now. The reason is exactly that: to shed heavy dependencies such as pandas and get the performance of pure NumPy.

Regards,
DG
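If "indices" here means coordinate labels along each dimension, one way such a lookup could work without pandas is sketched below. This is only an assumption about what is being asked for, not something taken from the design doc: the helper `select` and the sample data are hypothetical, and the coordinate arrays are assumed to be sorted and unique so that `np.searchsorted` applies.

```python
import numpy as np

# Hypothetical data: 3 time steps by 4 stations.
values = np.arange(12.0).reshape(3, 4)
dims = ("time", "station")
coords = {
    "time": np.array([2020, 2021, 2022]),
    "station": np.array([10, 20, 30, 40]),
}


def select(values, dims, coords, dim, labels):
    """Pick entries along `dim` by label, assuming coords[dim] is sorted and unique."""
    index = coords[dim]
    labels = np.atleast_1d(labels)
    positions = np.searchsorted(index, labels)
    # searchsorted returns insertion points, so confirm each label is really present.
    in_range = positions < index.size
    found = np.zeros(labels.shape, dtype=bool)
    found[in_range] = index[positions[in_range]] == labels[in_range]
    if not found.all():
        raise KeyError(f"labels not found along {dim!r}: {labels[~found]}")
    return np.take(values, positions, axis=dims.index(dim))


subset = select(values, dims, coords, "station", [20, 40])
print(subset)
```

Everything here is plain NumPy: the label lookup is a binary search over a sorted array, and the selection itself is a single `np.take` along the named axis.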
On 20 Oct 2023, at 00:51, Anderson Banihirwe <axbanihirwe@gmail.com> wrote:
_______________________________________________
NumPy-Discussion mailing list -- numpy-discussion@python.org
To unsubscribe send an email to numpy-discussion-leave@python.org
https://mail.python.org/mailman3/lists/numpy-discussion.python.org/
Member address: dom.grigonis@gmail.com
Some historical discussions on a namedarray on the scikit-learn side: https://github.com/scikit-learn/enhancement_proposals/pull/25

Might be useful to y'all.

On Fri, Oct 20, 2023 at 8:49 AM Dom Grigonis <dom.grigonis@gmail.com> wrote:
I think this is the right place to mention the `scipp` library.
On 1 Dec 2023, at 17:36, Adrin <adrin.jalali@gmail.com> wrote:
LArray might also be useful to look at. I think there was a time when it didn't use pandas, but it does have it as a dependency now. https://github.com/larray-project/larray

I think this would be a really useful endeavor. The CDF data model is extremely useful, and adopting even a piece of it would bring great benefits. I find it particularly useful to always have access to my coordinates, especially when debugging a problem with my data.

One thing that can make things messy is designing code to accept these as inputs and outputs. Do you explicitly pass each piece of data along with its coordinate variables, or do you just let the coordinates "come along for the ride"? If they come along implicitly, should functions require additional parameters for the names of the coordinates, or should the function require that the dimensions have particular names?

I've been using XArray since back when it was called "xray", and I still don't have concrete answers to these questions. Hopefully, wider adoption will help bring inspiration for better design principles.

Cheers!
Ben Root

On Fri, Dec 1, 2023 at 12:51 PM Dom Grigonis <dom.grigonis@gmail.com> wrote:
participants (4)
- Adrin
- Anderson Banihirwe
- Benjamin Root
- Dom Grigonis