Writing an array subclass in a compiled language with proper dispatch
Hi all, I am a long time user of astropy.units, which allows one to define quantities with physical units as follows:
from astropy import units as u 10 << u.cm <Quantity 10. cm> np.sqrt(4 << u.m ** 2) <Quantity 2. m> ([1, 1, 0] << u.m) @ ([0, 10, 20] << u.cm / u.s) <Quantity 10. cm m / s> (([1, 1, 0] << u.m) * ([0, 10, 20] << u.cm / u.s)).to(u.m ** 2 / u.s) <Quantity [0. , 0.1, 0. ] m2 / s>
The mechanism works by subclassing numpy.ndarray and leveraging __array_function__ support aka NEP 18. Internally it is something like this:
v = np.array(10, dtype=np.float64, copy=False, order=None, subok=True, ndmin=0) vu = v.view(u.Quantity) vu._set_unit(u.cm) vu <Quantity 10. cm>
However, over the years I have been constantly annoyed by the fact that it is tremendously slow. I'm not critizing Astropy devs, the problem seems objectively difficult: although some code paths could be optimized at the cost of losing some syntactic sugar or breaking backwards compatibility, `isinstance` calls and introspection in general are slow. Setting aside the question of trying to make astropy.units faster (which may or may not be possible), I was thinking how feasible could it be to implement something similar, but using a compiled language instead (C, Cython, Rust, whatever) and leveraging "modern" dispatch mechanisms. But after reading about NEP 18, NEP 47, uarray, and various pull requests and issues here and there (https://labs.quansight.org/blog/2021/11/pydata-extensibility-vision/ and https://github.com/scipy/scipy/issues/10204#issuecomment-787067947 among others) I don't fully grasp the differences between the approaches, and I don't know if what I am proposing is feasible at all. Since IIUC the numpy function or ufunc is passed to __array_function__ and __array_ufunc__ respectively, I am not sure how would that interact with the code being in a foreign language (I assume the NumPy C API would have to be used). If folks have advice, ideas, or suggestions for a direction, I'll be happy to read them. Best!
There are no C-level APIs for __array_function__ or __array_ufunc__, so yes, at a high-level Python methods will be invoked by NumPy. That said, NumPy's logic for handling __array_function__ and __array_ufunc__ methods is written in highly optimized C. If you wrote your own __array_function__ and __array_ufunc__ methods on Quantity using C, and there should be very little overhead from Python. I would guess you might see a considerable speed-ups due to moving highly dynamic Python logic into a compiled language. On Mon, Jan 10, 2022 at 12:56 PM Juan Luis Cano Rodríguez < hello@juanlu.space> wrote:
Hi all,
I am a long time user of astropy.units, which allows one to define quantities with physical units as follows:
from astropy import units as u 10 << u.cm <Quantity 10. cm> np.sqrt(4 << u.m ** 2) <Quantity 2. m> ([1, 1, 0] << u.m) @ ([0, 10, 20] << u.cm / u.s) <Quantity 10. cm m / s> (([1, 1, 0] << u.m) * ([0, 10, 20] << u.cm / u.s)).to(u.m ** 2 / u.s) <Quantity [0. , 0.1, 0. ] m2 / s>
The mechanism works by subclassing numpy.ndarray and leveraging __array_function__ support aka NEP 18. Internally it is something like this:
v = np.array(10, dtype=np.float64, copy=False, order=None, subok=True, ndmin=0) vu = v.view(u.Quantity) vu._set_unit(u.cm) vu <Quantity 10. cm>
However, over the years I have been constantly annoyed by the fact that it is tremendously slow. I'm not critizing Astropy devs, the problem seems objectively difficult: although some code paths could be optimized at the cost of losing some syntactic sugar or breaking backwards compatibility, `isinstance` calls and introspection in general are slow.
Setting aside the question of trying to make astropy.units faster (which may or may not be possible), I was thinking how feasible could it be to implement something similar, but using a compiled language instead (C, Cython, Rust, whatever) and leveraging "modern" dispatch mechanisms. But after reading about NEP 18, NEP 47, uarray, and various pull requests and issues here and there ( https://labs.quansight.org/blog/2021/11/pydata-extensibility-vision/ and https://github.com/scipy/scipy/issues/10204#issuecomment-787067947 among others) I don't fully grasp the differences between the approaches, and I don't know if what I am proposing is feasible at all. Since IIUC the numpy function or ufunc is passed to __array_function__ and __array_ufunc__ respectively, I am not sure how would that interact with the code being in a foreign language (I assume the NumPy C API would have to be used).
If folks have advice, ideas, or suggestions for a direction, I'll be happy to read them.
Best! _______________________________________________ NumPy-Discussion mailing list -- numpy-discussion@python.org To unsubscribe send an email to numpy-discussion-leave@python.org https://mail.python.org/mailman3/lists/numpy-discussion.python.org/ Member address: shoyer@gmail.com
On Mon, 2022-01-10 at 15:25 -0800, Stephan Hoyer wrote:
There are no C-level APIs for __array_function__ or __array_ufunc__, so yes, at a high-level Python methods will be invoked by NumPy.
That said, NumPy's logic for handling __array_function__ and __array_ufunc__ methods is written in highly optimized C. If you wrote your own __array_function__ and __array_ufunc__ methods on Quantity using C, and there should be very little overhead from Python. I would guess you might see a considerable speed-ups due to moving highly dynamic Python logic into a compiled language.
There are some slowish Python-C interface boundaries that could be optimized away [1], although we should first do the generic optimizations that I mentioned before the break (at least for `__array_function__`) before considering that. Another approach is now becoming available (and experimenting on it could hasten it a lot), though, if that interests you. That is to use new user defined DTypes: If you use a dtype that holds the unit, there is no wrapping of the NumPy array at all! If done right, you stay within C and the overhead should be effectively indistinguishable from normal NumPy array operations. There is a very minimal proof of concept here (not designed to be fast or good): https://github.com/seberg/experimental_user_dtypes This works with NumPy main branch (albeit there is a small tweak needed right now [2]). I was hoping to look into a somewhat more full featured prototype soonish (the fact that it uses Cython turned out a bad idea). Currently, there is mainly one larger feature missing: A Unit DType should be able to call the existing NumPy loops conveniently (e.g. use the normal multiplication function but tag on unit handling). Cheers, Sebastian [1] Basically, just hopping back from C to Python (even if just to call C) is a bit sluggish when you look at small to mid-sized array operations. That could probably be avoided, but needs small API extension I think. I am not sure how worthwhile that is. [2] Unfortunately, I a merge conflict or so happened. So if you want to try that example code, you currently need to change one line in `numpy/include/numpy/experimental_dtype_api.h` to read: #define __EXPERIMENTAL_DTYPE_VERSION 3 (The version should be 3, not 2.)
On Mon, Jan 10, 2022 at 12:56 PM Juan Luis Cano Rodríguez < hello@juanlu.space> wrote:
Hi all,
I am a long time user of astropy.units, which allows one to define quantities with physical units as follows:
from astropy import units as u 10 << u.cm <Quantity 10. cm> np.sqrt(4 << u.m ** 2) <Quantity 2. m> ([1, 1, 0] << u.m) @ ([0, 10, 20] << u.cm / u.s) <Quantity 10. cm m / s> (([1, 1, 0] << u.m) * ([0, 10, 20] << u.cm / u.s)).to(u.m ** 2 / u.s) <Quantity [0. , 0.1, 0. ] m2 / s>
The mechanism works by subclassing numpy.ndarray and leveraging __array_function__ support aka NEP 18. Internally it is something like this:
v = np.array(10, dtype=np.float64, copy=False, order=None, subok=True, ndmin=0) vu = v.view(u.Quantity) vu._set_unit(u.cm) vu <Quantity 10. cm>
However, over the years I have been constantly annoyed by the fact that it is tremendously slow. I'm not critizing Astropy devs, the problem seems objectively difficult: although some code paths could be optimized at the cost of losing some syntactic sugar or breaking backwards compatibility, `isinstance` calls and introspection in general are slow.
Setting aside the question of trying to make astropy.units faster (which may or may not be possible), I was thinking how feasible could it be to implement something similar, but using a compiled language instead (C, Cython, Rust, whatever) and leveraging "modern" dispatch mechanisms. But after reading about NEP 18, NEP 47, uarray, and various pull requests and issues here and there ( https://labs.quansight.org/blog/2021/11/pydata-extensibility-vision/ and https://github.com/scipy/scipy/issues/10204#issuecomment-787067947 among others) I don't fully grasp the differences between the approaches, and I don't know if what I am proposing is feasible at all. Since IIUC the numpy function or ufunc is passed to __array_function__ and __array_ufunc__ respectively, I am not sure how would that interact with the code being in a foreign language (I assume the NumPy C API would have to be used).
If folks have advice, ideas, or suggestions for a direction, I'll be happy to read them.
Best! _______________________________________________ NumPy-Discussion mailing list -- numpy-discussion@python.org To unsubscribe send an email to numpy-discussion-leave@python.org https://mail.python.org/mailman3/lists/numpy-discussion.python.org/ Member address: shoyer@gmail.com
_______________________________________________ NumPy-Discussion mailing list -- numpy-discussion@python.org To unsubscribe send an email to numpy-discussion-leave@python.org https://mail.python.org/mailman3/lists/numpy-discussion.python.org/ Member address: sebastian@sipsolutions.net
On January 11, 2022, Sebastian Berg <sebastian@sipsolutions.net> wrote:
On Mon, 2022-01-10 at 15:25 -0800, Stephan Hoyer wrote:
There are no C-level APIs for __array_function__ or __array_ufunc__, so yes, at a high-level Python methods will be invoked by NumPy. That said, NumPy's logic for handling __array_function__ and __array_ufunc__ methods is written in highly optimized C. If you wrote your own __array_function__ and __array_ufunc__ methods on Quantity using C, and there should be very little overhead from Python. I would guess you might see a considerable speed-ups due to moving highly dynamic Python logic into a compiled language.
There are some slowish Python-C interface boundaries that could be optimized away [1], although we should first do the generic optimizations that I mentioned before the break (at least for `__array_function__`) before considering that.
Another approach is now becoming available (and experimenting on it could hasten it a lot), though, if that interests you.
That is to use new user defined DTypes: If you use a dtype that holds the unit, there is no wrapping of the NumPy array at all! If done right, you stay within C and the overhead should be effectively indistinguishable from normal NumPy array operations. There is a very minimal proof of concept here (not designed to be fast or good):
https://github.com/seberg/experimental_user_dtypes
This works with NumPy main branch (albeit there is a small tweak needed right now [2]).
Thanks both for the insight! The dtype solution looks promising. I tried to make it work without success, opened an issue in your repository to continue the conversation there. Best, Juan Luis
I was hoping to look into a somewhat more full featured prototype soonish (the fact that it uses Cython turned out a bad idea).
Currently, there is mainly one larger feature missing: A Unit DType should be able to call the existing NumPy loops conveniently (e.g. use the normal multiplication function but tag on unit handling).
Cheers,
Sebastian
[1] Basically, just hopping back from C to Python (even if just to call C) is a bit sluggish when you look at small to mid-sized array operations. That could probably be avoided, but needs small API extension I think. I am not sure how worthwhile that is.
[2] Unfortunately, I a merge conflict or so happened. So if you want to try that example code, you currently need to change one line in `numpy/include/numpy/experimental_dtype_api.h` to read:
#define __EXPERIMENTAL_DTYPE_VERSION 3
(The version should be 3, not 2.)
On Mon, Jan 10, 2022 at 12:56 PM Juan Luis Cano Rodríguez < hello@juanlu.space> wrote:
from astropy import units as u 10 << u.cm <Quantity 10. cm> np.sqrt(4 << u.m ** 2) <Quantity 2. m> ([1, 1, 0] << u.m) @ ([0, 10, 20] << u.cm / u.s) <Quantity 10. cm m / s> (([1, 1, 0] << u.m) * ([0, 10, 20] << u.cm / u.s)).to(u.m ** 2 / u.s) <Quantity [0. , 0.1, 0. ] m2 / s> The mechanism works by subclassing numpy.ndarray and leveraging __array_function__ support aka NEP 18. Internally it is something
v = np.array(10, dtype=np.float64, copy=False, order=None, subok=True, ndmin=0) vu = v.view(u.Quantity) vu._set_unit(u.cm) vu <Quantity 10. cm> However, over the years I have been constantly annoyed by the fact
Hi all, I am a long time user of astropy.units, which allows one to define quantities with physical units as follows: like this: that it is tremendously slow. I'm not critizing Astropy devs, the problem seems objectively difficult: although some code paths could be optimized at the cost of losing some syntactic sugar or breaking backwards compatibility, `isinstance` calls and introspection in general are slow. Setting aside the question of trying to make astropy.units faster (which may or may not be possible), I was thinking how feasible could it be to implement something similar, but using a compiled language instead (C, Cython, Rust, whatever) and leveraging "modern" dispatch mechanisms. But after reading about NEP 18, NEP 47, uarray, and various pull requests and issues here and there ( https://labs.quansight.org/blog/2021/11/pydata-extensibility- vision/ and https://github.com/scipy/scipy/issues/10204#issuecomment- 787067947 among others) I don't fully grasp the differences between the approaches, and I don't know if what I am proposing is feasible at all. Since IIUC the numpy function or ufunc is passed to __array_function__ and __array_ufunc__ respectively, I am not sure how would that interact with the code being in a foreign language (I assume the NumPy C API would have to be used). If folks have advice, ideas, or suggestions for a direction, I'll be happy to read them. Best! _______________________________________________ NumPy-Discussion mailing list -- numpy-discussion@python.org To unsubscribe send an email to numpy-discussion-leave@python.org https://mail.python.org/mailman3/lists/numpy- discussion.python.org/ Member address: shoyer@gmail.com
NumPy-Discussion mailing list -- numpy-discussion@python.org To unsubscribe send an email to numpy-discussion-leave@python.org https://mail.python.org/mailman3/lists/numpy-discussion.python.org/ Member address: sebastian@sipsolutions.net
_______________________________________________ NumPy-Discussion mailing list -- numpy-discussion@python.org To unsubscribe send an email to numpy-discussion-leave@python.org https://mail.python.org/mailman3/lists/numpy-discussion.python.org/ Member address: hello@juanlu.space
participants (3)
-
Juan Luis Cano Rodríguez -
Sebastian Berg -
Stephan Hoyer