👋Long time numpy user, and big fan of all your work TL; DR - there's a `bit_count` method on numpy scalars, and Python ints, I'm advocating for a `ufunc` `bit_count` as has been implemented in this PR: https://github.com/numpy/numpy/pull/21429 A number of people have requested this as a numpy feature and there's a lot of great discussion at this issue https://github.com/numpy/numpy/issues/16325 A bit count comes up in certain similarity situations (like hamming distance). This involves an xor with numpy arrays and a bit count, where the bit count is currently the bottleneck by a few orders of magnitude in my local benchmarking. Currently there's a number of not particularly fast, workarounds for doing this with numpy arrays, like the bit-twiddling solutions here https://stackoverflow.com/a/68943135/8123 https://stackoverflow.com/a/109025/8123 So I'd love to be able to continue the work of bit_count -> numpy array ufunc to get performance gains at the array level 🙏
Thanks for bringing this up again. The Python method exists and it seems like relatively basic functionality. Overall, I am slightly in favor of adding the ufunc. So if nobody voices an opinion that it doesn't seem a good fit for NumPy, I would be happy to move forward with it. - Sebastian PS: One of my main concern would be if we were to add many bitwise functions, in which case a `bitwise` namespace might be nice. But I am not convinced that should stop us here. On Thu, 2022-11-24 at 15:56 +0000, Doug Turnbull wrote:
👋Long time numpy user, and big fan of all your work
TL; DR - there's a `bit_count` method on numpy scalars, and Python ints, I'm advocating for a `ufunc` `bit_count` as has been implemented in this PR:
https://github.com/numpy/numpy/pull/21429
A number of people have requested this as a numpy feature and there's a lot of great discussion at this issue
https://github.com/numpy/numpy/issues/16325
A bit count comes up in certain similarity situations (like hamming distance). This involves an xor with numpy arrays and a bit count, where the bit count is currently the bottleneck by a few orders of magnitude in my local benchmarking. Currently there's a number of not particularly fast, workarounds for doing this with numpy arrays, like the bit-twiddling solutions here
https://stackoverflow.com/a/68943135/8123 https://stackoverflow.com/a/109025/8123
So I'd love to be able to continue the work of bit_count -> numpy array ufunc to get performance gains at the array level 🙏 _______________________________________________ NumPy-Discussion mailing list -- numpy-discussion@python.org To unsubscribe send an email to numpy-discussion-leave@python.org https://mail.python.org/mailman3/lists/numpy-discussion.python.org/ Member address: sebastian@sipsolutions.net
On Fri, Nov 25, 2022 at 08:09:02PM +0100, Sebastian Berg wrote:
Thanks for bringing this up again. The Python method exists and it seems like relatively basic functionality.
Overall, I am slightly in favor of adding the ufunc. So if nobody voices an opinion that it doesn't seem a good fit for NumPy, I would be happy to move forward with it.
- Sebastian
PS: One of my main concern would be if we were to add many bitwise functions, in which case a `bitwise` namespace might be nice. But I am not convinced that should stop us here.
Technically speaking, bitwise_and, birwise_or, bitwise_xor and bitwise_not already exist and popcount is widely spread, it already has its compiler builtin under the name of __builtin_popcount
On Fri, Nov 25, 2022 at 9:55 PM Serge Guelton <sergesanspaille@free.fr> wrote:
On Fri, Nov 25, 2022 at 08:09:02PM +0100, Sebastian Berg wrote:
Thanks for bringing this up again. The Python method exists and it seems like relatively basic functionality.
Overall, I am slightly in favor of adding the ufunc. So if nobody voices an opinion that it doesn't seem a good fit for NumPy, I would be happy to move forward with it.
It seems like there is enough demand, so +0.5 from me.
PS: One of my main concern would be if we were to add many bitwise functions, in which case a `bitwise` namespace might be nice. But I am not convinced that should stop us here.
Technically speaking, bitwise_and, birwise_or, bitwise_xor and bitwise_not already exist and popcount is widely spread, it already has its compiler builtin under the name of __builtin_popcount
Those four `bitwise_*` functions also came to mind for me. I'll also add that Python has six bit-wise operators ( https://wiki.python.org/moin/BitwiseOperators), and because of that the array API standard implements `bitwise_left_shift` and `bitwise_right_shift` in addition to the four functions that NumPy has. So it looks to me like this new ufunc should be called `bitwise_count` rather than `bit_count`. Cheers, Ralf
Hi all, Thanks for the suggestions! I have gotten the PR to a working state with UT on top of the latest main (PR <https://github.com/numpy/numpy/pull/21429>). So it looks to me like this new ufunc should be called `bitwise_count`
rather than `bit_count`.
This does sound like a good idea, but would it confuse the users given the scalar version is called `bit_count`? We could change/add an alias for scalar bit_count to `bitwise_count` but that would be different from the Python `bit_count <https://docs.python.org/3/library/stdtypes.html#int.bit_count>` added in 3.10. Any idea on how to proceed? I'm ok with calling it `bitwise_count` or `bit_count`. Thanks, Ganesh On Thu, Dec 1, 2022 at 3:20 AM Ralf Gommers <ralf.gommers@gmail.com> wrote:
On Fri, Nov 25, 2022 at 9:55 PM Serge Guelton <sergesanspaille@free.fr> wrote:
On Fri, Nov 25, 2022 at 08:09:02PM +0100, Sebastian Berg wrote:
Thanks for bringing this up again. The Python method exists and it seems like relatively basic functionality.
Overall, I am slightly in favor of adding the ufunc. So if nobody voices an opinion that it doesn't seem a good fit for NumPy, I would be happy to move forward with it.
It seems like there is enough demand, so +0.5 from me.
PS: One of my main concern would be if we were to add many bitwise functions, in which case a `bitwise` namespace might be nice. But I am not convinced that should stop us here.
Technically speaking, bitwise_and, birwise_or, bitwise_xor and bitwise_not already exist and popcount is widely spread, it already has its compiler builtin under the name of __builtin_popcount
Those four `bitwise_*` functions also came to mind for me. I'll also add that Python has six bit-wise operators ( https://wiki.python.org/moin/BitwiseOperators), and because of that the array API standard implements `bitwise_left_shift` and `bitwise_right_shift` in addition to the four functions that NumPy has.
So it looks to me like this new ufunc should be called `bitwise_count` rather than `bit_count`.
Cheers, Ralf
_______________________________________________ NumPy-Discussion mailing list -- numpy-discussion@python.org To unsubscribe send an email to numpy-discussion-leave@python.org https://mail.python.org/mailman3/lists/numpy-discussion.python.org/ Member address: ganesh3597@gmail.com
On Sun, Dec 4, 2022 at 6:30 PM Ganesh Kathiresan <ganesh3597@gmail.com> wrote:
Hi all,
Thanks for the suggestions! I have gotten the PR to a working state with UT on top of the latest main (PR <https://github.com/numpy/numpy/pull/21429>).
So it looks to me like this new ufunc should be called `bitwise_count`
rather than `bit_count`.
This does sound like a good idea, but would it confuse the users given the scalar version is called `bit_count`? We could change/add an alias for scalar bit_count to `bitwise_count` but that would be different from the Python `bit_count <https://docs.python.org/3/library/stdtypes.html#int.bit_count>` added in 3.10. Any idea on how to proceed? I'm ok with calling it `bitwise_count` or `bit_count`.
I don't think it would confuse users. This is not a feature for the beginner/average user, it will have few users right now, and as long as the correspondence is mentioned in the docstring this should be discoverable enough. I'd much prefer no alias, we already have way too many of those and most of them are only noise at this point. Cheers, Ralf
Thanks, Ganesh
On Thu, Dec 1, 2022 at 3:20 AM Ralf Gommers <ralf.gommers@gmail.com> wrote:
On Fri, Nov 25, 2022 at 9:55 PM Serge Guelton <sergesanspaille@free.fr> wrote:
On Fri, Nov 25, 2022 at 08:09:02PM +0100, Sebastian Berg wrote:
Thanks for bringing this up again. The Python method exists and it seems like relatively basic functionality.
Overall, I am slightly in favor of adding the ufunc. So if nobody voices an opinion that it doesn't seem a good fit for NumPy, I would be happy to move forward with it.
It seems like there is enough demand, so +0.5 from me.
PS: One of my main concern would be if we were to add many bitwise functions, in which case a `bitwise` namespace might be nice. But I am not convinced that should stop us here.
Technically speaking, bitwise_and, birwise_or, bitwise_xor and bitwise_not already exist and popcount is widely spread, it already has its compiler builtin under the name of __builtin_popcount
Those four `bitwise_*` functions also came to mind for me. I'll also add that Python has six bit-wise operators ( https://wiki.python.org/moin/BitwiseOperators), and because of that the array API standard implements `bitwise_left_shift` and `bitwise_right_shift` in addition to the four functions that NumPy has.
So it looks to me like this new ufunc should be called `bitwise_count` rather than `bit_count`.
Cheers, Ralf
_______________________________________________ NumPy-Discussion mailing list -- numpy-discussion@python.org To unsubscribe send an email to numpy-discussion-leave@python.org https://mail.python.org/mailman3/lists/numpy-discussion.python.org/ Member address: ganesh3597@gmail.com
_______________________________________________ NumPy-Discussion mailing list -- numpy-discussion@python.org To unsubscribe send an email to numpy-discussion-leave@python.org https://mail.python.org/mailman3/lists/numpy-discussion.python.org/ Member address: ralf.gommers@gmail.com
as long as the correspondence is mentioned in the docstring this should be discoverable enough
Ah ok, that's a good point. I will make the change. Thanks! ~Ganesh On Mon, Dec 5, 2022 at 3:21 AM Ralf Gommers <ralf.gommers@gmail.com> wrote:
On Sun, Dec 4, 2022 at 6:30 PM Ganesh Kathiresan <ganesh3597@gmail.com> wrote:
Hi all,
Thanks for the suggestions! I have gotten the PR to a working state with UT on top of the latest main (PR <https://github.com/numpy/numpy/pull/21429>).
So it looks to me like this new ufunc should be called `bitwise_count`
rather than `bit_count`.
This does sound like a good idea, but would it confuse the users given the scalar version is called `bit_count`? We could change/add an alias for scalar bit_count to `bitwise_count` but that would be different from the Python `bit_count <https://docs.python.org/3/library/stdtypes.html#int.bit_count>` added in 3.10. Any idea on how to proceed? I'm ok with calling it `bitwise_count` or `bit_count`.
I don't think it would confuse users. This is not a feature for the beginner/average user, it will have few users right now, and as long as the correspondence is mentioned in the docstring this should be discoverable enough. I'd much prefer no alias, we already have way too many of those and most of them are only noise at this point.
Cheers, Ralf
Thanks, Ganesh
On Thu, Dec 1, 2022 at 3:20 AM Ralf Gommers <ralf.gommers@gmail.com> wrote:
On Fri, Nov 25, 2022 at 9:55 PM Serge Guelton <sergesanspaille@free.fr> wrote:
On Fri, Nov 25, 2022 at 08:09:02PM +0100, Sebastian Berg wrote:
Thanks for bringing this up again. The Python method exists and it seems like relatively basic functionality.
Overall, I am slightly in favor of adding the ufunc. So if nobody voices an opinion that it doesn't seem a good fit for NumPy, I would be happy to move forward with it.
It seems like there is enough demand, so +0.5 from me.
PS: One of my main concern would be if we were to add many bitwise functions, in which case a `bitwise` namespace might be nice. But I am not convinced that should stop us here.
Technically speaking, bitwise_and, birwise_or, bitwise_xor and bitwise_not already exist and popcount is widely spread, it already has its compiler builtin under the name of __builtin_popcount
Those four `bitwise_*` functions also came to mind for me. I'll also add that Python has six bit-wise operators ( https://wiki.python.org/moin/BitwiseOperators), and because of that the array API standard implements `bitwise_left_shift` and `bitwise_right_shift` in addition to the four functions that NumPy has.
So it looks to me like this new ufunc should be called `bitwise_count` rather than `bit_count`.
Cheers, Ralf
_______________________________________________ NumPy-Discussion mailing list -- numpy-discussion@python.org To unsubscribe send an email to numpy-discussion-leave@python.org https://mail.python.org/mailman3/lists/numpy-discussion.python.org/ Member address: ganesh3597@gmail.com
_______________________________________________ NumPy-Discussion mailing list -- numpy-discussion@python.org To unsubscribe send an email to numpy-discussion-leave@python.org https://mail.python.org/mailman3/lists/numpy-discussion.python.org/ Member address: ralf.gommers@gmail.com
_______________________________________________ NumPy-Discussion mailing list -- numpy-discussion@python.org To unsubscribe send an email to numpy-discussion-leave@python.org https://mail.python.org/mailman3/lists/numpy-discussion.python.org/ Member address: ganesh3597@gmail.com
Hi all, Thanks for all the great work on this PR on using the CPU instruction for bitcounts. This will be really useful for hamming distance like calculations. I wanted to check if there was anything else needed to merge this PR? It seems like there was a lot of good feedback addressed. https://github.com/numpy/numpy/pull/21429 Thanks -Doug On Sun, Dec 4, 2022 at 10:00 PM Ganesh Kathiresan <ganesh3597@gmail.com> wrote:
as long as the correspondence is mentioned in the docstring this should
be discoverable enough
Ah ok, that's a good point. I will make the change. Thanks!
~Ganesh
On Mon, Dec 5, 2022 at 3:21 AM Ralf Gommers <ralf.gommers@gmail.com> wrote:
On Sun, Dec 4, 2022 at 6:30 PM Ganesh Kathiresan <ganesh3597@gmail.com> wrote:
Hi all,
Thanks for the suggestions! I have gotten the PR to a working state with UT on top of the latest main (PR <https://github.com/numpy/numpy/pull/21429>).
So it looks to me like this new ufunc should be called `bitwise_count`
rather than `bit_count`.
This does sound like a good idea, but would it confuse the users given the scalar version is called `bit_count`? We could change/add an alias for scalar bit_count to `bitwise_count` but that would be different from the Python `bit_count <https://docs.python.org/3/library/stdtypes.html#int.bit_count>` added in 3.10. Any idea on how to proceed? I'm ok with calling it `bitwise_count` or `bit_count`.
I don't think it would confuse users. This is not a feature for the beginner/average user, it will have few users right now, and as long as the correspondence is mentioned in the docstring this should be discoverable enough. I'd much prefer no alias, we already have way too many of those and most of them are only noise at this point.
Cheers, Ralf
Thanks, Ganesh
On Thu, Dec 1, 2022 at 3:20 AM Ralf Gommers <ralf.gommers@gmail.com> wrote:
On Fri, Nov 25, 2022 at 9:55 PM Serge Guelton <sergesanspaille@free.fr> wrote:
On Fri, Nov 25, 2022 at 08:09:02PM +0100, Sebastian Berg wrote:
Thanks for bringing this up again. The Python method exists and it seems like relatively basic functionality.
Overall, I am slightly in favor of adding the ufunc. So if nobody voices an opinion that it doesn't seem a good fit for NumPy, I would be happy to move forward with it.
It seems like there is enough demand, so +0.5 from me.
PS: One of my main concern would be if we were to add many bitwise functions, in which case a `bitwise` namespace might be nice. But I am not convinced that should stop us here.
Technically speaking, bitwise_and, birwise_or, bitwise_xor and bitwise_not already exist and popcount is widely spread, it already has its compiler builtin under the name of __builtin_popcount
Those four `bitwise_*` functions also came to mind for me. I'll also add that Python has six bit-wise operators ( https://wiki.python.org/moin/BitwiseOperators), and because of that the array API standard implements `bitwise_left_shift` and `bitwise_right_shift` in addition to the four functions that NumPy has.
So it looks to me like this new ufunc should be called `bitwise_count` rather than `bit_count`.
Cheers, Ralf
_______________________________________________ NumPy-Discussion mailing list -- numpy-discussion@python.org To unsubscribe send an email to numpy-discussion-leave@python.org https://mail.python.org/mailman3/lists/numpy-discussion.python.org/ Member address: ganesh3597@gmail.com
_______________________________________________ NumPy-Discussion mailing list -- numpy-discussion@python.org To unsubscribe send an email to numpy-discussion-leave@python.org https://mail.python.org/mailman3/lists/numpy-discussion.python.org/ Member address: ralf.gommers@gmail.com
_______________________________________________ NumPy-Discussion mailing list -- numpy-discussion@python.org To unsubscribe send an email to numpy-discussion-leave@python.org https://mail.python.org/mailman3/lists/numpy-discussion.python.org/ Member address: ganesh3597@gmail.com
_______________________________________________ NumPy-Discussion mailing list -- numpy-discussion@python.org To unsubscribe send an email to numpy-discussion-leave@python.org https://mail.python.org/mailman3/lists/numpy-discussion.python.org/ Member address: softwaredoug@gmail.com
participants (5)
-
Doug Turnbull
-
Ganesh Kathiresan
-
Ralf Gommers
-
Sebastian Berg
-
Serge Guelton