![](https://secure.gravatar.com/avatar/680652ec4feb1969c39cfba008214ea9.jpg?s=120&d=mm&r=g)
👋Long time numpy user, and big fan of all your work TL; DR - there's a `bit_count` method on numpy scalars, and Python ints, I'm advocating for a `ufunc` `bit_count` as has been implemented in this PR: https://github.com/numpy/numpy/pull/21429 A number of people have requested this as a numpy feature and there's a lot of great discussion at this issue https://github.com/numpy/numpy/issues/16325 A bit count comes up in certain similarity situations (like hamming distance). This involves an xor with numpy arrays and a bit count, where the bit count is currently the bottleneck by a few orders of magnitude in my local benchmarking. Currently there's a number of not particularly fast, workarounds for doing this with numpy arrays, like the bit-twiddling solutions here https://stackoverflow.com/a/68943135/8123 https://stackoverflow.com/a/109025/8123 So I'd love to be able to continue the work of bit_count -> numpy array ufunc to get performance gains at the array level 🙏
![](https://secure.gravatar.com/avatar/b4f6d4f8b501cb05fd054944a166a121.jpg?s=120&d=mm&r=g)
Thanks for bringing this up again. The Python method exists and it seems like relatively basic functionality. Overall, I am slightly in favor of adding the ufunc. So if nobody voices an opinion that it doesn't seem a good fit for NumPy, I would be happy to move forward with it. - Sebastian PS: One of my main concern would be if we were to add many bitwise functions, in which case a `bitwise` namespace might be nice. But I am not convinced that should stop us here. On Thu, 2022-11-24 at 15:56 +0000, Doug Turnbull wrote:
![](https://secure.gravatar.com/avatar/5f88830d19f9c83e2ddfd913496c5025.jpg?s=120&d=mm&r=g)
On Fri, Nov 25, 2022 at 9:55 PM Serge Guelton <sergesanspaille@free.fr> wrote:
It seems like there is enough demand, so +0.5 from me.
Those four `bitwise_*` functions also came to mind for me. I'll also add that Python has six bit-wise operators ( https://wiki.python.org/moin/BitwiseOperators), and because of that the array API standard implements `bitwise_left_shift` and `bitwise_right_shift` in addition to the four functions that NumPy has. So it looks to me like this new ufunc should be called `bitwise_count` rather than `bit_count`. Cheers, Ralf
![](https://secure.gravatar.com/avatar/47e942172ef84cf26a8b419452549773.jpg?s=120&d=mm&r=g)
Hi all, Thanks for the suggestions! I have gotten the PR to a working state with UT on top of the latest main (PR <https://github.com/numpy/numpy/pull/21429>). So it looks to me like this new ufunc should be called `bitwise_count`
rather than `bit_count`.
This does sound like a good idea, but would it confuse the users given the scalar version is called `bit_count`? We could change/add an alias for scalar bit_count to `bitwise_count` but that would be different from the Python `bit_count <https://docs.python.org/3/library/stdtypes.html#int.bit_count>` added in 3.10. Any idea on how to proceed? I'm ok with calling it `bitwise_count` or `bit_count`. Thanks, Ganesh On Thu, Dec 1, 2022 at 3:20 AM Ralf Gommers <ralf.gommers@gmail.com> wrote:
![](https://secure.gravatar.com/avatar/5f88830d19f9c83e2ddfd913496c5025.jpg?s=120&d=mm&r=g)
On Sun, Dec 4, 2022 at 6:30 PM Ganesh Kathiresan <ganesh3597@gmail.com> wrote:
I don't think it would confuse users. This is not a feature for the beginner/average user, it will have few users right now, and as long as the correspondence is mentioned in the docstring this should be discoverable enough. I'd much prefer no alias, we already have way too many of those and most of them are only noise at this point. Cheers, Ralf
![](https://secure.gravatar.com/avatar/680652ec4feb1969c39cfba008214ea9.jpg?s=120&d=mm&r=g)
Hi all, Thanks for all the great work on this PR on using the CPU instruction for bitcounts. This will be really useful for hamming distance like calculations. I wanted to check if there was anything else needed to merge this PR? It seems like there was a lot of good feedback addressed. https://github.com/numpy/numpy/pull/21429 Thanks -Doug On Sun, Dec 4, 2022 at 10:00 PM Ganesh Kathiresan <ganesh3597@gmail.com> wrote:
![](https://secure.gravatar.com/avatar/b4f6d4f8b501cb05fd054944a166a121.jpg?s=120&d=mm&r=g)
Thanks for bringing this up again. The Python method exists and it seems like relatively basic functionality. Overall, I am slightly in favor of adding the ufunc. So if nobody voices an opinion that it doesn't seem a good fit for NumPy, I would be happy to move forward with it. - Sebastian PS: One of my main concern would be if we were to add many bitwise functions, in which case a `bitwise` namespace might be nice. But I am not convinced that should stop us here. On Thu, 2022-11-24 at 15:56 +0000, Doug Turnbull wrote:
![](https://secure.gravatar.com/avatar/5f88830d19f9c83e2ddfd913496c5025.jpg?s=120&d=mm&r=g)
On Fri, Nov 25, 2022 at 9:55 PM Serge Guelton <sergesanspaille@free.fr> wrote:
It seems like there is enough demand, so +0.5 from me.
Those four `bitwise_*` functions also came to mind for me. I'll also add that Python has six bit-wise operators ( https://wiki.python.org/moin/BitwiseOperators), and because of that the array API standard implements `bitwise_left_shift` and `bitwise_right_shift` in addition to the four functions that NumPy has. So it looks to me like this new ufunc should be called `bitwise_count` rather than `bit_count`. Cheers, Ralf
![](https://secure.gravatar.com/avatar/47e942172ef84cf26a8b419452549773.jpg?s=120&d=mm&r=g)
Hi all, Thanks for the suggestions! I have gotten the PR to a working state with UT on top of the latest main (PR <https://github.com/numpy/numpy/pull/21429>). So it looks to me like this new ufunc should be called `bitwise_count`
rather than `bit_count`.
This does sound like a good idea, but would it confuse the users given the scalar version is called `bit_count`? We could change/add an alias for scalar bit_count to `bitwise_count` but that would be different from the Python `bit_count <https://docs.python.org/3/library/stdtypes.html#int.bit_count>` added in 3.10. Any idea on how to proceed? I'm ok with calling it `bitwise_count` or `bit_count`. Thanks, Ganesh On Thu, Dec 1, 2022 at 3:20 AM Ralf Gommers <ralf.gommers@gmail.com> wrote:
![](https://secure.gravatar.com/avatar/5f88830d19f9c83e2ddfd913496c5025.jpg?s=120&d=mm&r=g)
On Sun, Dec 4, 2022 at 6:30 PM Ganesh Kathiresan <ganesh3597@gmail.com> wrote:
I don't think it would confuse users. This is not a feature for the beginner/average user, it will have few users right now, and as long as the correspondence is mentioned in the docstring this should be discoverable enough. I'd much prefer no alias, we already have way too many of those and most of them are only noise at this point. Cheers, Ralf
![](https://secure.gravatar.com/avatar/680652ec4feb1969c39cfba008214ea9.jpg?s=120&d=mm&r=g)
Hi all, Thanks for all the great work on this PR on using the CPU instruction for bitcounts. This will be really useful for hamming distance like calculations. I wanted to check if there was anything else needed to merge this PR? It seems like there was a lot of good feedback addressed. https://github.com/numpy/numpy/pull/21429 Thanks -Doug On Sun, Dec 4, 2022 at 10:00 PM Ganesh Kathiresan <ganesh3597@gmail.com> wrote:
participants (5)
-
Doug Turnbull
-
Ganesh Kathiresan
-
Ralf Gommers
-
Sebastian Berg
-
Serge Guelton