Proposal for adding bit_count

Hi All, I am working <https://github.com/numpy/numpy/pull/19355> on a new UFunc, ` bit_count <https://en.wikipedia.org/wiki/Hamming_weight>` (popcount in other languages) that aims to count the number of 1-bits in the absolute value of an Integer. Implementation ---------------------------------- The primary reference for the implementation is CountBitsSetParallel <http://graphics.stanford.edu/~seander/bithacks.html#CountBitsSetParallel>. Here we take 12 operations to achieve the result which is the same as the lookup table method but does not suffer from memory issues or cache misses. The implementation is aimed at unsigned integers, absolute value of signed integers and objects that support the operation. Usage -------------- >>> np.bit_count(1023) 10 >>> a = np.array([2**i - 1 for i in range(16)]) >>> np.bit_count(a) array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15]) >>> np.int32(1023).bit_count() 10 Notes ------------- 1. Python has included this method here <https://github.com/python/cpython/commit/8bd216dfede9cb2d5bedb67f20a30c99844...> (3.10+). Tracking issue <https://bugs.python.org/issue29882> 2. NumPy tracking issue <https://github.com/numpy/numpy/issues/16325> 3. Interesting read <https://archive.org/details/dr_dobbs_journal_vol_08/page/n185/mode/2up> on how we get the magic number. Needed a bit of digging :) Please let us know what you think about the implementation and where we can improve in terms of performance or interface. Regards, Ganesh

On Thu, 2021-07-29 at 21:46 +0530, Ganesh Kathiresan wrote:
Hi All,
I am working <https://github.com/numpy/numpy/pull/19355> on a new UFunc, ` bit_count <https://en.wikipedia.org/wiki/Hamming_weight>` (popcount in other languages) that aims to count the number of 1-bits in the absolute value of an Integer.
Thanks for the proposal! Since `int.bit_count()` is now a Python builtin (as a method on integers), I feel it is a good idea to add it to NumPy. It is requested fairly commonly as well. Aside from comments about the inclusion in NumPy, I had two questions where I would love input: * Should `np.ndarray.bit_count()` exist? I tend against this; but we should have it on (integer) scalars to mirror the Python `int`. * The return value is currently the same type as the input. That means that: `np.bit_count(uint8)` returns the count as `uint8` while `np.bit_count(int32)` returns it as `int32`, etc. I think `bit_count` is different from typical math functions, so I am not quite sure this is what we want? The main alternative I see right now would be returning the default integer (usually int64) – unless otherwise specified. As an aside, I am not sure what is returned for booleans right now int8, uint8, or boolean? (Returning boolean for a count seems an oversight though). Cheers, Sebastian
Implementation ----------------------------------
. Here we take 12 operations to achieve the result which is the same as
The primary reference for the implementation is CountBitsSetParallel < http://graphics.stanford.edu/~seander/bithacks.html#CountBitsSetParallel the lookup table method but does not suffer from memory issues or cache misses.
The implementation is aimed at unsigned integers, absolute value of signed integers and objects that support the operation.
Usage --------------
>>> np.bit_count(1023)
10
>>> a = np.array([2**i - 1 for i in range(16)])
>>> np.bit_count(a)
array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15])
>>> np.int32(1023).bit_count()
10
Notes -------------
1. Python has included this method here < https://github.com/python/cpython/commit/8bd216dfede9cb2d5bedb67f20a30c99844...
(3.10+). Tracking issue <https://bugs.python.org/issue29882>
2. NumPy tracking issue <https://github.com/numpy/numpy/issues/16325>
3. Interesting read < https://archive.org/details/dr_dobbs_journal_vol_08/page/n185/mode/2up
on how we get the magic number. Needed a bit of digging :)
Please let us know what you think about the implementation and where we can improve in terms of performance or interface.
Regards, Ganesh _______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@python.org https://mail.python.org/mailman/listinfo/numpy-discussion

On Mon, Aug 2, 2021, at 10:50, Sebastian Berg wrote:
* Should `np.ndarray.bit_count()` exist? I tend against this; but we should have it on (integer) scalars to mirror the Python `int`.
Should `np.bit_count` exist? Having it on the int* types may be sufficient.
* The return value is currently the same type as the input. That means that: `np.bit_count(uint8)` returns the count as `uint8` while `np.bit_count(int32)` returns it as `int32`, etc.
What is the max value of the count? 64? If so it can go in a uint8. Stéfan

On Mon, 2021-08-02 at 13:10 -0700, Stefan van der Walt wrote:
On Mon, Aug 2, 2021, at 10:50, Sebastian Berg wrote:
* Should `np.ndarray.bit_count()` exist? I tend against this; but we should have it on (integer) scalars to mirror the Python `int`.
Should `np.bit_count` exist? Having it on the int* types may be sufficient.
Right, we could add it only to the integer scalars mostly for Python compatibility. The PR suggests to create a ufunc to make the feature available to typical NumPy code (allow using it with arrays).
* The return value is currently the same type as the input. That means that: `np.bit_count(uint8)` returns the count as `uint8` while `np.bit_count(int32)` returns it as `int32`, etc.
What is the max value of the count? 64? If so it can go in a uint8.
Yes, uint8 would even work for 128 bit integers. I was a bit unsure about this, since we rarely create non-default integer arrays unless prompted, but it is a good option as well. Cheers, Sebastian
Stéfan _______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@python.org https://mail.python.org/mailman/listinfo/numpy-discussion

Should `np.ndarray.bit_count()` exist? I tend against this;
Thanks for the info Sebastian, I agree with this as we can stick to what Python offers. Should `np.bit_count` exist? Having it on the int* types may be sufficient. Hey Stephan, regarding this, I felt we could support it in the same lines NumPy Mathematical Functions <https://numpy.org/doc/stable/reference/routines.math.html>, something like GCD perhaps, where we do not have `np.ndarray.gcd` but do have an `np.gcd` What is the max value of the count? 64? If so it can go in a uint8. This makes sense yeah, will make this change, thanks for the suggestion. Also, an interesting future proposal can be to club all the bitwise functions into a single "namespace" of sorts and have np.bits.*. This has already been suggested in this comment <https://github.com/numpy/numpy/issues/16325#issuecomment-869182363> and I feel this would be a clean addition and we can support other useful functions as well. Regards, Ganesh On Tue, Aug 3, 2021 at 9:18 PM Sebastian Berg <sebastian@sipsolutions.net> wrote:
On Mon, 2021-08-02 at 13:10 -0700, Stefan van der Walt wrote:
On Mon, Aug 2, 2021, at 10:50, Sebastian Berg wrote:
* Should `np.ndarray.bit_count()` exist? I tend against this; but we should have it on (integer) scalars to mirror the Python `int`.
Should `np.bit_count` exist? Having it on the int* types may be sufficient.
Right, we could add it only to the integer scalars mostly for Python compatibility. The PR suggests to create a ufunc to make the feature available to typical NumPy code (allow using it with arrays).
* The return value is currently the same type as the input. That means that: `np.bit_count(uint8)` returns the count as `uint8` while `np.bit_count(int32)` returns it as `int32`, etc.
What is the max value of the count? 64? If so it can go in a uint8.
Yes, uint8 would even work for 128 bit integers. I was a bit unsure about this, since we rarely create non-default integer arrays unless prompted, but it is a good option as well.
Cheers,
Sebastian
Stéfan _______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@python.org https://mail.python.org/mailman/listinfo/numpy-discussion
_______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@python.org https://mail.python.org/mailman/listinfo/numpy-discussion
participants (3)
-
Ganesh Kathiresan
-
Sebastian Berg
-
Stefan van der Walt