[Numpy-discussion] Proposal for adding bit_count

Mon Aug 2 13:50:14 EDT 2021

On Thu, 2021-07-29 at 21:46 +0530, Ganesh Kathiresan wrote:
> Hi All,
> 
> 
> 
> I am working <https://github.com/numpy/numpy/pull/19355> on a new
> UFunc, `
> bit_count <https://en.wikipedia.org/wiki/Hamming_weight>` (popcount
> in
> other languages) that aims to count the number of 1-bits in
> the absolute value of an Integer.
> 

Thanks for the proposal!   Since `int.bit_count()` is now a Python
builtin (as a method on integers), I feel it is a good idea to add it
to NumPy.
It is requested fairly commonly as well.

Aside from comments about the inclusion in NumPy,  I had two questions
where I would love input:

* Should `np.ndarray.bit_count()` exist?  I tend against this;
  but we should have it on (integer) scalars to mirror the
  Python `int`.

* The return value is currently the same type as the input.  That
  means that: `np.bit_count(uint8)` returns the count as `uint8`
  while `np.bit_count(int32)` returns it as `int32`, etc.

  I think `bit_count` is different from typical math functions,
  so I am not quite sure this is what we want?  The main
  alternative I see right now would be returning the default
  integer (usually int64) – unless otherwise specified.

  As an aside, I am not sure what is returned for booleans right now
  int8, uint8, or boolean?  (Returning boolean for a count seems an
  oversight though).

Cheers,

Sebastian

> 
> Implementation
> ----------------------------------
> 
> The primary reference for the implementation is CountBitsSetParallel
> <
> http://graphics.stanford.edu/~seander/bithacks.html#CountBitsSetParallel
> >.
> Here we take 12 operations to achieve the result which is the same as
> the
> lookup table method but does not suffer from memory issues or cache
> misses.
> 
> The implementation is aimed at unsigned integers, absolute value of
> signed
> integers and objects that support the operation.
> 
> 
> Usage
> --------------
> 
>     >>> np.bit_count(1023)
> 
>     10
> 
>     >>> a = np.array([2**i - 1 for i in range(16)])
> 
>     >>> np.bit_count(a)
> 
>     array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13,
> 14, 15])
> 
>     >>> np.int32(1023).bit_count()
> 
>     10
> 
> 
> Notes
> -------------
> 
> 1. Python has included this method here
> <
> https://github.com/python/cpython/commit/8bd216dfede9cb2d5bedb67f20a30c99844dbfb8
> >
>  (3.10+). Tracking issue <https://bugs.python.org/issue29882>
> 
> 2.  NumPy tracking issue
> <https://github.com/numpy/numpy/issues/16325>
> 
> 3.  Interesting read
> <
> https://archive.org/details/dr_dobbs_journal_vol_08/page/n185/mode/2up
> > on
> how we get the magic number. Needed a bit of digging :)
> 
> 
> Please let us know what you think about the implementation and where
> we can
> improve in terms of performance or interface.
> 
> Regards,
> Ganesh
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion

-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 833 bytes
Desc: This is a digitally signed message part
URL: <https://mail.python.org/pipermail/numpy-discussion/attachments/20210802/97ddc81a/attachment.sig>