[Numpy-discussion] Boolean arrays with nulls?

Stefan van der Walt stefanv at berkeley.edu
Thu Apr 18 12:45:41 EDT 2019


Hi Stuart,

On Thu, 18 Apr 2019 09:12:31 -0700, Stuart Reynolds wrote:
> Is there an efficient way to represent bool arrays with null entries?

You can use the bool dtype:

In [5]: x = np.array([True, False, True])

In [6]: x
Out[6]: array([ True, False,  True])

In [7]: x.dtype
Out[7]: dtype('bool')
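The bool dtype by itself has no null value. If a separate mask is acceptable, NumPy's masked arrays (`numpy.ma`) pair the bool data with a second bool array that marks the null entries. The following is a minimal sketch under that assumption; `values` and `nulls` are illustrative names only.

```python
import numpy as np

# Data values; whatever sits under a null entry is ignored.
values = np.array([True, False, True, False])
# Mask: True marks a null (missing) entry.
nulls = np.array([False, False, True, False])

x = np.ma.masked_array(values, mask=nulls)
print(x.dtype)    # bool
print(x.count())  # 3 non-null entries
print(x.sum())    # 1: the null entry is skipped
# Each entry costs one byte of data plus one byte of mask.
print(x.data.nbytes + x.mask.nbytes)  # 8
```

This doubles the memory cost, with one byte for the value and one for the mask, so it addresses the nulls but not the efficiency concern.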

Note that this stores one True/False value per byte, so it is not
optimal in terms of memory use.  There is no easy way to do bit-arrays
with NumPy, because we use strides to determine how to move from one
memory location to the next, and strides are counted in whole bytes:
an element cannot start partway through a byte.
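For storage, NumPy can still pack bools eight to a byte with `np.packbits` and restore them with `np.unpackbits`. The packed `uint8` array cannot be indexed bit by bit through strides, so it has to be unpacked before ordinary array operations. A minimal sketch:

```python
import numpy as np

flags = np.array([True, False, True, True, False, False, True, False, True])
print(flags.nbytes)   # 9: one byte per bool

# Pack 8 flags per byte (most significant bit first);
# the last byte is padded with zero bits.
packed = np.packbits(flags)
print(packed.nbytes)  # 2
print(packed)         # [178 128]

# Unpacking yields uint8 0/1 values, including the padding; slice it off.
restored = np.unpackbits(packed)[: flags.size].astype(bool)
print(np.array_equal(restored, flags))  # True
```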

See also: https://www.reddit.com/r/Python/comments/5oatp5/one_bit_data_type_in_numpy/

> What I’m hoping for is that there’s a structure that is ‘viewed’ as
> nan-able float data, but backed by more efficient structures
> internally.

There are good implementations of this idea, such as:

https://github.com/ilanschnell/bitarray

Those structures typically cannot use the NumPy machinery, though.
With the new array function interface (`__array_function__`, NEP 18),
you should at least be able to build something that exposes an API
close to NumPy's.
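As a rough illustration, here is a sketch of such a container, assuming NumPy 1.17 or later (where `__array_function__` is enabled by default). `NullableBitBool`, `implements` and `_HANDLED` are hypothetical names, not part of NumPy or bitarray. The class stores values and validity as two bit-packed planes, can materialize the nan-able float data on request, and overrides only `np.nansum` and `np.nanmean`. Any other NumPy function raises `TypeError`.

```python
import numpy as np

# Hypothetical registry mapping NumPy functions to our implementations.
_HANDLED = {}

def implements(np_func):
    def decorator(func):
        _HANDLED[np_func] = func
        return func
    return decorator

class NullableBitBool:
    """Hypothetical 1-D bool array with nulls, stored as two bit-packed planes."""

    def __init__(self, floats):
        # Assumed input convention: 1.0 -> True, 0.0 -> False, nan -> null.
        floats = np.asarray(floats, dtype=float)
        self.size = floats.size
        valid = ~np.isnan(floats)
        self._valid = np.packbits(valid)
        # Null entries store a 0 value bit; only the validity bit matters there.
        self._values = np.packbits(valid & (floats != 0))

    @property
    def nbytes(self):
        return self._valid.nbytes + self._values.nbytes

    def _unpack(self, plane):
        return np.unpackbits(plane)[: self.size].astype(bool)

    def to_float(self):
        """Materialize the nan-able float data (a copy, not a true view)."""
        out = self._unpack(self._values).astype(float)
        out[~self._unpack(self._valid)] = np.nan
        return out

    def __array_function__(self, func, types, args, kwargs):
        # Returning NotImplemented makes NumPy raise TypeError.
        if func not in _HANDLED:
            return NotImplemented
        return _HANDLED[func](*args, **kwargs)

@implements(np.nansum)
def _nansum(a):
    # Padding bits and null entries are 0 in the value plane,
    # so counting set bits skips them.
    return int(np.unpackbits(a._values).sum())

@implements(np.nanmean)
def _nanmean(a):
    n_valid = int(np.unpackbits(a._valid).sum())
    return _nansum(a) / n_valid if n_valid else float("nan")

x = NullableBitBool([1.0, np.nan, 0.0, 1.0, 1.0])
print(np.nansum(x))   # 3
print(np.nanmean(x))  # 0.75
print(x.nbytes)       # 2 bytes, versus 40 for the float64 equivalent
```

The sums work directly on the packed bits without unpacking to floats, which is where this layout saves memory. Every NumPy function the container should support still has to be implemented by hand.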

Best regards,
Stéfan


More information about the NumPy-Discussion mailing list