[Numpy-discussion] Boolean arrays with nulls?

Stuart Reynolds stuart at stuartreynolds.net
Thu Apr 18 16:36:26 EDT 2019


Looks like a good fit. Thanks.
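
For anyone finding this thread later, here is a minimal sketch of how the
masked-array suggestion quoted below can be handed to NaN-aware code. The
float conversion at the boundary is an assumption about the intended use,
not something spelled out in the thread, and the variable names are only
illustrative.

```python
import numpy as np

# Three-state boolean column: True, False, and one masked (null) entry.
arr = np.ma.zeros(3, dtype=bool)
arr[0] = True
arr[1] = False
arr[2] = np.ma.masked

# Convert to float only at the boundary with NaN-aware code (assumed use):
# masked entries become NaN, True/False become 1.0/0.0.
as_float = arr.astype(float).filled(np.nan)

null_positions = np.isnan(as_float)   # True where the entry was masked
mean_of_known = np.nanmean(as_float)  # mean over the unmasked entries only
```

Once the mask is materialized, storage is about two bytes per entry (one
for the value, one for the mask), compared with eight for float64.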

On Thu, Apr 18, 2019 at 11:17 AM Eric Wieser <wieser.eric+numpy at gmail.com>
wrote:

> One option here would be to use masked arrays:
>
> arr = np.ma.zeros(3, dtype=bool)
> arr[0] = True
> arr[1] = False
> arr[2] = np.ma.masked
>
> giving
>
> masked_array(data=[True, False, --],
>              mask=[False, False,  True],
>        fill_value=True)
>
> On Thu, 18 Apr 2019 at 10:51, Stuart Reynolds <stuart at stuartreynolds.net>
> wrote:
> >
> > Thanks. I’m aware of bool arrays.
> > I think the tricky part of what I’m looking for is NULLability and
> interoperability with code that deals with nullable data (float arrays).
> >
> > Currently the options seem to be float arrays, or custom operations that
> carry (unabstracted) categorical array data representations, such as:
> > 0: false
> > 1: true
> > 2: NULL
> >
> > ... which wouldn’t be compatible with algorithms that use, say, np.isnan.
> > Ideally, it would be nice to have a structure that was float-like in
> that it’s compatible with nan-aware operations, but its storage is just a
> single byte per cell (or less).
> >
> > Is float8 a thing?
> >
> >
> > On Thu, Apr 18, 2019 at 9:46 AM Stefan van der Walt <
> stefanv at berkeley.edu> wrote:
> >>
> >> Hi Stuart,
> >>
> >> On Thu, 18 Apr 2019 09:12:31 -0700, Stuart Reynolds wrote:
> >> > Is there an efficient way to represent bool arrays with null entries?
> >>
> >> You can use the bool dtype:
> >>
> >> In [5]: x = np.array([True, False, True])
> >>
> >> In [6]: x
> >> Out[6]: array([ True, False,  True])
> >>
> >> In [7]: x.dtype
> >> Out[7]: dtype('bool')
> >>
> >> You should note that this stores one True/False value per byte, so it is
> >> not optimal in terms of memory use.  There is no easy way to do
> >> bit-arrays with NumPy, because we use strides to determine how to move
> >> from one memory location to the next.
> >>
> >> See also:
> https://www.reddit.com/r/Python/comments/5oatp5/one_bit_data_type_in_numpy/
> >>
> >> > What I’m hoping for is that there’s a structure that is ‘viewed’ as
> >> > nan-able float data, but backed by a more efficient structure
> >> > internally.
> >>
> >> There are good implementations of this idea, such as:
> >>
> >> https://github.com/ilanschnell/bitarray
> >>
> >> Those structures cannot typically utilize the NumPy machinery, though.
> >> With the new __array_function__ interface, you should at least be able to
> >> build something that has something close to the NumPy API.
> >>
> >> Best regards,
> >> Stéfan
> >> _______________________________________________
> >> NumPy-Discussion mailing list
> >> NumPy-Discussion at python.org
> >> https://mail.python.org/mailman/listinfo/numpy-discussion
>
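
The 0/1/2 encoding mentioned earlier in the thread can also be kept as a
one-byte array and expanded to NaN-bearing floats only where needed. The
following is a rough sketch under that assumption; NULL and to_nan_float
are hypothetical names chosen for illustration, not part of NumPy.

```python
import numpy as np

# Encoding from the thread: 0 = false, 1 = true, 2 = NULL.
# NULL and to_nan_float are hypothetical helper names, not NumPy APIs.
NULL = 2


def to_nan_float(codes):
    """Expand one-byte codes into a float64 array with NaN for NULL."""
    out = codes.astype(np.float64)
    out[codes == NULL] = np.nan
    return out


codes = np.array([1, 0, NULL, 1], dtype=np.uint8)  # one byte per cell
values = to_nan_float(codes)                       # temporary float view

known_true = np.nansum(values)  # counts True entries, skipping NULLs
```

This keeps the stored data at one byte per cell, at the cost of a
temporary float array whenever NaN-aware functions such as np.isnan are
called. NumPy has no float8 dtype; float16 is its smallest float type.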