[Numpy-discussion] asanyarray vs. asarray

Nathaniel Smith njs at pobox.com
Fri Oct 19 22:08:43 EDT 2018


On Fri, Oct 19, 2018 at 7:00 PM, Charles R Harris
<charlesr.harris at gmail.com> wrote:
>
> On Fri, Oct 19, 2018 at 7:50 PM Eric Wieser <wieser.eric+numpy at gmail.com>
> wrote:
>>
>> Subclasses such as MaskedArray and, yes, Quantity, are widely used, and if
>> they cause problems perhaps that should be seen as a sign that ndarray
>> subclassing should be made easier and clearer.
>>
>> Both MaskedArray and Quantity seem like something that would make more
>> sense at the dtype level if our dtype system were easier to extend. It might
>> be good to compile a list of subclassing applications, and split them into
>> “this ought to be a dtype” and “this ought to be a different type of
>> container”.
>
> Wes McKinney has been benchmarking masks vs sentinel values for Arrow:
> http://wesmckinney.com/blog/bitmaps-vs-sentinel-values/. The (bit) masks are
> faster. I'm not convinced dtypes are the way to go.

We need to add better support for both user-defined dtypes and for
user-defined containers in any case. So we're going to support both
missing value strategies regardless, and people will be able to choose
based on engineering trade-offs. A missing value dtype is going to
integrate much more easily into the rest of numpy than a new container
where you have to reimplement indexing etc., but maybe custom
containers can be faster. Okay, cool, they're both on PyPI, pick your
favorite!
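
To make the two strategies concrete, here is a minimal sketch using only
what numpy ships today. It is not a proposal for either design: NaN stands
in for a "missing value dtype" sentinel and np.ma.masked_array stands in
for a separate-mask container, and the variable names are made up for
illustration.

```python
import numpy as np

values = np.array([1.0, 2.0, 3.0, 4.0])
missing = np.array([False, True, False, False])

# Strategy 1: a sentinel stored in the data itself (NaN as a stand-in).
# Every reduction has to know about the sentinel, hence nanmean.
with_sentinel = values.copy()
with_sentinel[missing] = np.nan
sentinel_mean = np.nanmean(with_sentinel)

# Strategy 2: a separate boolean mask carried next to the data.
# The container's own methods consult the mask.
masked = np.ma.masked_array(values, mask=missing)
mask_mean = masked.mean()

print(sentinel_mean, mask_mean)
```

Both means are computed over the three non-missing entries; the difference
is where the "missing" information lives and which code has to know about
it.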

Trying to wedge masks into *ndarray* seems like a non-starter, though,
because it would require auditing and updating basically all code
using the numpy C API.
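
A Python-level illustration of the same problem, as a rough sketch only
(the real concern is C code, which this does not show): code written
against plain ndarray sees just the data buffer, so anything that coerces
with np.asarray silently ignores a mask, while np.asanyarray passes the
subclass through. The array contents here are arbitrary example values.

```python
import numpy as np

masked = np.ma.masked_array([1.0, 2.0, 1000.0], mask=[False, False, True])

# asarray returns a base-class ndarray over the raw data; the mask is gone.
plain = np.asarray(masked)

# asanyarray lets ndarray subclasses through unchanged.
kept = np.asanyarray(masked)

print(type(plain).__name__, plain.sum())  # masked entry is included
print(type(kept).__name__, kept.sum())    # masked entry is skipped
```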

-n

-- 
Nathaniel J. Smith -- https://vorpus.org

