
On Fri, Oct 19, 2018 at 7:00 PM, Charles R Harris <charlesr.harris@gmail.com> wrote:
On Fri, Oct 19, 2018 at 7:50 PM Eric Wieser <wieser.eric+numpy@gmail.com> wrote:
Subclasses such as MaskedArray and, yes, Quantity, are widely used, and if they cause problems perhaps that should be seen as a sign that ndarray subclassing should be made easier and clearer.
Both MaskedArray and Quantity seem like things that would make more sense at the dtype level if our dtype system were easier to extend. It might be good to compile a list of subclassing applications, and split them into “this ought to be a dtype” and “this ought to be a different type of container”.
Wes McKinney has been benchmarking masks vs. sentinel values for Arrow: http://wesmckinney.com/blog/bitmaps-vs-sentinel-values/. The (bit) masks are faster. I'm not convinced dtypes are the way to go.
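For readers unfamiliar with the two strategies being compared, here is a minimal sketch in plain NumPy. It is not taken from the linked benchmark and says nothing about speed; the array contents and variable names are made up for illustration, and the `bitorder` and `count` arguments assume NumPy 1.17 or later.

```python
import numpy as np

# Illustrative data: the third entry is "missing".
values = np.array([1.0, 2.0, 0.0, 4.0])
valid = np.array([True, True, False, True])  # False marks a missing entry

# Sentinel strategy: the missing entry is encoded in the data itself (NaN here),
# so every reduction has to know about the sentinel.
sentinel = values.copy()
sentinel[~valid] = np.nan
sentinel_sum = np.nansum(sentinel)

# Mask strategy: the data is left alone and validity lives in a separate
# bitmap, one bit per element.
bitmap = np.packbits(valid, bitorder="little")
unpacked = np.unpackbits(bitmap, count=values.size, bitorder="little").astype(bool)
mask_sum = values[unpacked].sum()

# numpy.ma keeps a full boolean mask (True means masked) next to the data.
ma_sum = np.ma.masked_array(values, mask=~valid).sum()

print(sentinel_sum, mask_sum, ma_sum)
```

All three reductions skip the missing entry; the difference is where the "missing" information is stored, which is what the dtype versus container question turns on.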
We need to add better support for both user-defined dtypes and for user-defined containers in any case. So we're going to support both missing value strategies regardless, and people will be able to choose based on engineering trade-offs. A missing value dtype is going to integrate much more easily into the rest of numpy than a new container where you have to reimplement indexing etc., but maybe custom containers can be faster. Okay, cool, they're both on PyPI, pick your favorite!

Trying to wedge masks into *ndarray* seems like a non-starter, though, because it would require auditing and updating basically all code using the numpy C API.

-n

-- 
Nathaniel J. Smith -- https://vorpus.org
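To make the "you have to reimplement indexing etc." point concrete, here is a rough sketch of a standalone masked container. `MaskedContainer` is a hypothetical class invented for this illustration, not an existing NumPy or PyPI API, and it covers only indexing and one reduction.

```python
import numpy as np


class MaskedContainer:
    """Hypothetical container: values plus a validity mask, not an ndarray subclass."""

    def __init__(self, data, valid):
        self.data = np.asarray(data)
        self.valid = np.asarray(valid, dtype=bool)
        if self.data.shape != self.valid.shape:
            raise ValueError("data and valid must have the same shape")

    def __getitem__(self, index):
        # Indexing must be forwarded to both arrays by hand; ndarray does
        # none of this for a container it knows nothing about.
        return MaskedContainer(self.data[index], self.valid[index])

    def sum(self):
        # Each reduction also needs its own mask-aware implementation.
        return self.data[self.valid].sum()


c = MaskedContainer([1.0, 2.0, 0.0, 4.0], [True, True, False, True])
print(c.sum(), c[1:].sum())
```

Everything else an array user expects (arithmetic, broadcasting, ufuncs, the remaining reductions) would need the same treatment, which is the integration cost being weighed against a missing-value dtype.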