Return item rather than scalar for user defined types
Hello All,

Yesterday I opened PR #4889 https://github.com/numpy/numpy/pull/4998 to solve a problem I have been having w.r.t. xdress, and Nathaniel asked me to bring the issue up here. The PR itself is quite small (6 lines?) and is easy to review. The opening text of my PR is pasted below because I believe it is a pretty good description of the issue.

But briefly, pulling user-defined dtypes out of an array does not behave idiomatically, because you get a numpy scalar rather than a more representative Python object. For user-defined dtypes, which are typically more complex and possibly more stateful than the builtin dtypes, I believe that it makes much more sense to get the actual Python representation back, a la the getitem() function.

In fact, I think that this case also applies to the object dtype. However, changing that usage would likely break downstream code and would be inconsistent with how other builtin types are returned.

In future major versions of numpy it would be ideal if the dtypes themselves could flag how they wished to be returned: either as a scalar or as the Python item.

Thoughts?

Be Well
Anthony

This updates what is effectively the __getitem__() method. For arrays whose dtype is a user-defined type, you receive the return value of that dtype's getitem() rather than a numpy scalar of the dtype. This allows the custom type to present a single Python API as well as an associated dtype. It also prevents users from having to subclass ndarray to get the appropriate behaviour.

For example, suppose that we have a dtype representing a C++ std::vector<int> and then we had a numpy array of this dtype. From Python, it might look like:
>>> arr
array([array([0, 0, 0, 0, 0], dtype=int32),
       array([0, 1, 2, 3, 4], dtype=int32),
       array([0, 2, 4, 6, 8], dtype=int32)], dtype='xd_vector_int')
Without this PR, you'd have to do the following to access the most deeply nested elements:
>>> arr.item(2)[4]
8
This is because you cannot index a scalar:
>>> arr[2][4]
IndexError: invalid index to scalar variable
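The same scalar-versus-item distinction can be seen with a builtin dtype, without any custom dtype installed. The following is only a minimal sketch using an ordinary int32 array as a stand-in (the array and variable names are illustrative and are not taken from the PR): indexing yields a numpy scalar, item() yields the plain Python object, and the scalar cannot be indexed any further.

```python
import numpy as np

# A builtin-dtype array used as a stand-in; the xd_vector_int dtype
# from the example above is not needed to see the distinction.
arr = np.arange(5, dtype=np.int32)

scalar = arr[2]      # indexing returns a numpy scalar
item = arr.item(2)   # item() returns the plain Python object

print(isinstance(scalar, np.generic))  # numpy scalar types derive from np.generic
print(type(item) is int)               # a builtin Python int

# A numpy scalar cannot be indexed with an integer.
try:
    scalar[0]
    index_failed = False
except IndexError as exc:
    index_failed = True
    print("IndexError:", exc)
```

For a user-defined dtype the object returned by item() may itself be a container, which is why arr.item(2)[4] works while arr[2][4] does not.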
With this PR, the idiomatic expression is now allowable because arr[2] is the associated Python type:
>>> arr[2][4]
8
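The dispatch described here can be modelled in a few lines of pure Python. This is a toy sketch only: ToyDtype, ToyScalar, ToyArray and the user_defined flag are hypothetical names invented for illustration, and none of this is numpy's actual implementation or the code in the PR.

```python
class ToyDtype:
    """Hypothetical stand-in for a dtype; not numpy's real dtype class."""

    def __init__(self, name, user_defined, getitem):
        self.name = name
        self.user_defined = user_defined  # assumed flag, for illustration only
        self.getitem = getitem            # converts stored data to a Python object


class ToyScalar:
    """Hypothetical opaque wrapper that, like a numpy scalar, is not indexable."""

    def __init__(self, value):
        self.value = value


class ToyArray:
    """Hypothetical one-dimensional container modelling the proposed dispatch."""

    def __init__(self, data, dtype):
        self.data = list(data)
        self.dtype = dtype

    def item(self, i):
        # Always hand back the Python object produced by the dtype's getitem.
        return self.dtype.getitem(self.data[i])

    def __getitem__(self, i):
        # Proposed behaviour: user-defined dtypes return the item,
        # builtin dtypes keep returning a scalar wrapper.
        if self.dtype.user_defined:
            return self.item(i)
        return ToyScalar(self.data[i])


vector_int = ToyDtype("xd_vector_int", user_defined=True, getitem=list)
arr = ToyArray([[0, 0, 0, 0, 0], [0, 1, 2, 3, 4], [0, 2, 4, 6, 8]], vector_int)
print(arr[2][4])  # the nested element is reachable through plain indexing

builtin = ToyArray([1, 2, 3], ToyDtype("int32", user_defined=False, getitem=int))
print(type(builtin[1]).__name__)  # still a scalar wrapper for builtin dtypes
```

In this model the choice is made per dtype inside __getitem__, so no ndarray subclass is needed to change what indexing returns.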
This is a pretty big deal for xdress http://xdress.org/, which creates many custom dtypes and provides a Python interface to them. See xdress/xdress#265 https://github.com/xdress/xdress/pull/265 for what prompted this.

Thanks for considering!
Participants (1): Anthony Scopatz