On Sat, Jul 30, 2022 at 5:51 PM Matteo Santamaria <matteosantama@gmail.com> wrote:

Hi all,


I’d like to open a discussion on supporting callables within `np.ndarray.__getitem__`. The intent is to make long function-chaining expressions more ergonomic by removing the need for an intermediary, temporary value.


Instead of

    tmp = long_and_complicated_expression(arr)
    return tmp[tmp > 0]

we would allow

    return long_and_complicated_expression(arr)[lambda x: x > 0]


This feature has long been supported by pandas’ `.loc` accessor, where I’ve personally found it very valuable. Following the pandas implementation, the callable would be required to take exactly one argument: the array being indexed.


In terms of semantics, it should always be the case that `arr[fn] == arr[fn(arr)]`.
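
As a rough illustration of these semantics, the behaviour can be sketched today with an `ndarray` subclass. `CallableIndexArray` is a hypothetical name used only for this example; it is not an existing or proposed NumPy class, and the sketch handles only a bare callable as the index, not callables nested inside a tuple.

```python
import numpy as np


class CallableIndexArray(np.ndarray):
    """Hypothetical subclass for illustration only; not part of NumPy."""

    def __getitem__(self, key):
        # If the index is callable, call it with the array itself and
        # index with whatever it returns, so arr[fn] behaves like arr[fn(arr)].
        if callable(key):
            key = key(self)
        return super().__getitem__(key)


# View an ordinary array as the subclass; no data is copied.
arr = np.array([-2, -1, 0, 1, 2]).view(CallableIndexArray)

# Callable index: the lambda receives `arr` and returns a boolean mask.
positive = arr[lambda x: x > 0]

# Same result as building the mask from a temporary name.
assert np.array_equal(positive, arr[arr > 0])
```

Here the lambda plays the role of `fn`: it receives the array and returns a boolean mask, which is then applied as an ordinary index.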


I do realize that expanding the API and supporting additional indexing methods is not without cost, so I open the floor to anyone who’d like to weigh in for/against the proposal.

Matteo, thanks for bringing up this proposal!

In my opinion, this would not be a good idea. The main reason this makes sense in pandas is that the pandas API is designed for "method chaining," so being able to chain indexing operations is important too.

In contrast, only some NumPy functions have equivalents as methods, so method chaining doesn't really work. More broadly, I don't really see how it could be made to work even if we did add methods for everything, because you almost always need to work with multiple NumPy arrays (in contrast to the multiple arrays that can live in a single pandas DataFrame).


