Unpacking 0-D object arrays when setting item/filling
Hi all,

TL;DR: Are there strong opinions on what:

    arr = np.array([None])
    arr[0] = np.array(3)
    arr.fill(np.array(3))

should give? Right now NumPy usually stores the 0-D array. I want to make `arr.fill()` consistent, so that is what it would be doing as well.

**Long discussion:**

NumPy has a bit of a consistency problem with item setting, that is, what should happen if you have an array and set an individual element:

    arr = np.arange(10, dtype=object)  # object is most problematic
    arr[0] = value

Or similarly (but currently somewhat different):

    arr.fill(value)

I wish to make the behaviour more consistent here, that is, both of these should use the same logic (fill is slightly different currently) and that logic should be shared by all functionality that is based on "setting a single element".

There is a tricky situation, however, if `value` is a 0-D array. In most cases, we can just copy the value over (using correct casting) as if we copied an array:

    arr[0] = zero_d_arr

is the same as:

    arr[0, ...] = zero_d_arr  # definitely copies the array value

But this is currently *not* consistently the case. I wish to make this consistent.

The confusion is around object arrays, though:

    value = np.array(None, dtype=object)
    arr[0] = value

Stores `value` without unpacking it currently.

    arr.fill(value)

Stores the `None` (unpacking `value`) if and only if `value` is 0-D. Right now, I aligned `fill` with the item assignment, but this could be argued.

Further related behaviour is that:

    np.array(value)  # unpacks any array
    np.array([value, None], dtype=object)  # does not unpack

Which allows you to pack arrays into object arrays.

Now, we could unpack, making it impossible to place a 0-D array into an object array except via `arr.itemset()`.

My current preference is to store the 0-D arrays as-is, and basically say that passing a 0-D array as value for:

    arr.fill(0d_arr)
    arr[0] = 0d_arr  # this is fine: arr[0, ...] = 0d_arr

it makes some issues around "array-like" objects a bit easier, and is just what we do _most_ (but not all!) of the time. But I know that many will look at it and immediately say: No, this has to be unpacked! So I am open to suggestions or changing this in general. (However, I think aligning `arr.fill` is OK; changing `arr[0] = ...`, I don't know.)

Cheers,

Sebastian
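The difference between integer-index assignment and ellipsis assignment described above can be tried with a short sketch like the following. It assumes a reasonably recent NumPy. `arr.fill()` is deliberately left out, because how it handles 0-D arrays is the open question in this thread and may differ between versions. The variable names are illustrative only.

```python
import numpy as np

value = np.array(None, dtype=object)   # 0-D object array wrapping None

# Integer indexing on an object array: the 0-D array itself is stored.
arr = np.array([1, 2], dtype=object)
arr[0] = value
stored_as_is = isinstance(arr[0], np.ndarray)

# Ellipsis indexing is an array-to-array assignment: the content is copied.
arr2 = np.array([1, 2], dtype=object)
arr2[0, ...] = value
unpacked = arr2[0] is None

print(stored_as_is, unpacked)
```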
On Mon, Feb 14, 2022, at 12:45, Sebastian Berg wrote:
> But this is currently *not* consistently the case. I wish to make this consistent. The confusion is around object arrays, though:
>
>     value = np.array(None, dtype=object)
>     arr[0] = value
>
> Stores `value` without unpacking it currently.
>
>     arr.fill(value)
>
> Stores the `None` (unpacking `value`) if and only if `value` is 0-D.

That last behavior doesn't look right to me. An object array should be thought of as a collection of pointers, and if you happen to want to point to a NumPy array, so be it.

> Further related behaviour is that:
>
>     np.array(value)  # unpacks any array
>     np.array([value, None], dtype=object)  # does not unpack

This seems reasonable. What would another reasonable expectation be?

> Now, we could unpack making it impossible to place a 0-D array into an object array except via `arr.itemset()`.

Not sure why we'd do that.

> My current preference is to store the 0-D arrays as-is, and basically say that passing a 0-D array as value for:
>
>     arr.fill(0d_arr)
>     arr[0] = 0d_arr  # this is fine: arr[0, ...] = 0d_arr

Does this differ from the current behavior? It looks to me like object arrays get correctly filled and assigned.

Stéfan
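The two `np.array` calls quoted in this reply can be compared side by side. This is a minimal sketch of the behaviour as described in the thread, assuming a NumPy version with the current array-coercion logic. The names `direct` and `packed` are illustrative.

```python
import numpy as np

value = np.array(None, dtype=object)     # 0-D object array wrapping None

# Passing the array directly: it is unpacked, the result is again 0-D.
direct = np.array(value)
direct_content = direct[()]              # the wrapped None

# Passing it inside a list with dtype=object: the element is kept as an array.
packed = np.array([value, None], dtype=object)
first_is_array = isinstance(packed[0], np.ndarray)

print(direct.shape, direct_content, packed.shape, first_is_array)
```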
On 15/2/22 09:53, Stefan van der Walt wrote:
> On Mon, Feb 14, 2022, at 12:45, Sebastian Berg wrote:
>> But this is currently *not* consistently the case. I wish to make this consistent. The confusion is around object arrays, though:
>>
>>     value = np.array(None, dtype=object)
>>     arr[0] = value
>>
>> Stores `value` without unpacking it currently.
>>
>>     arr.fill(value)
>>
>> Stores the `None` (unpacking `value`) if and only if `value` is 0-D.
>
> That last behavior doesn't look right to me. An object array should be thought of as a collection of pointers, and if you happen to want to point to a NumPy array, so be it.

I think we should strive for consistency and code simplicity. In the non-object case, it is clear that assignment will try to unpack an ndarray. So we should do the same thing with object arrays, and document the change in behaviour. Could we suggest a backward compatible alternative (would using a record dtype fit better with Stefan's mental model)?

Matti
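For the non-object case mentioned in this reply, the unpacking is easy to observe. The sketch below also includes a hypothetical helper, `store_unpacked`, which is not proposed anywhere in the thread. It is only one assumed way for user code to state explicitly that it wants the content of a 0-D array stored, whichever default NumPy ends up with.

```python
import numpy as np

zero_d = np.array(3)                     # 0-D integer array

# Non-object dtype: assignment unpacks the 0-D array and casts its value.
floats = np.zeros(2)
floats[0] = zero_d

def store_unpacked(arr, index, value):
    """Hypothetical helper: store the content of a 0-D array, not the array."""
    if isinstance(value, np.ndarray) and value.ndim == 0:
        value = value[()]                # extract the single element
    arr[index] = value

objs = np.array([None, None], dtype=object)
store_unpacked(objs, 0, zero_d)

print(floats, type(objs[0]))
```

Indexing with `value[()]` makes the unpacking explicit at the call site instead of relying on how `arr[0] = value` treats 0-D arrays.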
On Tue, 2022-02-15 at 10:21 +0200, Matti Picus wrote:
> On 15/2/22 09:53, Stefan van der Walt wrote:
>> On Mon, Feb 14, 2022, at 12:45, Sebastian Berg wrote:
>>> But this is currently *not* consistently the case. I wish to make this consistent. The confusion is around object arrays, though:
>>>
>>>     value = np.array(None, dtype=object)
>>>     arr[0] = value
>>>
>>> Stores `value` without unpacking it currently.
>>>
>>>     arr.fill(value)
>>>
>>> Stores the `None` (unpacking `value`) if and only if `value` is 0-D.
>>
>> That last behavior doesn't look right to me. An object array should be thought of as a collection of pointers, and if you happen to want to point to a NumPy array, so be it.
>
> I think we should strive for consistency and code simplicity. In the non-object case, it is clear that assignment will try to unpack an ndarray. So we should do the same thing with object arrays, and document the change in behaviour. Could we suggest a backward compatible alternative (would using a record dtype fit better with Stefan's mental model)?
A possible workaround might be `arr.itemset()`, although I am not quite convinced that it should do that.

A structured array could behave differently maybe, but presumably:

    arr = np.array([1], "i,i")  # structured
    arr[0] = (np.array(3), np.array(3))

should also unpack (with correct casting, not via `__int__`).

I guess we may have fewer special cases if we just unpack 0-D and _try_ to always pack N-D arrays with N>0 (which will often fail). I have to admit, I don't care too much about that special case.

The point is that if done right these special cases should be confined to two places:

1. Discovering the dimension/dtype in `np.asarray(nested_objects)`
2. The `PyArray_Pack` that sets a single element of an array from an arbitrary Python object.

And in the first part, we need special paths for "object" anyway. I can live with a "special case", if it is confined to `PyArray_Pack`.

There are always weird things, e.g. Quantities *never* wants to be unpacked right now even though it is a subclass, so we already have "tricky stuff" that really means we should channel everything through "one right way", no matter what that way actually looks like.

Cheers,

Sebastian
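The structured-array case raised in this message can be checked with a small sketch. It assumes that assigning a tuple of 0-D integer arrays to a structured element is accepted, as the message presumes. It says nothing about whether the value is converted by casting or via `__int__`, which is an internal detail not visible from Python here. `np.zeros` is used instead of the `np.array([1], "i,i")` call from the message only to keep the starting values obvious.

```python
import numpy as np

# Structured array with two int32 fields, named f0 and f1 by default.
arr = np.zeros(1, dtype="i,i")

# Assign a tuple of 0-D arrays; each field receives the unpacked integer.
arr[0] = (np.array(3), np.array(4))

first = int(arr[0]["f0"])
second = int(arr[0]["f1"])
print(first, second)
```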
Participants (3): Matti Picus, Sebastian Berg, Stefan van der Walt