surprising behavior from array indexing - NumPy-Discussion

Dec. 30, 2024

Happy new year everybody!

I've been upgrading my code to start to support array indexing and in my
tests I found something that was well documented, but surprising to me.

I've tried to read through
https://numpy.org/doc/stable/user/basics.indexing.html#combining-advanced-an...
and even after multiple passes, I still find it very terse...

Consider a multi-dimensional dataset:

import numpy as np
shape = (10, 20, 30)
original = np.arange(np.prod(shape)).reshape(shape)

Let's say we want to collapse dim 0 to a single entry,
take a subset of dim 1 with a slice,
and take 3 elements from dim 2:

i = 2
j = slice(1, 6)
k = slice(7, 10)
out_basic = original[i, j, k]
assert out_basic.shape == (5, 3)

Now suppose we want the freedom to pass an arbitrary "array" for k instead
of a slice:

k = [7, 11, 13]
out_array = original[i, j, k]
assert out_array.shape == (5, 3), f"shape is actually {out_array.shape}"

AssertionError: shape is actually (3, 5)
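For readers puzzling over the same result, here is a minimal sketch of what seems to be going on, based on my reading of the "combining advanced and basic indexing" section linked above. The integer i and the list k both count as advanced indices, and because the slice j sits between them, NumPy places the broadcast advanced dimension first. The names `separated` and `adjacent` are only illustrative.

```python
import numpy as np

shape = (10, 20, 30)
original = np.arange(np.prod(shape)).reshape(shape)

i = 2
j = slice(1, 6)
k = [7, 11, 13]

# i (integer) and k (list) are both advanced indices. The slice j separates
# them, so the broadcast advanced dimension (length 3) is moved to the front.
separated = original[i, j, k]
print(separated.shape)  # (3, 5)

# For comparison only (this selects different data): when the advanced
# indices are next to each other, their dimension stays where they were.
adjacent = original[j, i, k]
print(adjacent.shape)  # (5, 3)

# The values in the separated case are the expected ones, just transposed.
assert np.array_equal(separated.T, original[i][j, k])
```

So the data selected is the same as in the two-step approach; only the axis order differs.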

To get the result "Mark expects", one has to index in two steps:

integer_types = (int, np.integer)
integer_indexes = (
    i if isinstance(i, integer_types) else slice(None),
    j if isinstance(j, integer_types) else slice(None),
    k if isinstance(k, integer_types) else slice(None),
)
non_integer_indexes = (
    ((i,) if not isinstance(i, integer_types) else ()) +
    ((j,) if not isinstance(j, integer_types) else ()) +
    ((k,) if not isinstance(k, integer_types) else ())
)
out_double_indexed = original[integer_indexes][non_integer_indexes]
assert out_double_indexed.shape == (5, 3), (
    f"shape is actually {out_double_indexed.shape}"
)

This is quite surprising to me. I totally understand that this kind of
indexing won't change in numpy, but is there a way I can adjust my indexing
strategy to regain the ability to slice into my array in a "single shot"?
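A few possible ways to get the (5, 3) layout from one indexing expression are sketched below. These are only options that work on plain NumPy arrays, not a recommendation from the NumPy developers. Whether a chunked array library follows the same semantics, or handles them efficiently, is an assumption that would need checking. The names `via_slice`, `via_ix` and `via_moveaxis` are illustrative.

```python
import numpy as np

shape = (10, 20, 30)
original = np.arange(np.prod(shape)).reshape(shape)

i = 2
j = slice(1, 6)
k = [7, 11, 13]

# Reference result from the two-step approach.
expected = original[i][j, k]

# Option 1: turn the integer into a length-1 slice so that k is the only
# advanced index; its dimension then stays in place. The leftover
# length-1 leading axis is dropped afterwards.
via_slice = original[i:i + 1, j, k][0]

# Option 2: make every index an explicit 1-D array and let np.ix_ build an
# open mesh. This gives up the slice, which may matter for chunked storage.
j_as_array = np.arange(*j.indices(shape[1]))
via_ix = original[np.ix_([i], j_as_array, k)][0]

# Option 3: accept NumPy's ordering and move the leading axis to the end.
via_moveaxis = np.moveaxis(original[i, j, k], 0, -1)

for result in (via_slice, via_ix, via_moveaxis):
    assert result.shape == (5, 3)
    assert np.array_equal(result, expected)
```

Option 1 keeps j as a real slice, which is probably the closest fit when slices are cheap and fancy indexing is not.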

The main use case is for arrays that are truly huge, but chunked in ways
where slicing into them can be quite efficient. This is multi-dimensional
imaging data. Each chunk is quite "huge", so this kind of metadata
manipulation is worthwhile to avoid unnecessary IO.

Perhaps there is a "simple" distinction I am missing, for example using a
tuple for k instead of a list?
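A quick check of the tuple idea, as a sketch: as far as I can tell, a tuple nested inside the index tuple is converted to an array just like a list, so it does not change the outcome. The variable names are illustrative.

```python
import numpy as np

shape = (10, 20, 30)
original = np.arange(np.prod(shape)).reshape(shape)

i = 2
j = slice(1, 6)

# k given as a list, a tuple, and an ndarray: each is treated as an
# advanced (array) index when it appears inside the index tuple.
with_list = original[i, j, [7, 11, 13]]
with_tuple = original[i, j, (7, 11, 13)]
with_array = original[i, j, np.array([7, 11, 13])]

print(with_list.shape, with_tuple.shape, with_array.shape)
assert np.array_equal(with_list, with_tuple)
assert np.array_equal(with_list, with_array)
```

All three produce the same array, with the length-3 dimension first.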

Thanks for your input!

Mark

(I tried to keep my code copy pastable)

Mark Harfouche
