[Numpy-discussion] ndrange, like range but multidimensional

Mark Harfouche mark.harfouche at gmail.com
Mon Oct 8 19:25:30 EDT 2018


Since ndrange is a superset of the features of ndindex, we can implement
ndindex with ndrange or keep it as is.
ndindex is now a glorified `nditer` object anyway, so it isn't much of a
maintenance burden.
As for changing how ndindex is implemented, I'm a little worried about
Python 2 performance, seeing as `range` returns a list there.
I would wait on changing the way ndindex is implemented for now.
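
To make the first point concrete, here is a minimal sketch (not the PR code;
the helper name is mine, just for illustration) of how ndindex reduces to a
product of per-axis ranges, which is exactly the iteration an ndrange-style
object would expose:

```python
import itertools
import numpy as np

def ndindex_sketch(*shape):
    # Same C-order iteration as np.ndindex, expressed as a product of
    # per-axis ranges -- the iteration an ndrange-like object would provide.
    return itertools.product(*(range(n) for n in shape))

# sanity check against the current implementation
assert list(ndindex_sketch(2, 3)) == list(np.ndindex(2, 3))
```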

I agree with Stephan that ndindex should be kept in. Many users want
backward-compatible code, and it would be hard for me to justify bumping a
dependency to bleeding-edge numpy just for a convenience iterator.

Honestly, I was really surprised to see such a speed difference; I thought
it would have been closer.

Allan, I decided to run a few more benchmarks; nditer just seems slow for
single-element access, for some reason. Maybe a bug?

```
import numpy as np
import itertools
a = np.ones((1000, 1000))

# dict mapping each index tuple to itself, used in the dict-lookup benchmarks below
b = {}
for i in np.ndindex(a.shape):
    b[i] = i

%%timeit
# op_flags=('readonly',) doesn't change performance
for a_value in np.nditer(a):
    pass
109 ms ± 921 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

%%timeit
for i in itertools.product(range(1000), range(1000)):
    a_value = a[i]
113 ms ± 1.72 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

%%timeit
for i in itertools.product(range(1000), range(1000)):
    c = b[i]
193 ms ± 3.89 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

%%timeit
for a_value in a.flat:
    pass
25.3 ms ± 278 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

%%timeit
for k, v in b.items():
    pass
19.9 ms ± 675 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

%%timeit
for i in itertools.product(range(1000), range(1000)):
    pass
28 ms ± 715 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
```
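
For comparison (untimed here, just to show the call), `nditer` with the
`external_loop` flag yields 1-D chunks rather than 0-d scalars, so the
Python-level loop body barely runs at all; that suggests the slowness above
is per-element overhead rather than anything deeper in nditer:

```python
# untimed sketch, reusing the same `a` as above
for chunk in np.nditer(a, flags=['external_loop']):
    pass  # chunk is a 1-D ndarray, not a single element
```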

On Mon, Oct 8, 2018 at 4:26 PM Stephan Hoyer <shoyer at gmail.com> wrote:

> I'm open to adding ndrange, and "soft-deprecating" ndindex (i.e.,
> discouraging its use in our docs, but not actually deprecating it).
> Certainly ndrange seems like a small but meaningful improvement in the
> interface.
>
> That said, I'm not convinced this is really worth the trouble. I think the
> nested loop is still pretty readable/clear, and there are few times when
> I've actually found ndindex() to be useful.
>
> On Mon, Oct 8, 2018 at 12:35 PM Allan Haldane <allanhaldane at gmail.com>
> wrote:
>
>> On 10/8/18 12:21 PM, Mark Harfouche wrote:
>> > 2. `ndindex` is an iterator itself. As proposed, `ndrange`, like
>> > `range`, is not an iterator. Changing this behaviour would likely lead
>> > to breaking code that uses that assumption. For example anybody using
>> > introspection or code like:
>> >
>> > ```
>> > indx = np.ndindex(5, 5)
>> > next(indx)  # Don't look at the (0, 0) coordinate
>> > for i in indx:
>> >     print(i)
>> > ```
>> > would break if `ndindex` becomes "not an iterator"
>>
>> OK, I see now. Just like python3 has separate range and range_iterator
>> types, where range is sliceable, we would have separate ndrange and
>> ndindex types, where ndrange is sliceable. You're just copying the
>> python3 api. That justifies it pretty well for me.
>>
>> I still think we shouldn't have two functions which do nearly the same
>> thing. We should only have one, and get rid of the other. I see two ways
>> forward:
>>
>>  * replace ndindex by your ndrange code, so it is no longer an iter.
>>    This would require some deprecation cycles for the cases that break.
>>  * deprecate ndindex in favor of a new function ndrange. We would keep
>>    ndindex around for back-compatibility, with a dep warning to use
>>    ndrange instead.
>>
>> Doing a code search on github, I can see that a lot of people's code
>> would break if ndindex no longer was an iter. I also like the name
>> ndrange for its allusion to python3's range behavior. That makes me lean
>> towards the second option of a separate ndrange, with possible
>> deprecation of ndindex.
>>
>> > itertools.product + range seems to be much faster than the current
>> > implementation of ndindex
>> >
>> > (python 3.6)
>> > ```
>> > %%timeit
>> >
>> > for i in np.ndindex(100, 100):
>> >     pass
>> > 3.94 ms ± 19.4 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
>> >
>> > %%timeit
>> > import itertools
>> > for i in itertools.product(range(100), range(100)):
>> >     pass
>> > 231 µs ± 1.09 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
>> > ```
>>
>> If the new code ends up faster than the old code, that's great, and
>> further justification for using ndrange instead of ndindex. I had
>> thought using nditer in the old code was fastest.
>>
>> So as far as I am concerned, I say go ahead with the PR the way you are
>> doing it.
>>
>> Allan