[Numpy-discussion] ndrange, like range but multidimensional
Eric Wieser
wieser.eric+numpy at gmail.com
Wed Oct 10 00:34:29 EDT 2018
One thing that worries me here: in python, range(...) in essence generates
a lazy list, so I’d expect ndrange to generate a lazy ndarray. In
practice, that means it would be a duck type defining an __array__ method
to evaluate itself, implementing only methods already present in numpy.
It’s not clear to me what the datatype of such an array-like would be.
Candidates I can think of are:
1. [('i0', intp), ('i1', intp), ...], but this makes tuple coercion a
little awkward
2. (intp, (N,)) - which collapses into a shape + (3,) array
3. object_.
4. Some new np.tuple_ dtype, a heterogeneous tuple, which is like the
structured np.void but without field names. I’m not sure how vectorized
element indexing would be spelt, though.
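A minimal sketch of what such a duck type might look like (the class name is hypothetical, and it uses the object_ candidate from option 3):

```python
import itertools

import numpy as np


class ndrange_sketch:
    """Hypothetical lazy ndrange: stores only its shape and evaluates
    to a real ndarray on demand via __array__."""

    def __init__(self, *shape):
        self.shape = shape

    def __iter__(self):
        # Walk the index space lazily, like itertools.product of ranges.
        return itertools.product(*(range(n) for n in self.shape))

    def __array__(self, dtype=None, copy=None):
        # Evaluate to an object array of index tuples (dtype candidate 3).
        out = np.empty(self.shape, dtype=object)
        for idx in self:
            out[idx] = idx
        return out


r = ndrange_sketch(2, 3)
arr = np.asarray(r)  # evaluation happens here, via __array__
```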
Eric
On Tue, 9 Oct 2018 at 21:59 Stephan Hoyer <shoyer at gmail.com> wrote:
> The speed difference is interesting but really a different question than
> the public API.
>
> I'm coming around to ndrange(). I can see how it could be useful for
> symbolic manipulation of arrays and indexing operations, similar to what we
> do in dask and xarray.
>
> On Mon, Oct 8, 2018 at 4:25 PM Mark Harfouche <mark.harfouche at gmail.com>
> wrote:
>
>> Since ndrange is a superset of the features of ndindex, we can implement
>> ndindex with ndrange or keep it as is. ndindex is now a glorified
>> `nditer` object anyway, so it isn't much of a maintenance burden.
>> As for how ndindex is implemented, I'm a little worried about python 2
>> performance, since range there returns a list. I would wait on changing
>> the way ndindex is implemented for now.
>>
>> I agree with Stephan that ndindex should be kept in. Many want backward
>> compatible code. It would be hard for me to justify why a dependency should
>> be bumped up to bleeding edge numpy just for a convenience iterator.
>>
>> Honestly, I was really surprised to see such a speed difference; I
>> thought it would have been closer.
>>
>> Allan, I decided to run a few more benchmarks; nditer just seems slow
>> for single-element access for some reason. Maybe a bug?
>>
>> ```
>> import numpy as np
>> import itertools
>> a = np.ones((1000, 1000))
>>
>> b = {}
>> for i in np.ndindex(a.shape):
>>     b[i] = i
>>
>> %%timeit
>> # op_flag=('readonly',) doesn't change performance
>> for a_value in np.nditer(a):
>>     pass
>> 109 ms ± 921 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
>>
>> %%timeit
>> for i in itertools.product(range(1000), range(1000)):
>>     a_value = a[i]
>> 113 ms ± 1.72 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
>>
>> %%timeit
>> for i in itertools.product(range(1000), range(1000)):
>>     c = b[i]
>> 193 ms ± 3.89 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
>>
>> %%timeit
>> for a_value in a.flat:
>>     pass
>> 25.3 ms ± 278 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
>>
>> %%timeit
>> for k, v in b.items():
>>     pass
>> 19.9 ms ± 675 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
>>
>> %%timeit
>> for i in itertools.product(range(1000), range(1000)):
>>     pass
>> 28 ms ± 715 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
>> ```
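>>
>> A guess on my part (an assumption, not a diagnosis): the cost is the
>> per-element Python loop itself. nditer's documented `external_loop` flag
>> yields large 1-d chunks instead of 0-d scalars, which cuts the
>> Python-level iteration count dramatically:
>>
>> ```
>> import numpy as np
>>
>> a = np.ones((1000, 1000))
>>
>> total = 0.0
>> for chunk in np.nditer(a, flags=['external_loop']):
>>     # chunk is a 1-d array (for this contiguous array, all million
>>     # elements at once), so the loop body runs once, not 10**6 times.
>>     total += chunk.sum()
>> ```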
>>
>> On Mon, Oct 8, 2018 at 4:26 PM Stephan Hoyer <shoyer at gmail.com> wrote:
>>
>>> I'm open to adding ndrange, and "soft-deprecating" ndindex (i.e.,
>>> discouraging its use in our docs, but not actually deprecating it).
>>> Certainly ndrange seems like a small but meaningful improvement in the
>>> interface.
>>>
>>> That said, I'm not convinced this is really worth the trouble. I think
>>> the nested loop is still pretty readable/clear, and there are few times
>>> when I've actually found ndindex() to be useful.
>>>
>>> On Mon, Oct 8, 2018 at 12:35 PM Allan Haldane <allanhaldane at gmail.com>
>>> wrote:
>>>
>>>> On 10/8/18 12:21 PM, Mark Harfouche wrote:
>>>> > 2. `ndindex` is an iterator itself. As proposed, `ndrange`, like
>>>> > `range`, is not an iterator. Changing this behaviour would likely lead
>>>> > to breaking code that uses that assumption. For example anybody using
>>>> > introspection or code like:
>>>> >
>>>> > ```
>>>> > indx = np.ndindex(5, 5)
>>>> > next(indx) # Don't look at the (0, 0) coordinate
>>>> > for i in indx:
>>>> >     print(i)
>>>> > ```
>>>> > would break if `ndindex` becomes "not an iterator"
>>>>
>>>> OK, I see now. Just like python3 has separate range and range_iterator
>>>> types, where range is sliceable, we would have separate ndrange and
>>>> ndindex types, where ndrange is sliceable. You're just copying the
>>>> python3 api. That justifies it pretty well for me.
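>>>>
>>>> For concreteness, a quick illustration of the python3 behavior being
>>>> copied here:
>>>>
>>>> ```
>>>> r = range(5, 50, 5)
>>>> assert r[2] == 15                # ranges support indexing
>>>> assert list(r[1:3]) == [10, 15]  # and slicing returns another range
>>>>
>>>> it = iter(r)                     # the separate range_iterator type
>>>> next(it)                         # consumes 5; iterators hold position,
>>>> assert next(it) == 10            # a range itself never does
>>>> ```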
>>>>
>>>> I still think we shouldn't have two functions which do nearly the same
>>>> thing. We should only have one, and get rid of the other. I see two ways
>>>> forward:
>>>>
>>>> * replace ndindex by your ndrange code, so it is no longer an iter.
>>>> This would require some deprecation cycles for the cases that break.
>>>> * deprecate ndindex in favor of a new function ndrange. We would keep
>>>> ndindex around for back-compatibility, with a dep warning to use
>>>> ndrange instead.
>>>>
>>>> Doing a code search on github, I can see that a lot of people's code
>>>> would break if ndindex no longer was an iter. I also like the name
>>>> ndrange for its allusion to python3's range behavior. That makes me lean
>>>> towards the second option of a separate ndrange, with possible
>>>> deprecation of ndindex.
>>>>
>>>> > itertools.product + range seems to be much faster than the current
>>>> > implementation of ndindex
>>>> >
>>>> > (python 3.6)
>>>> > ```
>>>> > %%timeit
>>>> >
>>>> > for i in np.ndindex(100, 100):
>>>> >     pass
>>>> > 3.94 ms ± 19.4 µs per loop (mean ± std. dev. of 7 runs, 100 loops
>>>> each)
>>>> >
>>>> > %%timeit
>>>> > import itertools
>>>> > for i in itertools.product(range(100), range(100)):
>>>> >     pass
>>>> > 231 µs ± 1.09 µs per loop (mean ± std. dev. of 7 runs, 1000 loops
>>>> each)
>>>> > ```
>>>>
>>>> If the new code ends up faster than the old code, that's great, and
>>>> further justification for using ndrange instead of ndindex. I had
>>>> thought using nditer in the old code was fastest.
>>>>
>>>> So as far as I am concerned, I say go ahead with the PR the way you are
>>>> doing it.
>>>>
>>>> Allan
>>>> _______________________________________________
>>>> NumPy-Discussion mailing list
>>>> NumPy-Discussion at python.org
>>>> https://mail.python.org/mailman/listinfo/numpy-discussion