[Numpy-discussion] ndrange, like range but multidimensiontal

Mark Harfouche mark.harfouche at gmail.com
Wed Oct 10 09:56:10 EDT 2018


Eric,

Great point. The multi-dimensional slicing and sequence return type is
definitely strange. I was thinking about that last night.
I’m a little new to the __array__ methods.
Are you saying that the sequence behaviour would stay the same (i.e.
__iter__, __reversed__, __contains__), but
np.asarray(np.ndrange((3, 3)))
would return something like an array of tuples?
I’m not sure this is something that anybody can’t already do with meshgrid
+ stack.
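For concreteness, here is roughly what I mean by the meshgrid + stack
equivalent (just a sketch of existing numpy behaviour, not a proposed API):

```python
import numpy as np

# An array of coordinates for a (3, 3) grid, roughly what
# np.asarray on an ndrange-like object might return.
idx = np.stack(np.meshgrid(np.arange(3), np.arange(3), indexing='ij'),
               axis=-1)
print(idx.shape)   # (3, 3, 2)
print(idx[1, 2])   # [1 2], the coordinate at position (1, 2)
```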

"and only implement methods already present in numpy."

I’m not sure what this means.

I’ll note that in Python 3
<https://docs.python.org/3/library/stdtypes.html#sequence-types-list-tuple-range>,
range is its own thing. It is still a sequence type, but it doesn’t support
addition.
I’m kinda ok with ndrange/ndindex being a sequence type, supporting ND
slicing, but not being an array ;)
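To illustrate what Python 3’s range gives us as a lazy sequence type (plain
stdlib behaviour, for reference):

```python
r = range(2, 20, 3)        # lazy sequence: 2, 5, 8, 11, 14, 17
print(r[1:3])              # range(5, 11, 3) -- slicing returns another range
print(8 in r)              # True -- constant-time membership test
try:
    r + range(5)           # unlike lists, ranges don't concatenate
except TypeError:
    print("no addition")
```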

I’m kinda warming up to the idea of expanding ndindex.

   1. The additional start and step can be omitted from ndindex for a while
   (indefinitely?). Slicing is way more convenient anyway.
   2. Warnings can help people move from ndindex(1, 2, 3) to ndindex((1,
   2, 3))
   3. ndindex can return a separate iterator, but the ndindex object would
   hold a reference to it. Calls to ndindex.__next__ would simply return
   next(of_that_object).
   Note: this would break introspection since the iterator is no longer of
   ndindex type. I’m kinda OK with this, but breaking code is never
   nice :(
   4. Benchmarking can help motivate the choice of iterator used for
   step=(1,) * N and start=(0,) * N
   5. Wait until 2019 because I don’t want to deal with performance
   regressions from potentially using range in Python 2, and I don’t want
   this to motivate any implementation details.
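A minimal sketch of what I mean in point 3 (hypothetical class name, not
the real numpy implementation):

```python
import itertools

class ndindex_shim:
    """Hypothetical sketch: a shape-indexed object that holds a single
    internal iterator, so next(obj) keeps working as it does today."""
    def __init__(self, shape):
        self.shape = tuple(shape)
        self._it = itertools.product(*(range(n) for n in self.shape))
    def __iter__(self):
        # Hand out the held iterator so the "skip, then loop" idiom works.
        return self._it
    def __next__(self):
        # __next__ simply delegates to the held iterator.
        return next(self._it)

ix = ndindex_shim((2, 2))
next(ix)          # skip (0, 0), the old idiom
print(list(ix))   # [(0, 1), (1, 0), (1, 1)]
```

Note that `ix` is not itself of iterator type anymore, which is exactly the
introspection break mentioned above.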

Mark

On Wed, Oct 10, 2018 at 12:36 AM Eric Wieser <wieser.eric+numpy at gmail.com>
wrote:

> One thing that worries me here - in python, range(...) in essence
> generates a lazy list - so I’d expect ndrange to generate a lazy ndarray.
> In practice, that means it would be a duck-type defining an __array__
> method to evaluate it, and only implement methods already present in numpy.
>
> It’s not clear to me what the datatype of such an array-like would be.
> Candidates I can think of are:
>
>    1. [('i0', intp), ('i1', intp), ...], but this makes tuple coercion a
>    little awkward
>    2. (intp, (N,)) - which collapses into a shape + (N,) array
>    3. object_.
>    4. Some new np.tuple_ dtype, a heterogeneous tuple, which is like the
>    structured np.void but without field names. I’m not sure how
>    vectorized element indexing would be spelt though.
>
> Eric
> On Tue, 9 Oct 2018 at 21:59 Stephan Hoyer <shoyer at gmail.com> wrote:
>
>> The speed difference is interesting but really a different question than
>> the public API.
>>
>> I'm coming around to ndrange(). I can see how it could be useful for
>> symbolic manipulation of arrays and indexing operations, similar to what we
>> do in dask and xarray.
>>
>> On Mon, Oct 8, 2018 at 4:25 PM Mark Harfouche <mark.harfouche at gmail.com>
>> wrote:
>>
>>> since ndrange is a superset of the features of ndindex, we can implement
>>> ndindex with ndrange or keep it as is.
>>> ndindex is now a glorified `nditer` object anyway, so it isn't much
>>> of a maintenance burden.
>>> As for how ndindex is implemented, I'm a little worried about Python 2
>>> performance, seeing as range returns a list there.
>>> I would wait on changing the way ndindex is implemented for now.
>>>
>>> I agree with Stephan that ndindex should be kept in. Many want backward
>>> compatible code. It would be hard for me to justify why a dependency should
>>> be bumped up to bleeding edge numpy just for a convenience iterator.
>>>
>>> Honestly, I was really surprised to see such a speed difference, I
>>> thought it would have been closer.
>>>
>>> Allan, I decided to run a few more benchmarks; nditer just seems
>>> slow for single array access for some reason. Maybe a bug?
>>>
>>> ```
>>> import numpy as np
>>> import itertools
>>> a = np.ones((1000, 1000))
>>>
>>> b = {}
>>> for i in np.ndindex(a.shape):
>>>     b[i] = i
>>>
>>> %%timeit
>>> # op_flag=('readonly',) doesn't change performance
>>> for a_value in np.nditer(a):
>>>     pass
>>> 109 ms ± 921 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
>>>
>>> %%timeit
>>> for i in itertools.product(range(1000), range(1000)):
>>>     a_value = a[i]
>>> 113 ms ± 1.72 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
>>>
>>> %%timeit
>>> for i in itertools.product(range(1000), range(1000)):
>>>     c = b[i]
>>> 193 ms ± 3.89 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
>>>
>>> %%timeit
>>> for a_value in a.flat:
>>>     pass
>>> 25.3 ms ± 278 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
>>>
>>> %%timeit
>>> for k, v in b.items():
>>>     pass
>>> 19.9 ms ± 675 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
>>>
>>> %%timeit
>>> for i in itertools.product(range(1000), range(1000)):
>>>     pass
>>> 28 ms ± 715 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
>>> ```
>>>
>>> On Mon, Oct 8, 2018 at 4:26 PM Stephan Hoyer <shoyer at gmail.com> wrote:
>>>
>>>> I'm open to adding ndrange, and "soft-deprecating" ndindex (i.e.,
>>>> discouraging its use in our docs, but not actually deprecating it).
>>>> Certainly ndrange seems like a small but meaningful improvement in the
>>>> interface.
>>>>
>>>> That said, I'm not convinced this is really worth the trouble. I think
>>>> the nested loop is still pretty readable/clear, and there are few times
>>>> when I've actually found ndindex() to be useful.
>>>>
>>>> On Mon, Oct 8, 2018 at 12:35 PM Allan Haldane <allanhaldane at gmail.com>
>>>> wrote:
>>>>
>>>>> On 10/8/18 12:21 PM, Mark Harfouche wrote:
>>>>> > 2. `ndindex` is an iterator itself. As proposed, `ndrange`, like
>>>>> > `range`, is not an iterator. Changing this behaviour would likely
>>>>> lead
>>>>> > to breaking code that uses that assumption. For example anybody using
>>>>> > introspection or code like:
>>>>> >
>>>>> > ```
>>>>> > indx = np.ndindex(5, 5)
>>>>> > next(indx)  # Don't look at the (0, 0) coordinate
>>>>> > for i in indx:
>>>>> >     print(i)
>>>>> > ```
>>>>> > would break if `ndindex` becomes "not an iterator"
>>>>>
>>>>> OK, I see now. Just like python3 has separate range and range_iterator
>>>>> types, where range is sliceable, we would have separate ndrange and
>>>>> ndindex types, where ndrange is sliceable. You're just copying the
>>>>> python3 api. That justifies it pretty well for me.
>>>>>
>>>>> I still think we shouldn't have two functions which do nearly the same
>>>>> thing. We should only have one, and get rid of the other. I see two
>>>>> ways
>>>>> forward:
>>>>>
>>>>>  * replace ndindex by your ndrange code, so it is no longer an iter.
>>>>>    This would require some deprecation cycles for the cases that break.
>>>>>  * deprecate ndindex in favor of a new function ndrange. We would keep
>>>>>    ndindex around for back-compatibility, with a dep warning to use
>>>>>    ndrange instead.
>>>>>
>>>>> Doing a code search on github, I can see that a lot of people's code
>>>>> would break if ndindex no longer was an iter. I also like the name
>>>>> ndrange for its allusion to python3's range behavior. That makes me
>>>>> lean
>>>>> towards the second option of a separate ndrange, with possible
>>>>> deprecation of ndindex.
>>>>>
>>>>> > itertools.product + range seems to be much faster than the current
>>>>> > implementation of ndindex
>>>>> >
>>>>> > (python 3.6)
>>>>> > ```
>>>>> > %%timeit
>>>>> >
>>>>> > for i in np.ndindex(100, 100):
>>>>> >     pass
>>>>> > 3.94 ms ± 19.4 µs per loop (mean ± std. dev. of 7 runs, 100 loops
>>>>> each)
>>>>> >
>>>>> > %%timeit
>>>>> > import itertools
>>>>> > for i in itertools.product(range(100), range(100)):
>>>>> >     pass
>>>>> > 231 µs ± 1.09 µs per loop (mean ± std. dev. of 7 runs, 1000 loops
>>>>> each)
>>>>> > ```
>>>>>
>>>>> If the new code ends up faster than the old code, that's great, and
>>>>> further justification for using ndrange instead of ndindex. I had
>>>>> thought using nditer in the old code was fastest.
>>>>>
>>>>> So as far as I am concerned, I say go ahead with the PR the way you are
>>>>> doing it.
>>>>>
>>>>> Allan
>>>>> _______________________________________________
>>>>> NumPy-Discussion mailing list
>>>>> NumPy-Discussion at python.org
>>>>> https://mail.python.org/mailman/listinfo/numpy-discussion
>>>>>