[Numpy-discussion] Suggestion: special-case np.array(range(...)) to be faster

Antony Lee antony.lee at berkeley.edu
Mon Feb 15 02:41:29 EST 2016


I wonder whether numpy is using the "old" iteration protocol (repeatedly
calling x[i] for increasing i until IndexError is raised).  A quick
timing shows that it is indeed slower.
... actually it's not even clear to me what qualifies as a sequence for
`np.array`:

import numpy as np

class C:
    def __iter__(self):
        return iter(range(10)) # [0... 9] under the new iteration protocol
    def __getitem__(self, i):
        raise IndexError # [] under the old iteration protocol

np.array(C())
===> array(<__main__.C object at 0x7f3f21ffff28>, dtype=object)
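
For reference, a minimal sketch (the class names are made up) showing that
plain list() happily consumes either protocol, and that collections.abc can
tell an iterator apart from a mere iterable:

from collections.abc import Iterator

class OldOnly:
    # Iterable only through the old protocol: x[i] for i = 0, 1, ...
    # until IndexError is raised.
    def __getitem__(self, i):
        if i < 10:
            return i
        raise IndexError

class NewOnly:
    # Iterable only through the new protocol: __iter__ returning an iterator.
    def __iter__(self):
        return iter(range(10))

print(list(OldOnly()))                        # [0, ..., 9] via the old protocol
print(list(NewOnly()))                        # [0, ..., 9] via the new protocol
print(isinstance(NewOnly(), Iterator))        # False: an iterable, not an iterator
print(isinstance(iter(NewOnly()), Iterator))  # True: __next__ is present

so whatever np.array does with C() above, it is not simply calling iter() on
its argument.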


So how can np.array(range(...)) even work?
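
To make the suggestion in the subject line concrete, here is a rough sketch of
the special case (the helper name is made up; a real version would of course
live inside np.array itself):

import numpy as np

def asarray_with_range_fastpath(obj, dtype=None):
    # Hypothetical wrapper: send range objects down the arange fast path
    # instead of the generic sequence-conversion path.
    if isinstance(obj, range):
        # A range maps directly onto np.arange(start, stop, step).
        return np.arange(obj.start, obj.stop, obj.step, dtype=dtype)
    return np.asarray(obj, dtype=dtype)

assert np.array_equal(asarray_with_range_fastpath(range(100000)),
                      np.arange(100000))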

2016-02-14 22:21 GMT-08:00 Ralf Gommers <ralf.gommers at gmail.com>:

>
>
> On Sun, Feb 14, 2016 at 10:36 PM, Charles R Harris <
> charlesr.harris at gmail.com> wrote:
>
>>
>>
>> On Sun, Feb 14, 2016 at 7:36 AM, Ralf Gommers <ralf.gommers at gmail.com>
>> wrote:
>>
>>>
>>>
>>> On Sun, Feb 14, 2016 at 9:21 AM, Antony Lee <antony.lee at berkeley.edu>
>>> wrote:
>>>
>>>> re: no reason why...
>>>> This has nothing to do with Python2/Python3 (I personally stopped using
>>>> Python2 at least 3 years ago.)  Let me put it this way instead: if
>>>> Python3's "range" (or Python2's "xrange") was not a builtin type but a type
>>>> provided by numpy, I don't think it would be controversial at all to
>>>> provide an `__array__` special method to efficiently convert it to a
>>>> ndarray.  It would be the same if `np.array` used a
>>>> `functools.singledispatch` dispatcher rather than an `__array__` special
>>>> method (which is obviously not possible for chronological reasons).
>>>>
>>>> re: iterable vs iterator: check for the presence of the __next__
>>>> special method (or isinstance(x, Iterator), vs. isinstance(x, Iterable)
>>>> and not isinstance(x, Iterator) for an iterable that is not an iterator)
>>>>
>>>
>>> I think it's good to do something about this, but it's not clear what
>>> the exact proposal is. I could imagine one or both of:
>>>
>>>   - special-case the range() object in array (and asarray/asanyarray?)
>>> such that array(range(N)) becomes as fast as arange(N).
>>>   - special-case all iterators, such that array(range(N)) becomes as
>>> fast as deque(range(N))
>>>
>>
>> I think the latter wouldn't help much, as numpy would still need to
>> determine dimensions and type.  I assume that is one of the reasons sparse
>> itself doesn't do that.
>>
>
> Not orders of magnitude, but this shows that there's something to optimize
> for iterators:
>
> In [1]: %timeit np.array(range(100000))
> 100 loops, best of 3: 14.9 ms per loop
>
> In [2]: %timeit np.array(list(range(100000)))
> 100 loops, best of 3: 9.68 ms per loop
>
> Ralf
>
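
Incidentally, for iterators whose dtype is known up front, np.fromiter already
avoids materializing an intermediate list; a quick sketch:

import numpy as np

n = 100000
# fromiter consumes the iterator directly; passing count avoids repeated
# reallocation as the array grows.
a = np.fromiter(range(n), dtype=np.intp, count=n)
assert a.shape == (n,) and a[-1] == n - 1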