Suggestion: special-case np.array(range(...)) to be faster

Compare (on Python3 -- for Python2, read "xrange" instead of "range"):

    In [2]: %timeit np.array(range(1000000), np.int64)
    10 loops, best of 3: 156 ms per loop

    In [3]: %timeit np.arange(1000000, dtype=np.int64)
    1000 loops, best of 3: 853 µs per loop

Note that while iterating over a range is not very fast, it is still much better than the array creation:

    In [4]: from collections import deque

    In [5]: %timeit deque(range(1000000), 1)
    10 loops, best of 3: 25.5 ms per loop

On one hand, special cases are awful. On the other hand, the range builtin is probably important enough to deserve a special case to make this construction faster. Or not? I initially opened this as https://github.com/numpy/numpy/issues/7233 but it was suggested there that this should be discussed on the ML first.

(The real issue which prompted this suggestion: I was building sparse matrices using scipy.sparse.csc_matrix with some indices specified using range, and that construction step turned out to take a significant portion of the time because of the calls to np.array.)

Antony
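For concreteness, a minimal sketch of the kind of construction involved -- illustrative shape and values, not the actual code from the report -- using scipy's COO-style (data, (row, col)) input:

    import numpy as np
    from scipy.sparse import csc_matrix

    n = 1000000
    data = np.ones(n)
    cols = np.zeros(n, dtype=np.int64)

    # Slow: csc_matrix calls np.array on the indices, and
    # np.array(range(n)) is built element by element.
    m_slow = csc_matrix((data, (range(n), cols)), shape=(n, 1))

    # Fast: np.arange fills the index buffer directly in C.
    m_fast = csc_matrix((data, (np.arange(n, dtype=np.int64), cols)),
                        shape=(n, 1))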

On Sat, Feb 13, 2016 at 8:57 PM, Antony Lee <antony.lee@berkeley.edu> wrote:
> On one hand, special cases are awful. On the other hand, the range
> builtin is probably important enough to deserve a special case to make
> this construction faster. Or not?
IMO: I don't see a reason why this should be supported. There is np.arange after all for this use case, and np.fromiter. range and the other guys are iterators, and in several cases we can use larange = list(range(...)) as a shortcut to get a python list, for python 2/3 compatibility.

I think this might be partially a learning effect in the python 2 to 3 transition. After using almost only python 3 for maybe a year, I don't think it's difficult to remember the differences when writing code that is py 2.7 and py 3.x compatible.

It's just **another** thing to watch out for if milliseconds matter in your application.

Josef
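For reference, a quick sketch of the two alternatives mentioned above (np.fromiter makes a single pass over any iterable of scalars, and an explicit count lets it preallocate the output):

    import numpy as np

    n = 1000000
    a = np.arange(n, dtype=np.int64)                    # pure C loop
    b = np.fromiter(range(n), dtype=np.int64, count=n)  # one pass, preallocated
    assert (a == b).all()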

On Sat, Feb 13, 2016 at 9:43 PM, <josef.pktd@gmail.com> wrote:
> It's just **another** thing to watch out for if milliseconds matter in
> your application.
Side question: is there a simple way to distinguish an iterator or generator from an iterable data structure?

Josef

re: no reason why...

This has nothing to do with Python2/Python3. (I personally stopped using Python2 at least 3 years ago.) Let me put it this way instead: if Python3's "range" (or Python2's "xrange") was not a builtin type but a type provided by numpy, I don't think it would be controversial at all to provide an `__array__` special method to efficiently convert it to an ndarray. It would be the same if `np.array` used a `functools.singledispatch` dispatcher rather than an `__array__` special method (which is obviously not possible for chronological reasons).

re: iterable vs iterator: check for the presence of the `__next__` special method; equivalently, an iterator satisfies isinstance(x, Iterator), while an iterable that is not an iterator satisfies isinstance(x, Iterable) and not isinstance(x, Iterator).

Antony
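To make both points concrete, a short sketch (RangeLike is hypothetical, standing in for what a numpy-provided range type could look like; np.array does consult an `__array__` method when one is present):

    import numpy as np
    from collections.abc import Iterable, Iterator

    # Iterable vs iterator: an iterator has __next__; every iterator is
    # also an iterable, but not the other way around.
    r = range(10)
    assert isinstance(r, Iterable) and not isinstance(r, Iterator)
    assert isinstance(iter(r), Iterator)

    # Hypothetical numpy-provided range type: __array__ makes the
    # conversion as cheap as a call to arange.
    class RangeLike:
        def __init__(self, start, stop, step=1):
            self.start, self.stop, self.step = start, stop, step

        def __array__(self, dtype=None):
            return np.arange(self.start, self.stop, self.step, dtype=dtype)

    np.array(RangeLike(0, 5))  # -> array([0, 1, 2, 3, 4])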

On Sun, Feb 14, 2016 at 3:21 AM, Antony Lee <antony.lee@berkeley.edu> wrote:
> Let me put it this way instead: if Python3's "range" (or Python2's
> "xrange") was not a builtin type but a type provided by numpy, I don't
> think it would be controversial at all to provide an `__array__` special
> method to efficiently convert it to an ndarray.
But numpy does provide arange. What's the reason to use an iterator instead of np.arange?
> re: iterable vs iterator: check for the presence of the `__next__`
> special method.
AFAIR, and from spot-checking the mailing list, in the past the argument was that it's too complicated to mix array/asarray creation with fromiter building of arrays. (I have no idea whether array could cheaply delegate to fromiter.)

Josef

On Sun, Feb 14, 2016 at 9:21 AM, Antony Lee <antony.lee@berkeley.edu> wrote:
> Let me put it this way instead: if Python3's "range" (or Python2's
> "xrange") was not a builtin type but a type provided by numpy, I don't
> think it would be controversial at all to provide an `__array__` special
> method to efficiently convert it to an ndarray.
I think it's good to do something about this, but it's not clear what the exact proposal is. I could imagine one or both of:

- special-case the range() object in array (and asarray/asanyarray?) such that array(range(N)) becomes as fast as arange(N);
- special-case all iterators, such that array(range(N)) becomes as fast as deque(range(N)).

Or yet something else?

Ralf
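A user-level sketch of what those two special cases could look like (asarray_fast is a hypothetical name; in numpy itself this would be C code inside array/asarray, not a Python wrapper):

    import numpy as np
    from collections.abc import Iterator

    def asarray_fast(obj, dtype=None):
        if isinstance(obj, range):
            # Case 1: a range is fully described by start/stop/step,
            # so it can be handed straight to arange.
            return np.arange(obj.start, obj.stop, obj.step, dtype=dtype)
        if isinstance(obj, Iterator) and dtype is not None:
            # Case 2: with a known dtype, fromiter fills the array in a
            # single pass over the iterator.
            return np.fromiter(obj, dtype=dtype)
        return np.array(obj, dtype=dtype)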

I was thinking of (1) (special-casing range()); however, (2) may be more generally applicable and useful.

Antony

On Sun, Feb 14, 2016 at 7:36 AM, Ralf Gommers <ralf.gommers@gmail.com> wrote:
> I could imagine one or both of:
>
> - special-case the range() object in array (and asarray/asanyarray?)
>   such that array(range(N)) becomes as fast as arange(N);
> - special-case all iterators, such that array(range(N)) becomes as fast
>   as deque(range(N)).
I think the last wouldn't help much, as numpy would still need to determine dimensions and type. I assume that is one of the reasons sparse itself doesn't do that.

Chuck

On Sun, Feb 14, 2016 at 10:36 PM, Charles R Harris <charlesr.harris@gmail.com> wrote:
> I think the last wouldn't help much, as numpy would still need to
> determine dimensions and type.
Not orders of magnitude, but this shows that there's something to optimize for iterators:

    In [1]: %timeit np.array(range(100000))
    100 loops, best of 3: 14.9 ms per loop

    In [2]: %timeit np.array(list(range(100000)))
    100 loops, best of 3: 9.68 ms per loop

Ralf

I wonder whether numpy is using the "old" iteration protocol (repeatedly calling x[i] for increasing i until IndexError is raised)? A quick timing shows that it is indeed slower.

... actually it's not even clear to me what qualifies as a sequence for `np.array`:

    class C:
        def __iter__(self):
            return iter(range(10))  # [0 ... 9] under the new iteration protocol
        def __getitem__(self, i):
            raise IndexError        # [] under the old iteration protocol

    np.array(C())
    ===> array(<__main__.C object at 0x7f3f21ffff28>, dtype=object)

So how can np.array(range(...)) even work?

On Sun, 2016-02-14 at 23:41 -0800, Antony Lee wrote:
> I wonder whether numpy is using the "old" iteration protocol (repeatedly
> calling x[i] for increasing i until IndexError is raised)? A quick
> timing shows that it is indeed slower. ... actually it's not even clear
> to me what qualifies as a sequence for `np.array`:
>
>     class C:
>         def __iter__(self):
>             return iter(range(10))
>         def __getitem__(self, i):
>             raise IndexError
Numpy currently uses PySequence_Fast, but it has to do a two-pass algorithm (find dtype + dims), and the range is converted twice to a list by this call. That explains the speed advantage of converting to a list manually.

- Sebastian
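A rough Python model of that two-pass behavior (only an illustration of the cost structure, not numpy's actual C implementation):

    import numpy as np

    def array_from_sequence_model(seq):
        # Pass 1: PySequence_Fast materializes seq -- a fresh list when
        # seq is a range -- just to find the length and infer a dtype.
        items = list(seq)
        dtype = np.array(items[:1]).dtype

        # Pass 2: PySequence_Fast runs again (a second full conversion
        # to list), and the elements are copied one by one.
        items = list(seq)
        out = np.empty(len(items), dtype=dtype)
        for i, x in enumerate(items):
            out[i] = x
        return out

Converting to a list up front leaves only the copying pass, which is consistent with the np.array(list(range(N))) timing above.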

On Sun, Feb 14, 2016 at 11:41 PM, Antony Lee <antony.lee@berkeley.edu> wrote:
> I wonder whether numpy is using the "old" iteration protocol (repeatedly
> calling x[i] for increasing i until IndexError is raised)? A quick
> timing shows that it is indeed slower.
Yeah, I'm pretty sure that np.array doesn't know anything about "iterable", just about "sequence" (calling x[i] for 0 <= i < x.__len__()). (See Sequence vs Iterable: https://docs.python.org/3/library/collections.abc.html)

Personally I'd like it if we could eventually make it so that np.array specifically looks for lists and only lists, because the way it has so many different fallbacks right now creates confusion about which objects are elements. Compare:

    In [5]: np.array([(1, 2), (3, 4)]).shape
    Out[5]: (2, 2)

    In [6]: np.array([(1, 2), (3, 4)], dtype="i4,i4").shape
    Out[6]: (2,)

-n

--
Nathaniel J. Smith -- https://vorpus.org

Indeed:

    In [1]: class C:
       ...:     def __getitem__(self, i):
       ...:         if i < 10:
       ...:             return i
       ...:         else:
       ...:             raise IndexError
       ...:     def __len__(self):
       ...:         return 10

    In [2]: np.array(C())
    Out[2]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

(Omitting __len__ results in the creation of an object array, consistent with the fact that the sequence protocol requires __len__.)

Meanwhile, I found a new way to segfault numpy :-)

    In [3]: class C:
       ...:     def __getitem__(self, i):
       ...:         if i < 10:
       ...:             return i
       ...:         else:
       ...:             raise IndexError
       ...:     def __len__(self):
       ...:         return 42

    In [4]: np.array(C())
    Fatal Python error: Segmentation fault

Just an FYI: pandas implemented a RangeIndex in the upcoming 0.18.0, mainly for memory savings, see here <http://pandas-docs.github.io/pandas-docs-travis/whatsnew.html#range-index>, similar to how python range/xrange work. There are also substantial perf benefits, mainly with set operations, see here <https://github.com/pydata/pandas/blob/master/pandas/indexes/range.py#L274>, though we didn't officially benchmark these.

Jeff
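A quick sketch of the memory point, assuming pandas >= 0.18 (a RangeIndex stores only start/stop/step and only materializes a full array on demand):

    import numpy as np
    import pandas as pd

    ri = pd.RangeIndex(1000000)        # just start/stop/step, a few bytes
    ii = pd.Index(np.arange(1000000))  # stores the full million-element array

    ri.nbytes < ii.nbytes  # True; ii.nbytes is ~8 MB

    np.asarray(ri)         # materializes to a regular ndarray when needed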

On Mon, Feb 15, 2016 at 4:24 PM, Jeff Reback <jeffreback@gmail.com> wrote:
> pandas implemented a RangeIndex in the upcoming 0.18.0, mainly for
> memory savings, similar to how python range/xrange work.
Since it is a numpy-aware object (unlike the builtins), you can implement __array__() (and have, if I'm reading the code correctly) such that it does the correctly performant thing and calls np.arange(). RangeIndex won't be adversely impacted by retaining the status quo.

--
Robert Kern

On Sun, Feb 14, 2016 at 11:41 PM, Antony Lee <antony.lee@berkeley.edu> wrote:
> So how can np.array(range(...)) even work?

range() (in py3) is not a generator, nor is it an iterator. It is a range object, which is lazily evaluated and satisfies both the iteration protocol and the sequence protocol (at least most of it):

    In [1]: r = range(10)

    In [2]: r[3]
    Out[2]: 3

    In [3]: len(r)
    Out[3]: 10

    In [4]: type(r)
    Out[4]: range

    In [9]: isinstance(r, collections.abc.Sequence)
    Out[9]: True

    In [10]: l = list()

    In [11]: isinstance(l, collections.abc.Sequence)
    Out[11]: True

    In [12]: isinstance(r, collections.abc.Iterable)
    Out[12]: True

I'm still totally confused as to why we'd need to special-case range when we have arange().

-CHB

--
Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA 98115        (206) 526-6317   main reception

Chris.Barker@noaa.gov

Mostly so that no performance is lost when someone passes range(...) instead of np.arange(...). At least I had never realized that one is much faster than the other and always just passed range() as a convenience.

Antony

On Thu, Feb 18, 2016 at 1:15 PM, Antony Lee <antony.lee@berkeley.edu> wrote:
> Mostly so that no performance is lost when someone passes range(...)
> instead of np.arange(...).

2016-02-17 10:50 GMT-08:00 Chris Barker <chris.barker@noaa.gov>:

> range() (in py3) is not a generator, nor is it an iterator. It is a
> range object, which is lazily evaluated:
>
>     In [1]: r = range(10)
Thanks, I didn't know that: the range r here doesn't get eaten by iterating through it, while r = (i for i in range(5)) is only good for a single pass. (Tried on Python 3.4.)

Josef
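A minimal demo of the difference:

    r = range(5)
    list(r), list(r)    # ([0, 1, 2, 3, 4], [0, 1, 2, 3, 4]) -- reusable

    g = (i for i in range(5))
    list(g), list(g)    # ([0, 1, 2, 3, 4], []) -- a generator is one-shot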

On Thu, Feb 18, 2016 at 10:15 AM, Antony Lee <antony.lee@berkeley.edu> wrote:
> Mostly so that no performance is lost when someone passes range(...)
> instead of np.arange(...). At least I had never realized that one is
> much faster than the other and always just passed range() as a
> convenience.
Well, pretty much everything in numpy is faster if you use the numpy array version rather than plain python -- this hardly seems like the extra code would be worth it. numpy's array() constructor can (and should) take an arbitrary iterable.

It does make some sense that we might want to special-case iterators, as you don't want to loop through them too many times, which is what np.fromiter() is for.

And _maybe_ it would be worth special-casing python lists, as you can access items faster, and they are really, really common (or has this already been done?), but special-casing range() is getting silly. And it might be hard to do. At the C level I suppose you could actually know what the parameters and state of the range object are and create an array directly from that -- but that's what arange is for...

-CHB

In a sense this discussion is really about making np.array(iterable) more efficient, so I restarted the discussion at https://mail.scipy.org/pipermail/numpy-discussion/2016-February/075059.html

Antony
participants (9)

- Antony Lee
- Charles R Harris
- Chris Barker
- Jeff Reback
- josef.pktd@gmail.com
- Nathaniel Smith
- Ralf Gommers
- Robert Kern
- Sebastian Berg