[Numpy-discussion] FeatureRequest: support for array construction from iterators

Benjamin Root ben.v.root at gmail.com
Mon Dec 14 10:56:02 EST 2015


Devil's advocate here: np.array() has become the de facto "constructor" for
numpy arrays. Right now, passing it a generator produces what is, IMHO, a
useless result:

>>> np.array((i for i in range(10)))
array(<generator object <genexpr> at 0x7f28b2beca00>, dtype=object)

Passing pretty much any dtype argument will cause that to fail:

>>> np.array((i for i in range(10)), dtype=np.int_)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: long() argument must be a string or a number, not 'generator'

Therefore, I don't think it is out of the realm of reason for np.array(),
when passed a generator object and a dtype, to delegate the work to
np.fromiter() under the hood. I would even go so far as to have np.array()
raise an error when it is passed a generator without a dtype. The point is
to reduce the number of entry points for creating numpy arrays.
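The proposal can be sketched as a thin wrapper. This is a minimal sketch under stated assumptions: `array_from_any` is a hypothetical name, not a NumPy API; it treats any iterator (generators included) as input for `np.fromiter()`, which only builds 1-D arrays; and it raises `TypeError` when no dtype is given, as suggested above.

```python
import collections.abc

import numpy as np


def array_from_any(obj, dtype=None):
    """Hypothetical wrapper illustrating the proposed np.array() behaviour.

    Iterators (including generators) are delegated to np.fromiter(), which
    needs an explicit dtype; everything else goes through np.array().
    """
    if isinstance(obj, collections.abc.Iterator):
        if dtype is None:
            # Proposed behaviour: refuse to guess instead of silently
            # wrapping the iterator in a 0-d object array.
            raise TypeError("a dtype is required to build an array from an iterator")
        return np.fromiter(obj, dtype=dtype)
    # Lists, tuples, range objects, scalars, ... keep the usual semantics.
    return np.array(obj, dtype=dtype)


squares = array_from_any((i * i for i in range(5)), dtype=np.int_)
print(squares)
```

Note that `range` objects are not iterators, so they still take the ordinary `np.array()` path in this sketch.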


By the way, any reason why this works?
>>> np.array(xrange(10))
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
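A likely explanation (an assumption, not something stated in this thread): `xrange` objects, like `range` in Python 3, implement the sequence protocol, that is `len()` plus integer indexing, and np.array() treats sequences as nested data. A generator supports only iteration, so np.array() stores it as a single opaque object. The hypothetical classes below illustrate the difference.

```python
import numpy as np


class SequenceLike:
    """Hypothetical class supporting len() and indexing, like xrange/range."""

    def __init__(self, n):
        self.n = n

    def __len__(self):
        return self.n

    def __getitem__(self, i):
        if not 0 <= i < self.n:
            raise IndexError(i)
        return i


class IteratorOnly:
    """Hypothetical class supporting only iteration, like a generator."""

    def __init__(self, n):
        self.i, self.n = 0, n

    def __iter__(self):
        return self

    def __next__(self):
        if self.i >= self.n:
            raise StopIteration
        self.i += 1
        return self.i - 1


seq_arr = np.array(SequenceLike(5))  # elements are unpacked
it_arr = np.array(IteratorOnly(5))   # wrapped as one object
print(seq_arr)
print(it_arr.shape, it_arr.dtype)
```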


Cheers!
Ben Root


On Sat, Dec 12, 2015 at 6:02 PM, Juan Nunez-Iglesias <jni.soma at gmail.com>
wrote:

> Hey Nathaniel,
>
> Fascinating! Thanks for the primer! I didn't know that it checks the
> dtypes of all the values in the array. In that case, I agree that it
> would be bad to infer the dtype magically from just the first value, and
> this can be left to the users.
>
> Thanks!
>
> Juan.
>
> On Sat, Dec 12, 2015 at 7:00 PM, Nathaniel Smith <njs at pobox.com> wrote:
>
>> On Fri, Dec 11, 2015 at 11:32 PM, Juan Nunez-Iglesias
>> <jni.soma at gmail.com> wrote:
>> > Nathaniel,
>> >
>> >> IMO this is better than making np.array(iter) internally call
>> >> list(iter) or equivalent
>> >
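>> > Yeah, but that's not the only option:
>> >
>> > import numpy as np
>> > from itertools import chain
>> > def fromiter_awesome_edition(iterable):
>> >     it = iter(iterable)  # works for lists too, not just iterators
>> >     elem = next(it)      # peek at the first element
>> >     # stand-in for whatever numpy does to infer dtypes from lists
>> >     dtype = np.asarray(elem).dtype
>> >     return np.fromiter(chain([elem], it), dtype=dtype)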
>> >
>> > I think this would be a huge win for usability. I'm always getting
>> > tripped up by the dtype requirement. I can submit a PR if people like
>> > this pattern.
>>
>> This isn't the semantics of np.array, though -- np.array will look at
>> the whole input and try to find a common dtype, so this can't be the
>> implementation for np.array(iter). E.g. try np.array([1, 1.0])
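To make the `np.array([1, 1.0])` point concrete: np.array() promotes over every element, whereas a guess based only on the first element picks an integer type. Here `np.asarray(values[0]).dtype` is an illustrative stand-in for a first-element guess, not anything NumPy does internally.

```python
import numpy as np

values = [1, 1.0]

# np.array() inspects every element and promotes to a common dtype.
whole_input = np.array(values).dtype

# A first-element-only guess (illustrative stand-in, not NumPy internals).
first_only = np.asarray(values[0]).dtype

print(whole_input)       # float64
print(first_only.kind)   # i  (a signed integer; exact width is platform-dependent)
```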
>>
>> I can see an argument for making the dtype= argument to fromiter
>> optional, with a warning in the docs that it will guess based on the
>> first element and that you should specify it if you don't want that.
>> It seems potentially a bit error prone (in the sense that it might
>> make it easier to end up with code that works great when you test it
>> but then breaks later when something unexpected happens), but maybe
>> the usability outweighs that. I don't use fromiter myself so I don't
>> have a strong opinion.
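The optional-`dtype` variant of `fromiter` discussed above might look roughly like this. `fromiter_guess` is a hypothetical helper, not part of NumPy, and `np.asarray(first).dtype` is an assumed stand-in for the guessing rule. The usage lines show the failure mode mentioned above: because the guess comes from the first element only, the later `2.5` ends up in an integer array.

```python
from itertools import chain

import numpy as np


def fromiter_guess(iterable, dtype=None):
    """Hypothetical np.fromiter variant with an optional dtype.

    If dtype is None, it is guessed from the FIRST element only; pass an
    explicit dtype if later elements may need a wider type.
    """
    it = iter(iterable)
    if dtype is None:
        try:
            first = next(it)
        except StopIteration:
            # Nothing to guess from; fall back to NumPy's default float dtype.
            return np.empty(0)
        dtype = np.asarray(first).dtype
        it = chain([first], it)  # put the peeked element back in front
    return np.fromiter(it, dtype=dtype)


data = [1, 2.5, 3]
guessed = fromiter_guess(data)          # integer dtype, guessed from the leading 1
explicit = fromiter_guess(data, float)  # float64, matching np.array(data)
print(guessed.dtype.kind, explicit.dtype)
```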
>>
>> > btw, I think np.array(['f', 'o', 'o']) would be exactly the expected
>> > result for np.array('foo'), but I guess that's just me.
>>
>> In general np.array(thing_that_can_go_inside_an_array) returns a
>> zero-dimensional (scalar) array -- np.array(1), np.array(True), etc.
>> all work like this, so I'd expect np.array("foo") to do the same.
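The scalar-wrapping behaviour described above can be checked directly: each of these calls returns a zero-dimensional array, and getting an array of characters requires passing an explicit sequence such as `list("foo")`.

```python
import numpy as np

# Each value becomes a 0-d array: ndim 0, shape ().
for value in (1, True, "foo"):
    a = np.array(value)
    print(repr(value), a.ndim, a.shape, a.dtype)

# Splitting the string first gives a 1-D array of 1-character strings.
chars = np.array(list("foo"))
print(chars, chars.shape)
```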
>>
>> -n
>>
>> --
>> Nathaniel J. Smith -- http://vorpus.org
>> _______________________________________________
>> NumPy-Discussion mailing list
>> NumPy-Discussion at scipy.org
>> https://mail.scipy.org/mailman/listinfo/numpy-discussion
>>

