[Numpy-discussion] fromiter cannot create array of object - was: Creating an ndarray from an iterable, over sequences

Dr. Leo fhaxbox66 at googlemail.com
Wed Jan 22 01:58:27 EST 2014


Hi,

Thanks. Both the recarray (structured array) approach and itertools.chain
work just fine in the example case.

However, the real purpose of this is to read strings from a large XML
file into a pandas DataFrame, and fromiter cannot create arrays of dtype
'object'. Fixed-length strings may be worth trying, but as the XML
schema does not guarantee a maximum length, and pandas generally uses
'object' arrays for strings, I see no better way than creating the array
through a list comprehension and turning it into a DataFrame.
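
A minimal sketch of that workaround, under stated assumptions: the XML
layout, the tag names (item, name, note) and the helper iter_rows are
made up for illustration and are not taken from the real file. The rows
are streamed out of the parser, collected with a list comprehension, and
only then turned into an 'object' array, so no maximum string length has
to be known in advance.

```python
import io
import xml.etree.ElementTree as ET

import numpy as np

# Hypothetical XML document; element and tag names are illustrative only.
xml_data = io.BytesIO(b"""<root>
<item><name>alpha</name><note>short</note></item>
<item><name>beta</name><note>a considerably longer string</note></item>
</root>""")


def iter_rows(source):
    """Yield one (name, note) tuple per <item> while streaming the file."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "item":
            yield elem.findtext("name"), elem.findtext("note")
            elem.clear()  # drop the parsed subtree to keep memory bounded


# Collect the rows first, then build the array in one step.
rows = [row for row in iter_rows(xml_data)]

# dtype=object stores references to Python strings, so nothing is truncated.
arr = np.array(rows, dtype=object)
```

The resulting two-dimensional array (or the list of tuples itself) could
then be handed to the pandas.DataFrame constructor together with a list
of column names.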

Maybe a variable-length string/unicode type would help in the long term.

Leo


>
> I would like to write something like:
>
> In [25]: iterable=((i, i**2) for i in range(10))
>
> In [26]: a=np.fromiter(iterable, int32)
> ---------------------------------------------------------------------------
> ValueError                                Traceback (most recent call
> last)
> <ipython-input-26-5bcc2e94dbca> in <module>()
> ----> 1 a=np.fromiter(iterable, int32)
>
> ValueError: setting an array element with a sequence.
>
>
> Is there an efficient way to do this?
>
Perhaps you could just use structured arrays
(http://docs.scipy.org/doc/numpy/user/basics.rec.html), like:
import numpy as np
iterable = ((i, i**2) for i in range(10))
a = np.fromiter(iterable, [('a', np.int32), ('b', np.int32)], 10)
a.view(np.int32).reshape(-1, 2)

You could use itertools:

>>> from itertools import chain
>>> g = ((i, i**2) for i in range(10))
>>> import numpy
>>> numpy.fromiter(chain.from_iterable(g), numpy.int32).reshape(-1, 2)


