fromiter cannot create array of object (was: Creating an ndarray from an iterable, over sequences)
Hi,
thanks. Both recarray and itertools.chain work just fine in the example case.
However, the real purpose of this is to read strings from a large xml file into a pandas DataFrame. But fromiter cannot create arrays of dtype 'object'. Fixed length strings may be worth trying. But as the xml schema does not guarantee a max. length, and pandas generally uses 'object' arrays for strings, I see no better way than creating the array through list comprehensions and turning it into a DataFrame.
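A minimal sketch of the list-comprehension route described above, assuming a small made-up XML document (the real file and its schema are not shown in this thread, so the element names `record`, `name` and `comment` are purely hypothetical). It uses the standard library's `xml.etree.ElementTree` and builds the object array with `np.array` rather than `fromiter`:

```python
import xml.etree.ElementTree as ET

import numpy as np

# Hypothetical XML content; the real file and schema are not part of the thread.
XML_TEXT = """
<records>
  <record><name>alpha</name><comment>short</comment></record>
  <record><name>beta</name><comment>a considerably longer string</comment></record>
  <record><name>gamma</name><comment></comment></record>
</records>
"""

root = ET.fromstring(XML_TEXT)

# Collect the strings with a list comprehension, one tuple per record.
rows = [
    (rec.findtext("name", default=""), rec.findtext("comment", default=""))
    for rec in root.iter("record")
]

# Build a 2-D object array from the list; no maximum string length is needed.
arr = np.array(rows, dtype=object)
print(arr.shape)  # (3, 2)
print(arr.dtype)  # object
```

From there, something like `pandas.DataFrame(rows, columns=["name", "comment"])` should give the DataFrame directly from the list, without going through the array at all.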
Maybe a variable length string/unicode type would help in the long term.
Leo
I would like to write something like:
In [25]: iterable=((i, i**2) for i in range(10))
In [26]: a=np.fromiter(iterable, int32)
ValueError                                Traceback (most recent call last)
<ipython-input-26-5bcc2e94dbca> in <module>()
----> 1 a=np.fromiter(iterable, int32)
ValueError: setting an array element with a sequence.
Is there an efficient way to do this?
Perhaps you could just utilize structured arrays (http://docs.scipy.org/doc/numpy/user/basics.rec.html), like:
iterable = ((i, i**2) for i in range(10))
a = np.fromiter(iterable, [('a', int32), ('b', int32)], 10)
a.view(int32).reshape(-1, 2)
You could use itertools:
from itertools import chain
import numpy
g = ((i, i**2) for i in range(10))
numpy.fromiter(chain.from_iterable(g), numpy.int32).reshape(-1, 2)
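For comparison, here is a self-contained sketch that runs both suggestions side by side and checks that they agree. The helper name `pairs_to_array` and its `width` parameter are made up for illustration and are not part of NumPy:

```python
from itertools import chain

import numpy as np


def pairs_to_array(pairs, dtype=np.int32, width=2):
    """Flatten an iterable of equal-length tuples into an (n, width) array.

    Hypothetical helper for illustration; not a NumPy function.
    """
    flat = np.fromiter(chain.from_iterable(pairs), dtype=dtype)
    return flat.reshape(-1, width)


n = 10

# itertools route: flatten the tuples, then reshape.
via_chain = pairs_to_array((i, i**2) for i in range(n))

# Structured-array route: one record per tuple, then view as plain int32.
rec_dtype = [("a", np.int32), ("b", np.int32)]
records = np.fromiter(((i, i**2) for i in range(n)), dtype=rec_dtype, count=n)
via_records = records.view(np.int32).reshape(-1, 2)

print(np.array_equal(via_chain, via_records))  # True
```

Passing `count` lets `fromiter` allocate the output once, but it only helps when the number of items is known in advance.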
On Wed, 2014-01-22 at 07:58 +0100, Dr. Leo wrote:
Hi,
thanks. Both recarray and itertools.chain work just fine in the example case.
However, the real purpose of this is to read strings from a large xml file into a pandas DataFrame. But fromiter cannot create arrays of dtype 'object'. Fixed length strings may be worth trying. But as the xml schema does not guarantee a max. length, and pandas generally uses 'object' arrays for strings, I see no better way than creating the array through list comprehensions and turning it into a DataFrame.
If your datatype is object, I doubt that using an intermediate list adds any real overhead, since the list will use much less memory than the string objects anyway.
 Sebastian
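The point about the intermediate list can be checked roughly with `sys.getsizeof`. This is only a sketch: the strings are made-up stand-ins for the values read from the XML file, and the byte counts are CPython implementation details that vary by version and platform:

```python
import sys

# Hypothetical data standing in for strings read from the XML file.
strings = [f"record {i}: some text read from the file" for i in range(100_000)]

# getsizeof on a list counts only the list object and its pointer table,
# not the objects it refers to.
list_bytes = sys.getsizeof(strings)

# The str objects themselves are counted separately.
string_bytes = sum(sys.getsizeof(s) for s in strings)

print(f"list overhead:  {list_bytes / 1e6:.2f} MB")
print(f"string objects: {string_bytes / 1e6:.2f} MB")
```

An object array stores one pointer per element just like the list does, so the temporary list roughly doubles the pointer storage while the strings, which dominate, are shared.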
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion
participants (2)

Dr. Leo

Sebastian Berg