fromiter cannot create array of object - was: Creating an ndarray from an iterable, over sequences
Hi,
thanks. Both recarray and itertools.chain work just fine in the example case.
However, the real purpose of this is to read strings from a large XML file into a pandas DataFrame, and fromiter cannot create arrays of dtype 'object'. Fixed-length strings may be worth trying. But as the XML schema does not guarantee a maximum length, and pandas generally uses 'object' arrays for strings, I see no better way than building the array with a list comprehension and turning it into a DataFrame.
Maybe a variable-length string/unicode type would help in the long term.
Leo
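A minimal sketch of the list-comprehension route described above, assuming the strings sit in repeated elements of a single tag. The sample document, the tag name "item" and the helper iter_texts are hypothetical; the thread does not show the real file or schema.

```python
import io
import xml.etree.ElementTree as ET

import numpy as np

# Hypothetical sample data; the real XML file is not shown in the thread.
XML = b"<root><item>alpha</item><item>a much longer string value</item><item>b</item></root>"


def iter_texts(source, tag):
    """Yield the text of every <tag> element while parsing incrementally."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == tag:
            yield elem.text
            elem.clear()  # drop the element's content once it has been consumed


# Collect the strings in an intermediate list, then build an object array from it.
values = [text for text in iter_texts(io.BytesIO(XML), "item")]
arr = np.array(values, dtype=object)
```

The resulting array (or the list itself) can then be handed to pandas, for example as pd.DataFrame({"item": arr}), if pandas is installed.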
I would like to write something like:
In [25]: iterable=((i, i**2) for i in range(10))
In [26]: a=np.fromiter(iterable, int32)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-26-5bcc2e94dbca> in <module>()
----> 1 a=np.fromiter(iterable, int32)
ValueError: setting an array element with a sequence.
Is there an efficient way to do this?
Perhaps you could just utilize structured arrays (http://docs.scipy.org/doc/numpy/user/basics.rec.html), like:

    iterable = ((i, i**2) for i in range(10))
    a = np.fromiter(iterable, [('a', int32), ('b', int32)], 10)
    a.view(int32).reshape(-1, 2)

You could use itertools:

    from itertools import chain
    g = ((i, i**2) for i in range(10))
    import numpy
    numpy.fromiter(chain.from_iterable(g), numpy.int32).reshape(-1, 2)
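For reference, the two suggestions can be put side by side in one self-contained script. This is only a sketch: the helper name pairs and the explicit count arguments are additions for illustration, not part of the original replies.

```python
from itertools import chain

import numpy as np


def pairs(n):
    """Fresh generator of (i, i**2) tuples; a generator can only be consumed once."""
    return ((i, i**2) for i in range(n))


n = 10

# Structured-array route: one record per tuple, then reinterpret as plain int32.
structured = np.fromiter(pairs(n), dtype=[("a", np.int32), ("b", np.int32)], count=n)
via_struct = structured.view(np.int32).reshape(-1, 2)

# itertools route: flatten the tuples into one stream of scalars, then reshape.
flat = np.fromiter(chain.from_iterable(pairs(n)), dtype=np.int32, count=2 * n)
via_chain = flat.reshape(-1, 2)
```

Passing count is optional, but when the length is known in advance it lets fromiter allocate the output once instead of growing it.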
On Wed, 2014-01-22 at 07:58 +0100, Dr. Leo wrote:
Hi,
thanks. Both recarray and itertools.chain work just fine in the example case.
However, the real purpose of this is to read strings from a large xml file into a pandas DataFrame. But fromiter cannot create arrays of dtype 'object'. Fixed length strings may be worth trying. But as the xml schema does not guarantee a max. length, and pandas generally uses 'object' arrays for strings, I see no better way than creating the array through list comprehensions and turn it into a DataFrame.
If your datatype is object, I doubt that using an intermediate list is a real overhead, since the list will use much less memory than the string objects anyway.
- Sebastian
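A rough way to check this on CPython is to compare the size of the list itself with the total size of the strings it refers to. The data below is made up for illustration, and sys.getsizeof figures are specific to the interpreter and version.

```python
import sys

# Hypothetical data: 1000 distinct strings of 50 characters each.
strings = ["x" * 40 + f"{i:010d}" for i in range(1000)]

# The list object only stores a header plus one pointer per element.
list_bytes = sys.getsizeof(strings)

# The string objects carry the actual character data plus per-object overhead.
string_bytes = sum(sys.getsizeof(s) for s in strings)

print(f"list: {list_bytes} bytes, strings: {string_bytes} bytes")
```

An object array stores the same kind of per-element pointer as the list, so the intermediate list roughly doubles only the pointer storage, not the string data.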
NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Participants (2): Dr. Leo, Sebastian Berg