[Numpy-discussion] fromiter cannot create array of object - was: Creating an ndarray from an iterable, over sequences
Sebastian Berg
sebastian at sipsolutions.net
Wed Jan 22 06:13:00 EST 2014
On Wed, 2014-01-22 at 07:58 +0100, Dr. Leo wrote:
> Hi,
>
> thanks. Both recarray and itertools.chain work just fine in the example
> case.
>
> However, the real purpose of this is to read strings from a large xml
> file into a pandas DataFrame. But fromiter cannot create arrays of dtype
> 'object'. Fixed length strings may be worth trying. But as the xml
> schema does not guarantee a max. length, and pandas generally uses
> 'object' arrays for strings, I see no better way than creating the array
> through list comprehensions and turning it into a DataFrame.
If your datatype is object, I doubt that using an intermediate list adds
any real overhead, since the list will use much less memory than the
string objects anyway.
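
A minimal sketch of that approach, assuming a generator that yields the
strings one at a time. The name iter_strings and its contents are
hypothetical stand-ins for whatever the XML parsing produces; only
np.array with dtype=object is the actual suggestion here:

```python
import numpy as np


def iter_strings():
    # Hypothetical stand-in for strings read from the XML file;
    # the lengths vary, so a fixed-width string dtype would not fit well.
    for i in range(5):
        yield "x" * (i + 1)


# Materialize the iterator in an intermediate list first. The list only
# holds references, so it is small compared to the string objects.
values = list(iter_strings())

# Build the object array from the list.
arr = np.array(values, dtype=object)
```

The resulting one-dimensional object array can then be handed to pandas
as a column. Newer NumPy releases may accept dtype=object in fromiter
directly, but the list-based route does not depend on that.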
- Sebastian
>
> Maybe a variable length string/unicode type would help in the long term.
>
> Leo
>
>
> >
> > I would like to write something like:
> >
> > In [25]: iterable=((i, i**2) for i in range(10))
> >
> > In [26]: a=np.fromiter(iterable, int32)
> > ---------------------------------------------------------------------------
> > ValueError                                Traceback (most recent call last)
> > <ipython-input-26-5bcc2e94dbca> in <module>()
> > ----> 1 a=np.fromiter(iterable, int32)
> >
> > ValueError: setting an array element with a sequence.
> >
> >
> > Is there an efficient way to do this?
> >
> Perhaps you could just utilize structured arrays (
> http://docs.scipy.org/doc/numpy/user/basics.rec.html), like:
> iterable = ((i, i**2) for i in range(10))
> a = np.fromiter(iterable, [('a', np.int32), ('b', np.int32)], 10)
> a.view(np.int32).reshape(-1, 2)
>
> You could use itertools:
>
> >>> from itertools import chain
> >>> g = ((i, i**2) for i in range(10))
> >>> import numpy
> >>> numpy.fromiter(chain.from_iterable(g), numpy.int32).reshape(-1, 2)