[Numpy-discussion] Convert recarray to list (is this a bug?)

Yan Tang tang.yan at gmail.com
Tue Jul 10 09:53:52 EDT 2012


Thank you very much.

On Tue, Jul 10, 2012 at 3:02 AM, Travis Oliphant <travis at continuum.io> wrote:

>
> On Jul 9, 2012, at 9:24 PM, Yan Tang wrote:
>
> Hi,
>
> I noticed there is an odd issue when I am trying to convert a recarray to
> a list.  See below for the example/test case.
>
> $ cat a.csv
> date,count
> 2011-07-25,91
> 2011-07-26,118
> $ cat b.csv
> name,count
> foo,1233
> bar,100
>
> $ python
>
> >>> from matplotlib import mlab
> >>> import numpy as np
>
> >>> a = mlab.csv2rec('a.csv')
> >>> b = mlab.csv2rec('b.csv')
> >>> a
> rec.array([(datetime.date(2011, 7, 25), 91), (datetime.date(2011, 7, 26), 118)],
>       dtype=[('date', '|O8'), ('count', '<i8')])
> >>> b
> rec.array([('foo', 1233), ('bar', 100)],
>       dtype=[('name', '|S3'), ('count', '<i8')])
>
>
> >>> np.array(a.tolist()).tolist()
> [[datetime.date(2011, 7, 25), 91], [datetime.date(2011, 7, 26), 118]]
> >>> np.array(b.tolist()).tolist()
> [['foo', '1233'], ['bar', '100']]
>
>
> The odd part is that 1233 becomes the string '1233' in the second command,
> but 91 is still the number 91.
>
> Why would this happen?  What's the correct way to do this conversion?
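A self-contained reproduction of the behavior above may be useful. This is a sketch assuming Python 3 and a recent NumPy; it builds stand-ins for the csv2rec() results with np.rec.fromrecords instead of reading a.csv and b.csv through matplotlib.mlab, so the field names and values are copied from the session above rather than loaded from files.

```python
import datetime

import numpy as np

# Stand-ins for the csv2rec() results above, built directly with
# np.rec.fromrecords so no CSV files or matplotlib are needed.
a = np.rec.fromrecords(
    [(datetime.date(2011, 7, 25), 91), (datetime.date(2011, 7, 26), 118)],
    names="date,count",
)
b = np.rec.fromrecords([("foo", 1233), ("bar", 100)], names="name,count")

# Round-trip through a plain (non-structured) array, as in the question.
a_rows = np.array(a.tolist()).tolist()
b_rows = np.array(b.tolist()).tolist()

print(a_rows)  # the count column is still made of ints
print(b_rows)  # the count column has been turned into strings
```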
>
>
> You are trying to convert the record array into a list of lists, I
> presume?   The tolist() method on the rec.array produces a list of tuples.
>   First make sure that a list of tuples does not already satisfy your
> requirements; it might.
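To illustrate the point about tolist(): on a structured/record array it returns one tuple per record, and the fields come back as plain Python scalars, so the numbers stay numbers. A minimal sketch, again assuming Python 3 and using np.rec.fromrecords as a stand-in for the csv2rec() result b:

```python
import numpy as np

# Stand-in for the csv2rec() result `b` from the question.
b = np.rec.fromrecords([("foo", 1233), ("bar", 100)], names="name,count")

rows = b.tolist()  # a list with one tuple per record
name, count = rows[0]
print(type(rows[0]).__name__, name, count + 1)  # count is still an int
```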
>
> Passing this back to np.array is going to try to come up with a data-type
> that satisfies all the elements in the list of tuples.  You are relying
> here on np.array's "intelligence" for trying to figure out what kind of
> array you have.   It tries to do its best, but it is limited to
> determining a "primitive" data-type (float, int, string, object).   It
> can't always predict what you expect, especially when the original data
> source was a record like this.    In the first case, because of the
> date-time object, it decides the data is an "object" array, which works.  In
> the second it decides that the data can all be represented as a "string"
> and so chooses that, which turns 1233 and 100 into '1233' and '100'.   The
> second .tolist() just produces a list out of the 2-d array.
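The dtype inference described above can be observed directly. This sketch assumes Python 3, where the inferred string dtype is a unicode one (kind 'U') rather than the '|S' byte-string dtype shown in the 2012 session. It also shows, as one possible alternative not suggested in the reply, that passing dtype=object explicitly stops np.array from converting the ints to strings.

```python
import datetime

import numpy as np

rows_with_date = [(datetime.date(2011, 7, 25), 91), (datetime.date(2011, 7, 26), 118)]
rows_plain = [("foo", 1233), ("bar", 100)]

# np.array must pick ONE primitive dtype for the whole 2-D result.
print(np.array(rows_with_date).dtype)  # object: date objects have no primitive dtype
print(np.array(rows_plain).dtype)      # a unicode string dtype: the ints become str

# Asking for dtype=object keeps every element as the original Python object.
kept = np.array(rows_plain, dtype=object).tolist()
print(kept)
```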
>
> Likely what you want to do is just create a list of lists from the
> original output of .tolist(), like this:
>
> [list(x) for x in a.tolist()]
> [list(x) for x in b.tolist()]
>
> This will be faster as well.
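The suggested list comprehension, made runnable on its own. As before, this is a sketch assuming Python 3, with np.rec.fromrecords standing in for the csv2rec() result b. Because no intermediate ndarray is built, there is no dtype inference and the counts remain ints.

```python
import numpy as np

# Stand-in for the csv2rec() result `b` from the question.
b = np.rec.fromrecords([("foo", 1233), ("bar", 100)], names="name,count")

# Turn each record tuple into a list directly; the values keep their types.
rows = [list(x) for x in b.tolist()]
print(rows)
```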
>
> Best,
>
> -Travis
>
> Thanks.
>
> -uris-
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion

