[Numpy-discussion] Finding many ways to incorrectly create a numpy array. Please advice

Tue Aug 2 13:15:50 EDT 2011

On 8/2/11 8:38 AM, Jeremy Conlin wrote:
> Thanks, Brett. Using StringIO and numpy.loadtxt worked great. I'm
> still curious why what I was doing didn't work. Everything I can see
> indicates it should work.

In [11]: tfc_dtype
Out[11]: dtype([('nps', '>u8'), ('t', '>f8'), ('e', '>f8'), ('fom', '>f8')])

In [15]: n = numpy.fromstring(l, dtype=tfc_dtype, sep=' ')
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)

/Users/cbarker/<ipython console> in <module>()

ValueError: don't know how to read character strings with that array type

That error means just what it says. In theory, numpy.fromstring() (and
fromfile()) provide a way to quickly and efficiently generate arrays
from text, but in practice the code is quite limited (and has a bug or
two). I don't think anyone has gotten around to writing the code to use
structured dtypes with it -- so it can't do what you want (reasonable
though that expectation is).
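
For reference, the StringIO + loadtxt route mentioned above can be
sketched like this. It is only a sketch: the contents of `line` are an
assumption reconstructed from the `words` list shown further down, and
`ndmin=1` is my choice so that a single row still comes back as a 1-d
array of records.

```python
from io import StringIO

import numpy as np

# Same field layout as tfc_dtype in the session above.
tfc_dtype = np.dtype([('nps', '>u8'), ('t', '>f8'), ('e', '>f8'), ('fom', '>f8')])

# Assumed input: one whitespace-separated line of text.
line = "32000 7.89131E-01 8.05999E-03 3.88222E+03"

# loadtxt parses each column according to the matching field of the dtype.
# ndmin=1 keeps a single row as a 1-d array with one record.
recs = np.loadtxt(StringIO(line), dtype=tfc_dtype, ndmin=1)

print(recs['nps'], recs['fom'])
```

Each field can then be pulled out by name, e.g. recs['t'].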

In [21]: words
Out[21]: ['32000', '7.89131E-01', '8.05999E-03', '3.88222E+03']

In [22]: p = numpy.array(words, dtype=tfc_dtype)

In [23]: p
Out[23]:
array([(3689064028291727360L, 0.0, 0.0, 0.0),
        (3976177339304456517L, 4.967820413490985e-91, 0.0, 0.0),
        (4048226120204106053L, 4.970217431784588e-91, 0.0, 0.0),
        (3687946958874489413L, 1.1572189237420885e-100, 0.0, 0.0)],
       dtype=[('nps', '>u8'), ('t', '>f8'), ('e', '>f8'), ('fom', '>f8')])

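Similarly here -- converting from text to structured dtypes is not fully
supported. numpy did not parse the strings as numbers: the first value
above, 3689064028291727360, is just the bytes of '32000' (padded with
zeros) read as a big-endian 8-byte integer.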
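
One way around this, as a rough sketch, is to convert the strings to
Python numbers yourself before handing them to numpy. The `converters`
tuple is a name I made up for illustration; it simply lists one
conversion function per field of the dtype.

```python
import numpy as np

tfc_dtype = np.dtype([('nps', '>u8'), ('t', '>f8'), ('e', '>f8'), ('fom', '>f8')])

words = ['32000', '7.89131E-01', '8.05999E-03', '3.88222E+03']

# Hypothetical helper: one converter per field, in field order.
converters = (int, float, float, float)

# Build a tuple of real numbers; numpy treats a tuple as one record.
row = tuple(conv(w) for conv, w in zip(converters, words))

# A list of tuples gives a 1-d array of records (here, just one).
p = np.array([row], dtype=tfc_dtype)

print(p)
```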

In [29]: a
Out[29]: [32000, 0.789131, 0.00805999, 3882.22]

In [30]: r = numpy.array(a)

In [31]: r
Out[31]:
array([  3.20000000e+04,   7.89131000e-01,   8.05999000e-03,
          3.88222000e+03])

Sure -- numpy's default behavior is to find a single dtype that can hold
every element of the input -- this pre-dates structured dtypes, and is
probably what you would want by default anyway.
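
A quick check of that default, using the same list as in the session
above: the one integer is promoted along with the floats, so the result
is a plain 1-d float array rather than a record.

```python
import numpy as np

a = [32000, 0.789131, 0.00805999, 3882.22]

# No dtype given: numpy picks one type that can hold every element.
r = np.array(a)

print(r.dtype, r.shape)
```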

In [32]: s = numpy.array(a, dtype=tfc_dtype)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)

/Users/cbarker/<ipython console> in <module>()

TypeError: expected a readable buffer object

OK -- I can see why you'd expect that to work. However, the trick with
structured dtypes is that the dimensionality of the inputs can be less
than obvious -- you are passing in a 1-d list of 4 numbers -- do you
want a 1-d array of four elements, or a single record? In this case,
it's pretty obvious (as a human) what you would want -- you have a
dtype with four fields, and you're passing in four numbers, but there
are so many possible combinations that numpy doesn't try to be "smart"
about it. So as a rule, you need to be quite specific when working with
structured dtypes.

However, the default is for numpy to map tuples to dtypes, so if you 
pass in a tuple instead, it works:

In [34]: t = tuple(a)

In [35]: s = numpy.array(t, dtype=tfc_dtype)

In [36]: s
Out[36]:
array((32000L, 0.789131, 0.00805999, 3882.22),
       dtype=[('nps', '>u8'), ('t', '>f8'), ('e', '>f8'), ('fom', '>f8')])
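
To spell out the tuple-versus-list distinction, here is a small sketch
(the names `one` and `many` are just for illustration): a bare tuple
becomes a single 0-d record, and a list of tuples becomes a 1-d array
of records.

```python
import numpy as np

tfc_dtype = np.dtype([('nps', '>u8'), ('t', '>f8'), ('e', '>f8'), ('fom', '>f8')])

a = [32000, 0.789131, 0.00805999, 3882.22]

# A tuple maps to one record: the result is a 0-d structured array.
one = np.array(tuple(a), dtype=tfc_dtype)

# A list of tuples maps to a 1-d array with one record per tuple.
many = np.array([tuple(a), tuple(a)], dtype=tfc_dtype)

print(one.shape, many.shape)
print(many['nps'])
```

Here the list supplies the array dimension and each tuple supplies the
fields of one record, so numpy has nothing to guess.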

you were THIS close!

-Chris

-- 
Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

Chris.Barker at noaa.gov