[Numpy-discussion] returning recarray records as plain arrays

Matthew Koichi Grimes mkg at cs.nyu.edu
Wed Jan 3 15:39:51 EST 2007

```
As per Stefan's help, I've made a subclass of recarray called nnvalue.
It just fixes the dtype to [('x', 'f8'), ('dx', 'f8'), ('delta', 'f8')],
and adds a few member functions. I basically treat nnvalue as a struct
with three equal-shaped array fields: x, dx, and delta.

I'd like it if, when I reference individual fields of an nnvalue, they
were returned as plain arrays. Instead, they are returned as nnvalues
with empty .x, .dx, and .delta fields. These fields' dtype is also
still set to [('x', 'f8'), ('dx', 'f8'), ('delta', 'f8')], when I wish
it were just 'f8'. The following illustrates the problem, followed by
the awkward workaround I'm currently using:

>>> nnv = nnvalue(3)        # a 3-element nnvalue
>>> nnv
nnvalue([(0.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.0, 0.0)],
dtype=[('x', '<f8'), ('dx', '<f8'), ('delta', '<f8')])
>>> nnv.x
nnvalue([0.0, 0.0, 0.0],
dtype=[('x', '<f8'), ('dx', '<f8'), ('delta', '<f8')])
>>> nnv.x.view(dtype('f8'))
array([ 0.,  0.,  0.])

How can I get my recarray subclass to return individual fields as plain
array views?

-- Matt

numpy-discussion-request at scipy.org wrote:
> ------------------------------------------------------------------------
>
> Today's Topics:
>
>    1. The risks of empty() (Bock, Oliver BGI SYD)
>    2. Re: The risks of empty() (Robert Kern)
>    3. Re: The risks of empty() (Tim Hochberg)
>    4. Re: The risks of empty() (A. M. Archibald)
>    5. Re: newbie: attempt at data frame (Sven Schreiber)
>    6. subclassing float64 (and friends) (eric jones)
>    7. Re: newbie: attempt at data frame (Sven Schreiber)
>    8. Re: subclassing float64 (and friends) (Stefan van der Walt)
>
>
> ------------------------------------------------------------------------
>
> Subject:
> [Numpy-discussion] The risks of empty()
> From:
> "Bock, Oliver BGI SYD" <Oliver.Bock at barclaysglobal.com>
> Date:
> Wed, 3 Jan 2007 10:48:19 +1100
> To:
> <numpy-discussion at scipy.org>
>
>
> Some of my arrays are not fully populated.  (I separately record which
> entries are valid.)  I want to use numpy.empty() to speed up the
> creation of these arrays, but I'm worried about what will happen if I
> apply operations to the entire contents of these arrays.  E.g.
>
> a + b
>
> I care about the results where valid entries align, but not otherwise.
> Given that numpy.empty() creates an ndarray using whatever junk it finds
> on the heap, it seems to me that there is the possibility that this
> could include bit patterns that are not valid floating point
> representations, which might raise floating point exceptions if used in
> operations like the one above (if they are "signalling" NaNs).  Will
> this be a problem, or will the results of operations on invalid floating
> point numbers yield NaN?
>
> Or to put it another way: do I need to ensure that array data is
> initialised before using it?
>
>
>    Oliver
>
>
>
> ------------------------------------------------------------------------
>
> Subject:
> Re: [Numpy-discussion] The risks of empty()
> From:
> Robert Kern <robert.kern at gmail.com>
> Date:
> Tue, 02 Jan 2007 18:04:13 -0600
> To:
> Discussion of Numerical Python <numpy-discussion at scipy.org>
>
>
> Bock, Oliver BGI SYD wrote:
>
>> Some of my arrays are not fully populated.  (I separately record which
>> entries are valid.)  I want to use numpy.empty() to speed up the
>> creation of these arrays, but I'm worried about what will happen if I
>> apply operations to the entire contents of these arrays.  E.g.
>>
>> a + b
>>
>> I care about the results where valid entries align, but not otherwise.
>> Given that numpy.empty() creates an ndarray using whatever junk it finds
>> on the heap, it seems to me that there is the possibility that this
>> could include bit patterns that are not valid floating point
>> representations, which might raise floating point exceptions if used in
>> operations like the one above (if they are "signalling" NaNs).  Will
>> this be a problem, or will the results of operations on invalid floating
>> point numbers yield NaN?
>>
>> Or to put it another way: do I need to ensure that array data is
>> initialised before using it?
>>
>
> You have essentially full control over floating point exceptions using seterr(),
> so you can silence even the signalling NaNs if you want.
>
>   olderrstate = seterr(all='ignore')
>   # Do stuff that might generate spurious warnings.
>   seterr(**olderrstate)
>
>
>
> ------------------------------------------------------------------------
>
> Subject:
> Re: [Numpy-discussion] The risks of empty()
> From:
> Tim Hochberg <tim.hochberg at ieee.org>
> Date:
> Tue, 02 Jan 2007 17:08:05 -0700
> To:
> Discussion of Numerical Python <numpy-discussion at scipy.org>
>
>
> Bock, Oliver BGI SYD wrote:
>> Some of my arrays are not fully populated.  (I separately record which
>> entries are valid.)  I want to use numpy.empty() to speed up the
>> creation of these arrays, but I'm worried about what will happen if I
>> apply operations to the entire contents of these arrays.  E.g.
>>
>> a + b
>>
>> I care about the results where valid entries align, but not otherwise.
>> Given that numpy.empty() creates an ndarray using whatever junk it finds
>> on the heap, it seems to me that there is the possibility that this
>> could include bit patterns that are not valid floating point
>> representations, which might raise floating point exceptions if used in
>> operations like the one above (if they are "signalling" NaNs).  Will
>> this be a problem, or will the results of operations on invalid floating
>> point numbers yield NaN?
>>
> This depends on what the error state is set to. You can set it to
> ignore floating point errors, in which case this will almost certainly
> work.
>
> However, why take the chance? Why not just build your arrays on top of
> zeros instead of empty? Most of the ways that I can think of filling
> in a sparse array are slow enough to overwhelm the extra overhead of
> zeros versus empty.
>> Or to put it another way: do I need to ensure that array data is
>> initialised before using it?
>>
> I think that this should work if you set the err state correctly (for
> example, seterr(all="ignore")). However, I don't like shutting down the
> error checking unless absolutely necessary, and overall it just seems
> better to initialize the arrays.
>
> -tim
>
>
>
> ------------------------------------------------------------------------
>
> Subject:
> Re: [Numpy-discussion] The risks of empty()
> From:
> "A. M. Archibald" <peridot.faceted at gmail.com>
> Date:
> Tue, 2 Jan 2007 20:12:24 -0400
> To:
> "Discussion of Numerical Python" <numpy-discussion at scipy.org>
>
>
> On 02/01/07, Bock, Oliver BGI SYD <Oliver.Bock at barclaysglobal.com> wrote:
>> Some of my arrays are not fully populated.  (I separately record which
>> entries are valid.)  I want to use numpy.empty() to speed up the
>> creation of these arrays, but I'm worried about what will happen if I
>> apply operations to the entire contents of these arrays.  E.g.
>>
>> a + b
>
> Have you looked at masked arrays? They are designed to do what you want.
>
>> I care about the results where valid entries align, but not otherwise.
>> Given that numpy.empty() creates an ndarray using whatever junk it finds
>> on the heap, it seems to me that there is the possibility that this
>> could include bit patterns that are not valid floating point
>> representations, which might raise floating point exceptions if used in
>> operations like the one above (if they are "signalling" NaNs).  Will
>> this be a problem, or will the results of operations on invalid floating
>> point numbers yield NaN?
>
> There is indeed the possibility. Even with floating-point exceptions
> turned off, on some machines (e.g., Pentium Ms) NaNs are extremely
> slow to calculate with (because they are handled in software). I'm not
> sure that there *are* bit patterns that are not valid floating-point
> numbers, but in any case while using empty does not in practice seem
> to lead to trouble, you could have some surprising slowdowns if the
> array happens to be filled with NaNs.
>
> I recommend using masked arrays, which have the further advantage that
> values in invalid ("masked") entries are not computed at all. (If your
> invalid entries were few or arose naturally or you use (say) Opterons,
> I might recommend using NaNs to mark invalid entries.)
>
>> Or to put it another way: do I need to ensure that array data is
>> initialised before using it?
>
> It does not seem to be a problem in practice, but there are tools to
> help with what you want to do.
>
> A. M. Archibald
>
>
> ------------------------------------------------------------------------
>
> Subject:
> Re: [Numpy-discussion] newbie: attempt at data frame
> From:
> Sven Schreiber <svetosch at gmx.net>
> Date:
> Wed, 03 Jan 2007 10:00:10 +0100
> To:
> Discussion of Numerical Python <numpy-discussion at scipy.org>
>
>
> Vincent Nijs schrieb:
>
>
>> If there is an easy way to read array data + variable names using the csv
>> module it would be great if that could be added to cookbook/InputOutput. I
>> couldn't figure out how to do it.
>>
>>
>>
>
> Hi Vincent, of course it depends a little on what exactly your csv file
> looks like, but if you just have column headers and the actual data, you
> might try something like the following:
>
> import csv
> from numpy import mat	# or array if you like
> obslist = []
> datalist = []
>     obslist.append(line[0])
>     datalist.append(line[1:])
> # (datalist should now be a nested list, first index rows, second
> #  columns)
> # (still contains the headers)
> varnames = datalist.pop(0)
> # now the real data
> data = mat(datalist, dtype = float)
>
> -sven
>
>
>
> ------------------------------------------------------------------------
>
> Subject:
> [Numpy-discussion] subclassing float64 (and friends)
> From:
> eric jones <eric at enthought.com>
> Date:
> Wed, 03 Jan 2007 04:29:10 -0600
> To:
> Discussion of Numerical Python <numpy-discussion at scipy.org>
>
>
> Hey all,
>
> I am playing around with sub-classing the new-fangled float64 objects
> and friends.  I really like the new ndarray subclassing features
> (__array_finalize__, etc.), and was exploring whether or not the
> scalars worked the same way.  I've stubbed my toe right out of the
> blocks though.  I can sub-class from standard python floats just fine,
> but when I try to do the same from float64, I get a traceback.
> (examples below)  Anyone have ideas on how to do this correctly?
>
> thanks,
> eric
>
> class MyFloat(float):
>
>    def __new__(cls, data, my_attr=None):
>        obj = float.__new__(cls, data)
>        obj.my_attr = my_attr
>        return obj
>
> a = MyFloat(1.2,my_attr="hello")
> print a, a.my_attr
>
> output:
>    1.2 hello
>
>
> from numpy import float64
>
> class MyFloat2(float64):
>
>    def __new__(cls, data, my_attr=None):
>        obj = float64.__new__(cls, data)
>        obj.my_attr = my_attr
>        return obj
>
> a = MyFloat2(1.2,my_attr="hello")
> print a, a.my_attr
>
>
> output:
>    Traceback (most recent call last):
>      File "C:\wrk\eric\trunk\src\lib\geo\examples\scalar_subtype.py",
> line 33, in ?
>        a = MyFloat2(1.2,my_attr="hello")
>      File "C:\wrk\eric\trunk\src\lib\geo\examples\scalar_subtype.py",
> line 30, in __new__
>        obj.my_attr = my_attr
>    AttributeError: 'numpy.float64' object has no attribute 'my_attr'
>
>
> ------------------------------------------------------------------------
>
> Subject:
> Re: [Numpy-discussion] newbie: attempt at data frame
> From:
> Sven Schreiber <svetosch at gmx.net>
> Date:
> Wed, 03 Jan 2007 12:07:15 +0100
> To:
> Discussion of Numerical Python <numpy-discussion at scipy.org>
>
>
> Sven Schreiber schrieb:
>
>
>> Hi Vincent, of course it depends a little on how exactly your csv file
>> looks like, but if you just have column headers and the actual data, you
>> might try something like the following:
>>
>>
>
> Ok sorry the previous thing doesn't work, I also stumbled over the
> strings. Here's the next attempt, also shorter. (this time even tested ;-)
>
> import csv
> from numpy import mat
>
> stringlist = [ line for line in read_from ]
> varnames = stringlist.pop(0)[1:]
> datalist = [ map(float, line[1:]) for line in stringlist ]
>
> # now the real data
> data = mat(datalist, dtype = float)
>
>
> I actually quite like it... python lists are very nice. This discards
> the observation labels, but it's not difficult to add that, of course.
>
> -sven
>
>
>
> ------------------------------------------------------------------------
>
> Subject:
> Re: [Numpy-discussion] subclassing float64 (and friends)
> From:
> Stefan van der Walt <stefan at sun.ac.za>
> Date:
> Wed, 3 Jan 2007 14:44:45 +0200
> To:
> Discussion of Numerical Python <numpy-discussion at scipy.org>
>
>
> On Wed, Jan 03, 2007 at 04:29:10AM -0600, eric jones wrote:
>
>> I am playing around with sub-classing the new-fangled float64 objects
>> and friends.  I really like the new ndarray subclassing features
>> (__array_finalize__, etc.), and was exploring whether or not the scalars
>> worked the same way.  I've stubbed my toe right out of the blocks
>> though.  I can sub-class from standard python floats just fine, but when
>> I try to do the same from float64, I get a traceback. (examples below)
>> Anyone have ideas on how to do this correctly?
>>
>> from numpy import float64
>>
>> class MyFloat2(float64):
>>
>>     def __new__(cls, data, my_attr=None):
>>         obj = float64.__new__(cls, data)
>>         obj.my_attr = my_attr
>>         return obj
>>
>> a = MyFloat2(1.2,my_attr="hello")
>> print a, a.my_attr
>>
>>
>> output:
>>     Traceback (most recent call last):
>>       File "C:\wrk\eric\trunk\src\lib\geo\examples\scalar_subtype.py",
>> line 33, in ?
>>         a = MyFloat2(1.2,my_attr="hello")
>>       File "C:\wrk\eric\trunk\src\lib\geo\examples\scalar_subtype.py",
>> line 30, in __new__
>>         obj.my_attr = my_attr
>>     AttributeError: 'numpy.float64' object has no attribute 'my_attr'
>>
>
> With classes defined in C I've often noticed that you can't add
> attributes, i.e.
>
> f = N.float64(1.2)
> f.x = 1
>
> breaks with
>
> AttributeError: 'numpy.float64' object has no attribute 'x'
>
> The way to fix this for arrays is to first view the array as the new
> subclass, i.e.
>
> x = N.array([1])
> class Ary(N.ndarray):
>     pass
> x = x.view(Ary)
> x.y = 1
>
> However, with floats I noticed that the following fails:
>
> import numpy as N
> f = N.float64(1.2)
> class Floaty(N.float64):
>     pass
> f.view(Floaty)
>
> with
>
> TypeError: data type cannot be determined from type object
>
> Maybe this is part of the problem?
>
> Regards
> Stéfan
>
>

```
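
One possible way to get plain-array fields, sketched under some assumptions: the class below subclasses ndarray directly rather than recarray, and the names NNValue, NN_DTYPE and _field are hypothetical stand-ins, not the poster's actual nnvalue code. Each field is exposed as a property that takes the field view and re-views it as a base ndarray, which is the same idea as the .view(dtype('f8')) workaround in the post, just hidden inside the class.

```python
import numpy as np

# Same three-field layout as in the post (assumed; the real nnvalue may differ).
NN_DTYPE = np.dtype([('x', 'f8'), ('dx', 'f8'), ('delta', 'f8')])


class NNValue(np.ndarray):
    """Hypothetical stand-in for nnvalue: a struct of three float64 fields."""

    def __new__(cls, shape):
        # Zero-filled structured array, viewed as the subclass.
        return np.zeros(shape, dtype=NN_DTYPE).view(cls)

    def _field(self, name):
        # Field indexing yields a view on the same memory; re-viewing it
        # as np.ndarray drops the subclass type.
        return np.ndarray.__getitem__(self, name).view(np.ndarray)

    @property
    def x(self):
        return self._field('x')

    @property
    def dx(self):
        return self._field('dx')

    @property
    def delta(self):
        return self._field('delta')


nnv = NNValue(3)
x = nnv.x      # plain ndarray with dtype float64
x[0] = 1.5     # writes through to nnv, since x is a view and not a copy
```

Because the properties return views, assigning into nnv.x modifies the underlying nnvalue. If independent copies are wanted instead, calling .copy() on the result would do that.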