
Some of my arrays are not fully populated. (I separately record which entries are valid.) I want to use numpy.empty() to speed up the creation of these arrays, but I'm worried about what will happen if I apply operations to the entire contents of these arrays, e.g.

    a + b

I care about the results where valid entries align, but not otherwise. Given that numpy.empty() creates an ndarray using whatever junk it finds on the heap, it seems to me that there is the possibility that this could include bit patterns that are not valid floating-point representations, which might raise floating-point exceptions if used in operations like the one above (if they are "signalling" NaNs). Will this be a problem, or will the results of operations on invalid floating-point numbers yield NaN?

Or to put it another way: do I need to ensure that array data is initialised before using it?

Oliver

Bock, Oliver BGI SYD wrote:
> Some of my arrays are not fully populated. (I separately record which entries are valid.) I want to use numpy.empty() to speed up the creation of these arrays, but I'm worried about what will happen if I apply operations to the entire contents of these arrays. E.g.
>
>     a + b
>
> I care about the results where valid entries align, but not otherwise. Given that numpy.empty() creates an ndarray using whatever junk it finds on the heap, it seems to me that there is the possibility that this could include bit patterns that are not valid floating point representations, which might raise floating point exceptions if used in operations like the one above (if they are "signalling" NaNs). Will this be a problem, or will the results of operations on invalid floating point numbers yield NaN?
>
> Or to put it another way: do I need to ensure that array data is initialised before using it?

You have essentially full control over floating-point exceptions using seterr(), so you can silence even the signalling NaNs if you want:

    from numpy import seterr

    olderrstate = seterr(all='ignore')
    # Do stuff that might generate spurious warnings.
    seterr(**olderrstate)

--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth."
  -- Umberto Eco
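Robert's save-and-restore pattern can also be written so that the old settings come back even if the guarded code raises. The following is a minimal sketch, assuming `import numpy as np`. It wraps the restore in `try`/`finally` and then shows `np.errstate`, NumPy's context-manager form of the same idea. The arrays are made-up example data chosen to trigger divide and invalid warnings.

```python
import numpy as np

a = np.array([1.0, 0.0, -1.0])
b = np.zeros(3)

# Save the current error settings, silence everything, and always restore
old_settings = np.seterr(all="ignore")
try:
    q = a / b  # 1/0 -> inf, 0/0 -> nan, -1/0 -> -inf, with no warnings
finally:
    np.seterr(**old_settings)

# The same thing as a context manager; settings are restored on exit
with np.errstate(all="ignore"):
    q2 = a / b
```

Keeping the silenced region as small as possible leaves error checking on for the rest of the program.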

Bock, Oliver BGI SYD wrote:
> Some of my arrays are not fully populated. (I separately record which entries are valid.) I want to use numpy.empty() to speed up the creation of these arrays, but I'm worried about what will happen if I apply operations to the entire contents of these arrays. E.g.
>
>     a + b
>
> I care about the results where valid entries align, but not otherwise. Given that numpy.empty() creates an ndarray using whatever junk it finds on the heap, it seems to me that there is the possibility that this could include bit patterns that are not valid floating point representations, which might raise floating point exceptions if used in operations like the one above (if they are "signalling" NaNs). Will this be a problem, or will the results of operations on invalid floating point numbers yield NaN?

This depends on what the error state is set to. You can set it to ignore floating-point errors, in which case this will almost certainly work. However, why take the chance? Why not just build your arrays on top of zeros() instead of empty()? Most of the ways I can think of to fill in a sparse array are slow enough that the extra overhead of zeros() over empty() is negligible by comparison.

> Or to put it another way: do I need to ensure that array data is initialised before using it?

I think that this should work if you set the error state correctly (for example, seterr(all="ignore")). However, I don't like shutting down error checking unless absolutely necessary, and overall it just seems better to initialize the arrays.

-tim
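Tim's suggestion of initializing the arrays can be sketched as follows. This assumes, as in the original question, that validity is tracked in separate boolean arrays; the sizes and values are hypothetical. Because `np.zeros` gives every unused slot a well-defined 0.0, whole-array arithmetic never touches leftover bit patterns, so the default error state can stay on.

```python
import numpy as np

n = 5
# Initialised storage: unused slots hold 0.0 rather than leftover heap bytes
a = np.zeros(n)
b = np.zeros(n)

# Validity is still tracked separately, as in the question (hypothetical data)
valid_a = np.zeros(n, dtype=bool)
valid_b = np.zeros(n, dtype=bool)
a[[0, 1, 3]] = [1.0, 2.0, 3.0]
valid_a[[0, 1, 3]] = True
b[[0, 3, 4]] = [10.0, 20.0, 30.0]
valid_b[[0, 3, 4]] = True

# Whole-array arithmetic only ever sees well-defined values
c = a + b
both = valid_a & valid_b
print(c[both])  # the two valid sums, 11.0 and 23.0
```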

On 02/01/07, Bock, Oliver BGI SYD <Oliver.Bock@barclaysglobal.com> wrote:
> Some of my arrays are not fully populated. (I separately record which entries are valid.) I want to use numpy.empty() to speed up the creation of these arrays, but I'm worried about what will happen if I apply operations to the entire contents of these arrays. E.g.
>
>     a + b

Have you looked at masked arrays? They are designed to do what you want.
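Here is a short sketch of the masked-array approach using `numpy.ma`, with made-up data. The mask marks invalid entries (True means invalid), so it takes the place of separately recorded validity flags. Arithmetic masks the result wherever either operand is masked.

```python
import numpy.ma as ma

# Placeholder values (0.0) sit under the mask; True in the mask means "invalid"
a = ma.masked_array([1.0, 2.0, 0.0, 3.0], mask=[False, False, True, False])
b = ma.masked_array([10.0, 0.0, 0.0, 20.0], mask=[False, True, True, False])

c = a + b  # masked wherever a or b is masked

print(c)               # masked positions are shown as "--"
print(c.compressed())  # 1-D array of the valid results only: 11.0 and 23.0
```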

> I care about the results where valid entries align, but not otherwise. Given that numpy.empty() creates an ndarray using whatever junk it finds on the heap, it seems to me that there is the possibility that this could include bit patterns that are not valid floating point representations, which might raise floating point exceptions if used in operations like the one above (if they are "signalling" NaNs). Will this be a problem, or will the results of operations on invalid floating point numbers yield NaN?

There is indeed the possibility. Even with floating-point exceptions turned off, on some machines (e.g., Pentium M processors) NaNs are extremely slow to calculate with (because they are handled in software). I'm not sure that there *are* bit patterns that are not valid floating-point numbers, but in any case, while using empty does not seem to lead to trouble in practice, you could see surprising slowdowns if the array happens to be filled with NaNs. I recommend using masked arrays, which have the further advantage that values in invalid ("masked") entries are not computed at all. (If your invalid entries were few, or arose naturally, or you used (say) Opterons, I might recommend using NaNs to mark invalid entries.)
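A small experiment can address the bit-pattern question directly. In IEEE 754 binary64, every 64-bit pattern decodes to a finite number, an infinity, or a NaN, so there is no separate "invalid" category. A signalling NaN is simply a NaN whose quiet bit is clear. The sketch below assumes the common convention (used on x86 and ARM) that the quiet bit is the most significant mantissa bit. Random bytes stand in for uninitialised memory.

```python
import numpy as np

# A signalling NaN: exponent bits all ones, quiet bit (bit 51) clear,
# non-zero mantissa. 0x3FF0000000000000 is the encoding of 1.0.
bits = np.array([0x7FF0000000000001, 0x3FF0000000000000], dtype=np.uint64)
x = bits.view(np.float64)  # reinterpret the same bytes; no conversion

# Touching a signalling NaN may set the "invalid" flag, so silence warnings
with np.errstate(all="ignore"):
    y = x + 1.0  # the NaN propagates; 1.0 + 1.0 gives 2.0
    x_is_nan = np.isnan(x)
    y_is_nan = np.isnan(y)

# Random bytes standing in for "whatever junk is on the heap"
rng = np.random.default_rng(0)
junk = np.frombuffer(rng.bytes(8 * 100_000), dtype=np.float64)

# Every value is finite, infinite, or NaN: no pattern falls outside these
with np.errstate(all="ignore"):
    classified = np.isfinite(junk) | np.isinf(junk) | np.isnan(junk)
print(classified.all())  # True
```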

> Or to put it another way: do I need to ensure that array data is initialised before using it?

It does not seem to be a problem in practice, but there are tools to help with what you want to do.

A. M. Archibald
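Archibald's parenthetical alternative, using NaN itself to mark invalid entries, can be sketched as follows with hypothetical data. Unwritten slots start as NaN, NaN propagates through arithmetic, and `np.isnan` recovers which results are valid, so no separate validity array is needed.

```python
import numpy as np

n = 5
# Every slot starts out explicitly invalid (NaN) instead of holding junk
a = np.full(n, np.nan)
b = np.full(n, np.nan)
a[[0, 1, 3]] = [1.0, 2.0, 3.0]
b[[0, 3, 4]] = [10.0, 20.0, 30.0]

c = a + b             # NaN + anything is NaN, so invalidity propagates
valid = ~np.isnan(c)  # True only where both inputs were valid
print(c[valid])       # the two valid sums, 11.0 and 23.0
```

This approach has two caveats. A genuine NaN produced from valid inputs (for example 0.0/0.0) becomes indistinguishable from a missing entry. And, as noted in the thread, NaN arithmetic can be slow on some processors.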
participants (4)
- A. M. Archibald
- Bock, Oliver BGI SYD
- Robert Kern
- Tim Hochberg