[Numpy-discussion] The risks of empty()

Tue Jan 2 19:12:24 EST 2007

On 02/01/07, Bock, Oliver BGI SYD <Oliver.Bock at barclaysglobal.com> wrote:
> Some of my arrays are not fully populated.  (I separately record which
> entries are valid.)  I want to use numpy.empty() to speed up the
> creation of these arrays, but I'm worried about what will happen if I
> apply operations to the entire contents of these arrays.  E.g.
>
> a + b

Have you looked at masked arrays? They are designed to do what you want.

> I care about the results where valid entries align, but not otherwise.
> Given that numpy.empty() creates an ndarray using whatever junk it finds
> on the heap, it seems to me that there is the possibility that this
> could include bit patterns that are not valid floating point
> representations, which might raise floating point exceptions if used in
> operations like the one above (if they are "signalling" NaNs).  Will
> this be a problem, or will the results of operations on invalid floating
> point numbers yield NaN?

That is indeed a possibility. Even with floating-point exceptions
turned off, on some machines (e.g., Pentium Ms) NaNs are extremely
slow to calculate with (because they are handled in software). I'm not
sure that there *are* bit patterns that are not valid floating-point
numbers, but in any case, while using empty() does not in practice
seem to lead to trouble, you could see some surprising slowdowns if
the array happens to be filled with NaNs.
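
To make the bit-pattern question concrete, here is a minimal sketch that
reinterprets a few hand-picked 64-bit patterns as float64 and adds to
them. The patterns are illustrative stand-ins for heap junk, not what
empty() actually returns, and the behaviour shown assumes IEEE 754
hardware with floating-point errors set to "ignore".

```python
import numpy as np

# Hand-picked 64-bit patterns standing in for uninitialised memory.
bits = np.array(
    [
        0x7FF0000000000001,  # NaN with the quiet bit clear ("signalling")
        0x7FF8000000000000,  # quiet NaN
        0x7FF0000000000000,  # +infinity
        0x0000000000000001,  # smallest positive subnormal
        0x3FF0000000000000,  # 1.0
    ],
    dtype=np.uint64,
)

# Reinterpret the same bytes as float64 without converting the values.
junk = bits.view(np.float64)
b = np.ones(5)

# Suppress warnings so the "invalid" flag from the signalling NaN is silent.
with np.errstate(all="ignore"):
    total = junk + b
```

Every pattern here decodes to some float64 value (a NaN, an infinity, a
subnormal, or an ordinary number), and the addition completes without
raising.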

I recommend using masked arrays, which have the further advantage that
values in invalid ("masked") entries are not computed at all. (If your
invalid entries were few, or arose naturally, or you were using (say)
Opterons, I might instead recommend using NaNs to mark invalid
entries.)
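
A minimal sketch of the masked-array approach, using numpy.ma from
current NumPy (the module layout in 2007 differed). The array sizes,
values, and masks are made up for illustration.

```python
import numpy as np
import numpy.ma as ma

# Allocate without initialising, then fill only the valid slots.
a_raw = np.empty(4)
a_raw[:2] = [1.0, 2.0]       # entries 0 and 1 are valid
b_raw = np.empty(4)
b_raw[1:3] = [10.0, 20.0]    # entries 1 and 2 are valid

# mask=True marks an entry as invalid.
a = ma.masked_array(a_raw, mask=[False, False, True, True])
b = ma.masked_array(b_raw, mask=[True, False, False, True])

# The errstate guard is a precaution: the uninitialised slots may still
# pass through the underlying arithmetic even though they stay masked.
with np.errstate(all="ignore"):
    total = a + b

# Only positions valid in both operands survive in the result.
valid_sum = total.compressed()
```

The result's mask is the union of the operand masks, so only index 1,
where both inputs are valid, carries a usable value.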

> Or to put it another way: do I need to ensure that array data is
> initialised before using it?

It does not seem to be a problem in practice, but there are tools to
help with what you want to do.
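
For comparison, the NaN-marking alternative mentioned above can be
sketched as follows. It trades the speed of empty() for explicit
initialisation; np.full is from current NumPy, and the values are
again made up for illustration.

```python
import numpy as np

# Initialise every slot to NaN, then fill in the valid entries.
a = np.full(4, np.nan)
a[:2] = [1.0, 2.0]           # entries 0 and 1 are valid
b = np.full(4, np.nan)
b[1:3] = [10.0, 20.0]        # entries 1 and 2 are valid

# NaN propagates through arithmetic, so invalid inputs mark the output.
total = a + b

# Recover the positions where both operands were valid.
valid = ~np.isnan(total)
valid_sum = total[valid]
```

Here the validity record lives in the data itself, so no separate mask
has to be carried along, at the cost of writing every element once up
front.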

A. M. Archibald