[Numpy-discussion] Nasty bug using pre-initialized arrays

Mon Jan 7 13:01:04 EST 2008

Charles R Harris wrote:
> 
> 
> On Jan 7, 2008 8:47 AM, Ryan May <rmay at ou.edu> wrote:
> 
>     Stuart Brorson wrote:
>     >>> I realize NumPy != Matlab, but I'd wager that most users would think
>     >>> that this is the natural behavior......
>     >> Well, that behavior won't happen. We won't mutate the dtype of the
>     >> array because of assignment. Matlab has copy(-on-write) semantics
>     >> for things like slices while we have view semantics. We can't
>     >> safely do the reallocation of memory [1].
>     >
>     > That's fair enough.  But then I think NumPy should consistently
>     > typecheck all assignments and throw an exception if the user attempts
>     > an assignment which loses information.
>     >
> 
>     Yeah, there's no doubt in my mind that this is a bug, if for no other
>     reason than this inconsistency:
> 
> 
> One place where Numpy differs from MatLab is the way memory is handled.
> MatLab is always generating new arrays, so for efficiency it is worth
> preallocating arrays and then filling in the parts. This is not the case
> in Numpy where lists can be used for things that grow and subarrays are
> views. Consequently, preallocating arrays in Numpy should be rare,
> reserved for cases where the values have to be generated explicitly,
> which is what you see when using the indexes in your first example. As
> to assignment between arrays, it is a mixed question. The problem again
> is memory usage. For large arrays, it makes sense to do automatic
> conversions, as is also the case in functions taking output arrays,
> because the typecast can be pushed down into C where it is time and
> space efficient, whereas explicitly converting the array uses up
> temporary space. However, I can imagine an explicit typecast function,
> something like
> 
> a[...] = typecast(b)
> 
> that would replace the current behavior. I think the typecast function
> could be implemented by returning a view of b with a castable flag set
> to true; that should supply enough information for the assignment
> operator to do its job. This might be a good addition for Numpy 1.1.
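The `typecast(b)` function above is only a proposal in this thread, not an existing NumPy call. As a rough point of comparison, later NumPy releases (1.7 and up, long after this 2008 discussion) expose a similar explicit opt-in through the `casting` argument of `np.copyto`: its default `'same_kind'` rule refuses a float-to-int copy, while `casting='unsafe'` requests the truncating conversion on purpose. A minimal sketch, assuming a modern NumPy:

```python
import numpy as np

a = np.zeros(3, dtype=np.int64)   # preallocated integer destination
b = np.array([1.5, 2.7, -0.9])    # float64 source with fractional parts

# Default casting='same_kind' rejects float64 -> int64, since float -> int
# is not a same-kind cast (it would truncate the fractional parts).
refused = False
try:
    np.copyto(a, b)
except TypeError as exc:
    refused = True
    print("refused:", exc)

# Explicit opt-in, roughly the role the proposed typecast(b) would play.
np.copyto(a, b, casting="unsafe")
print(a)
```

Plain `a[...] = b` still performs the unsafe cast silently, which is the behavior this thread is objecting to.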

While that seems like an OK idea, I'm still not sure what's wrong with
raising an exception when there will be information loss.  The exception
is already raised when assigning standard Python complex objects.  I can
think of many times in my code where explicit looping is a necessity, so
pre-allocating the array is the only way to go.
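For the explicit-loop case, the check being asked for can be sketched as a guarded assignment into a preallocated array. The helper `assign_checked` below is hypothetical (it is not part of NumPy); it uses `np.can_cast(..., casting='safe')` to refuse any dtype conversion NumPy does not classify as safe, such as complex to float or float to int. Note that this rule looks at dtypes, not values, so it still permits int64 to float64 even though very large integers lose precision there.

```python
import numpy as np

def assign_checked(dst, index, value):
    """Hypothetical helper: dst[index] = value, but raise TypeError when the
    value's dtype cannot be cast 'safely' to dst's dtype."""
    src = np.asarray(value)
    if not np.can_cast(src.dtype, dst.dtype, casting="safe"):
        raise TypeError(
            f"assigning {src.dtype} into {dst.dtype} array may lose information"
        )
    dst[index] = src

# Explicit loop filling a preallocated float64 array.
out = np.empty(4, dtype=np.float64)
for i in range(4):
    assign_checked(out, i, i * 0.5)  # float64 -> float64 is a safe cast

# A complex value is rejected instead of having its imaginary part dropped.
rejected = False
try:
    assign_checked(out, 0, 1 + 2j)
except TypeError as exc:
    rejected = True
    print(exc)
```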

-- 
Ryan May
Graduate Research Assistant
School of Meteorology
University of Oklahoma