
NumPy gurus --
I just discovered this today. It looks like a bug to me. Please flame me mercilessly if I am wrong! :-)
Sometimes you need to initialize an array using zeros() before doing an assignment to it in a loop. If you assign a complex value to the initialized array, the imaginary part of the array is dropped. Does NumPy do a silent type-cast which causes this behavior? Is this typecast a feature?
Below I attach a session log showing the bug. Note that I have boiled down my complex code to this simple case for ease of comprehension. [1] I will also input this bug into the tracking system.
By the way, this is NumPy 1.0.4:
In [39]: numpy.__version__
Out[39]: '1.0.4'
Cheers,
Stuart Brorson Interactive Supercomputing, inc. 135 Beaver Street | Waltham | MA | 02452 | USA http://www.interactivesupercomputing.com/
---------------------- <session log> --------------------
In [29]: A = numpy.random.rand(4) + 1j*numpy.random.rand(4)
In [30]: B = numpy.zeros((4))
In [31]:
In [31]: for i in range(4):
   ....:     B[i] = A[i]
   ....:

In [32]: A
Out[32]: array([ 0.12150180+0.00577893j,  0.39792515+0.03607227j,
        0.61933379+0.04506978j,  0.56751678+0.24576083j])

In [33]: B
Out[33]: array([ 0.1215018 ,  0.39792515,  0.61933379,  0.56751678])
----------------------- </session log> -------------------
[1] Yes, I know that I should use vectorized code as often as possible, and that this example is not vectorized. This is a simple example illustrating the problem. Moreover, many times the computation you wish to perform can't easily be vectorized, leaving the nasty old for loop as the only choice......

On Friday 04 January 2008 16:08:54 Stuart Brorson wrote:
Sometimes you need to initialize an array using zeros() before doing an assignment to it in a loop. If you assign a complex value to the initialized array, the imaginary part of the array is dropped. Does NumPy do a silent type-cast which causes this behavior? Is this typecast a feature?
By default, empty arrays are initialized as float. If you try to force a complex into it, yes, the imaginary part will be dropped. Use the dtype argument of zeros or ones to specify the type of your array, for example:
B = numpy.zeros((4,), dtype=numpy.complex_)
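For instance, a minimal sketch of the fix (dtype=complex names the same complex128 dtype as numpy.complex_):

import numpy

A = numpy.random.rand(4) + 1j * numpy.random.rand(4)

# Pre-allocate B as complex so that element assignment keeps the imaginary part.
B = numpy.zeros((4,), dtype=complex)
for i in range(4):
    B[i] = A[i]

assert numpy.allclose(A, B)   # nothing is silently dropped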

Sometimes you need to initialize an array using zeros() before doing an assignment to it in a loop. If you assign a complex value to the initialized array, the imaginary part of the array is dropped. Does NumPy do a silent type-cast which causes this behavior? Is this typecast a feature?
By default, empty arrays are initialized as float. If you try to force a complex into it, yes, the imaginary part will be dropped. Use the dtype argument of zeros or ones to specify the type of your array, for example:
B = numpy.zeros((4,), dtype=numpy.complex_)
Hmmmm, I'll try this. Thanks!
I still think that dumb users like me are more likely to assume that the destination array will become complex when you assign complex to it. Is there an articulated philosophy of how assignments (and any accompanying typecasting) should work in Numpy?
Cheers,
Stuart Brorson Interactive Supercomputing, inc. 135 Beaver Street | Waltham | MA | 02452 | USA http://www.interactivesupercomputing.com/

I just discovered this today. It looks like a bug to me. Please flame me mercilessly if I am wrong! :-)
Hmmmm.... after a little more playing around, I think it's indeed true that NumPy does a typecast to make the resulting assignment have the same type as the LHS, regardless of the type of the RHS. Below I attach another example, which shows this behavior.
As a naive user, I would not expect that my variables would get silently typecast in an assignment like this. But is this behavior Pythonic? I'm not sure..... Normally, the Pythonic thing to do when assigning non-conforming variables is to barf -- throw an exception. At least that's what I would expect.
Comments?
Stuart Brorson Interactive Supercomputing, inc. 135 Beaver Street | Waltham | MA | 02452 | USA http://www.interactivesupercomputing.com/
----------------------- <session log> ---------------------
In [77]: A = 10*numpy.random.rand(4)
In [78]: B = numpy.zeros((4))
In [79]: B.dtype='int64'
In [80]:
In [80]: for i in range(4):
   ....:     B[i] = A[i]
   ....:

In [81]: A
Out[81]: array([ 1.71327285,  3.48128855,  7.51404178,  8.96775254])

In [82]: B
Out[82]: array([1, 3, 7, 8], dtype=int64)
----------------------- </session log> ---------------------

On Fri, 4 Jan 2008, Stuart Brorson wrote:
I just discovered this today. It looks like a bug to me. Please flame me mercilessly if I am wrong! :-)
FWIW, here's what Matlab does:
A = rand(1, 4) + rand(1, 4)*i
A =
Columns 1 through 3
0.7833 + 0.7942i 0.6808 + 0.0592i 0.4611 + 0.6029i
Column 4
0.5678 + 0.0503i
B = zeros(1, 4)
B =
0 0 0 0
for idx=1:4; B(idx) = A(idx); end
B
B =
Columns 1 through 3
0.7833 + 0.7942i 0.6808 + 0.0592i 0.4611 + 0.6029i
Column 4
0.5678 + 0.0503i
I realize NumPy != Matlab, but I'd wager that most users would think that this is the natural behavior......
Stuart Brorson Interactive Supercomputing, inc. 135 Beaver Street | Waltham | MA | 02452 | USA http://www.interactivesupercomputing.com/

On Fri, 4 Jan 2008, Stuart Brorson apparently wrote:
I realize NumPy != Matlab, but I'd wager that most users would think that this is the natural behavior.
I would not find it "natural" that elements of my float array could be assigned complex values. How could it be a fixed chunk of memory and do such things, unless it would always waste enough memory to hold the biggest possible subsequent data type?
Cheers, Alan Isaac (user)

I realize NumPy != Matlab, but I'd wager that most users would think that this is the natural behavior.
I would not find it "natural" that elements of my float array could be assigned complex values.
OK, then NumPy should throw an exception if you try to make the assignment.
I tried it out. NumPy does the right thing in this case:
In [10]: A = numpy.zeros([3, 3])
In [11]: A[1, 1] = 1+2j
---------------------------------------------------------------------------
<type 'exceptions.TypeError'>             Traceback (most recent call last)
/fs/home/sdb/<ipython console> in <module>()
<type 'exceptions.TypeError'>: can't convert complex to float; use abs(z)
However, not in this case:
In [12]: B = 10*numpy.random.rand(3, 3) + 1j*numpy.random.rand(3, 3)
In [13]: B[1, 1]
Out[13]: (7.15013107181+0.383220559014j)
In [14]: A[1, 1] = B[1, 1]
In [15]:
So the bug is that assignment doesn't do value checking in every case.
How could it be a fixed chunk of memory and do such things, unless it would always waste enough memory to hold the biggest possible subsequent data type?
Fair question. Matlab does a realloc if you assign complex values to an initially real array. It can take a long time if your matrices are large.
Cheers,
Stuart Brorson Interactive Supercomputing, inc. 135 Beaver Street | Waltham | MA | 02452 | USA http://www.interactivesupercomputing.com/

Stuart Brorson wrote:
On Fri, 4 Jan 2008, Stuart Brorson wrote:
I just discovered this today. It looks like a bug to me. Please flame me mercilessly if I am wrong! :-)
FWIW, here's what Matlab does:
A = rand(1, 4) + rand(1, 4)*i
A =
Columns 1 through 3
0.7833 + 0.7942i 0.6808 + 0.0592i 0.4611 + 0.6029i
Column 4
0.5678 + 0.0503i
B = zeros(1, 4)
B =
0 0 0 0
for idx=1:4; B(idx) = A(idx); end
B
B =
Columns 1 through 3
0.7833 + 0.7942i 0.6808 + 0.0592i 0.4611 + 0.6029i
Column 4
0.5678 + 0.0503i
I realize NumPy != Matlab, but I'd wager that most users would think that this is the natural behavior......
Well, that behavior won't happen. We won't mutate the dtype of the array because of assignment. Matlab has copy(-on-write) semantics for things like slices while we have view semantics. We can't safely do the reallocation of memory [1].
[1] Well, we do have a .resize() method which will do the reallocation and raise an exception if there are views lingering about. However, this is only done when explicitly asked for because this is a feature that is useful in a limited number of situations. We will not allow it to be done implicitly.
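A small illustration of that resize behaviour (not from the original message; run it as a script, since interactive shells hold extra references):

import numpy as np

a = np.zeros(4)
a.resize(8)            # fine: no other references, the buffer is reallocated in place

b = np.zeros(4)
v = b[1:3]             # a view sharing b's memory
try:
    b.resize(8)        # refused: reallocating would leave the view dangling
except ValueError as err:
    print(err)         # "cannot resize an array that references or is referenced ..."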

I realize NumPy != Matlab, but I'd wager that most users would think that this is the natural behavior......
Well, that behavior won't happen. We won't mutate the dtype of the array because of assignment. Matlab has copy(-on-write) semantics for things like slices while we have view semantics. We can't safely do the reallocation of memory [1].
That's fair enough. But then I think NumPy should consistently typecheck all assignments and throw an exception if the user attempts an assignment which loses information.
If you point me to a file where assignments are done (particularly from array elements to array elements) I can see if I can figure out how to fix it & then submit a patch. But I won't promise anything! My brain hurts already after analyzing this "feature"..... :-)
Cheers,
Stuart Brorson Interactive Supercomputing, inc. 135 Beaver Street | Waltham | MA | 02452 | USA http://www.interactivesupercomputing.com/

On Friday 04 January 2008 05:17:56 pm Stuart Brorson wrote:
I realize NumPy != Matlab, but I'd wager that most users would think that this is the natural behavior......
Well, that behavior won't happen. We won't mutate the dtype of the array because of assignment. Matlab has copy(-on-write) semantics for things like slices while we have view semantics. We can't safely do the reallocation of memory [1].
That's fair enough. But then I think NumPy should consistently typecheck all assignments and throw an exception if the user attempts an assignment which loses information.
If you point me to a file where assignments are done (particularly from array elements to array elements) I can see if I can figure out how to fix it & then submit a patch. But I won't promise anything! My brain hurts already after analyzing this "feature"..... :-)
There is a long history in numeric/numarray/numpy about this "feature". And for many of us, it really is a feature -- it prevents the automatic upcasting of arrays, which is very important if your arrays are huge (i.e. comparable in size to your system memory).
For instance in astronomy, where very large 16-bit integer or 32-bit float images or data-cubes are common, if you upcast your 32-bit floats accidentally because you are doing double precision math (i.e. the default in Python) near them, that can cause the program to swap out or die horribly. In fact, this exact example is one of the reasons why the Space Telescope people initially developed numarray. numpy has kept that model. I agree, though, that when using very mixed types (i.e. complex and ints, for example), the results can be confusing.
Scott

On Jan 4, 2008 3:28 PM, Scott Ransom sransom@nrao.edu wrote:
On Friday 04 January 2008 05:17:56 pm Stuart Brorson wrote:
I realize NumPy != Matlab, but I'd wager that most users would think that this is the natural behavior......
Well, that behavior won't happen. We won't mutate the dtype of the array because of assignment. Matlab has copy(-on-write) semantics for things like slices while we have view semantics. We can't safely do the reallocation of memory [1].
That's fair enough. But then I think NumPy should consistently typecheck all assignments and throw an exception if the user attempts an assignment which loses information.
If you point me to a file where assignments are done (particularly from array elements to array elements) I can see if I can figure out how to fix it & then submit a patch. But I won't promise anything! My brain hurts already after analyzing this "feature"..... :-)
There is a long history in numeric/numarray/numpy about this "feature". And for many of us, it really is a feature -- it prevents the automatic upcasting of arrays, which is very important if your arrays are huge (i.e. comparable in size to your system memory).
For instance in astronomy, where very large 16-bit integer or 32-bit float images or data-cubes are common, if you upcast your 32-bit floats accidentally because you are doing double precision math (i.e. the default in Python) near them, that can cause the program to swap out or die horribly. In fact, this exact example is one of the reasons why the Space Telescope people initially developed numarray. numpy has kept that model. I agree, though, that when using very mixed types (i.e. complex and ints, for example), the results can be confusing.
This isn't a very compelling argument in this case. The concern the numarray people were addressing was the upcasting of precision. However, there are two related hierarchies in numpy: one is the kind[1] of data, roughly bool, int, float, complex. Each kind has various precisions. The numarray folks were concerned with avoiding upcasting of precision, not with avoiding upcasting of kinds. And, I can't see much (any?) justification for allowing automagic downcasting of kind, complex->float being the most egregious, other than backwards compatibility. This is clearly an opportunity for confusion and likely a magnet for bugs. And, I've yet to see any useful examples to support this behaviour. I imagine that there are some benefits, but I doubt that they are compelling enough to justify the current behaviour.
[1] I can't recall if this is the official terminology; I'm away from my home computer at the moment and it's hard for me to check. The idea should be clear, however.

On Fri, Jan 04, 2008 at 04:31:53PM -0700, Timothy Hochberg wrote:
On Jan 4, 2008 3:28 PM, Scott Ransom sransom@nrao.edu wrote:
On Friday 04 January 2008 05:17:56 pm Stuart Brorson wrote:
I realize NumPy != Matlab, but I'd wager that most users would think that this is the natural behavior......
Well, that behavior won't happen. We won't mutate the dtype of the array because of assignment. Matlab has copy(-on-write) semantics for things like slices while we have view semantics. We can't safely do the reallocation of memory [1].
That's fair enough. But then I think NumPy should consistently typecheck all assignments and throw an exception if the user attempts an assignment which loses information.
If you point me to a file where assignments are done (particularly from array elements to array elements) I can see if I can figure out how to fix it & then submit a patch. But I won't promise anything! My brain hurts already after analyzing this "feature"..... :-)
There is a long history in numeric/numarray/numpy about this "feature". And for many of us, it really is a feature -- it prevents the automatic upcasting of arrays, which is very important if your arrays are huge (i.e. comparable in size to your system memory).
For instance in astronomy, where very large 16-bit integer or 32-bit float images or data-cubes are common, if you upcast your 32-bit floats accidentally because you are doing double precision math (i.e. the default in Python) near them, that can cause the program to swap out or die horribly. In fact, this exact example is one of the reasons why the Space Telescope people initially developed numarray. numpy has kept that model. I agree, though, that when using very mixed types (i.e. complex and ints, for example), the results can be confusing.
This isn't a very compelling argument in this case. The concern the numarray people were addressing was the upcasting of precision. However, there are two related hierarchies in numpy: one is the kind[1] of data, roughly bool, int, float, complex. Each kind has various precisions. The numarray folks were concerned with avoiding upcasting of precision, not with avoiding upcasting of kinds. And, I can't see much (any?) justification for allowing automagic downcasting of kind, complex->float being the most egregious, other than backwards compatibility. This is clearly an opportunity for confusion and likely a magnet for bugs. And, I've yet to see any useful examples to support this behaviour. I imagine that there are some benefits, but I doubt that they are compelling enough to justify the current behaviour.
I wasn't at all arguing that having complex data chopped and downcast into an int or float container was the right thing to do. What I was trying to address was that preventing automatic upcasting of the "kind" of data that you have is often very useful and in fact was one of the driving reasons behind numarray.
For this particular complex-type example I think that an exception would be the proper thing, since if you really want to throw away the imaginary part it is easy enough to specify x.real.
But the float->int situation is different. I can see good reasons for both upcast and downcast. This is one of those situations where the programmer had better be sure that they know what they are doing (and being explicit in the code would be better than implicit).
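For example, both explicit spellings are short; a sketch with made-up arrays:

import numpy as np

B = np.random.rand(4) + 1j * np.random.rand(4)

A = np.zeros(4)                   # float64 destination
A[:] = B.real                     # complex -> float: say explicitly that only the real part is wanted

C = np.zeros(4, dtype=np.int32)   # integer destination
C[:] = np.floor(10 * B.real).astype(np.int32)   # float -> int: spell out both the rounding and the cast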
Scott

Scott Ransom wrote:
I wasn't at all arguing that having complex data chopped and downcast into an int or float container was the right thing to do.
indeed, it is clearly a bad thing to do -- but a bug magnet? I'm not so sure; surely anyone who had any idea at all what they were doing with complex numbers would notice it right away.
To the OP: Did you waste any time thinking this was working right? Or was your time spent figuring out why numpy would do such a thing? Which is wasted time nonetheless.
-Chris

I wasn't at all arguing that having complex data chopped and downcast into an int or float container was the right thing to do.
indeed, it is clearly a bad thing to do -- but a bug magnet? I'm not so sure; surely anyone who had any idea at all what they were doing with complex numbers would notice it right away.
To the OP: Did you waste any time thinking this was working right? Or was your time spent figuring out why numpy would do such a thing? Which is wasted time nonetheless.
Thanks for the question. I spent about 1/2 hour looking at my other code trying to figure out why I was getting strange results. Of course, I suspected my own code to be the culprit, since NumPy is a mature package, right?
Then, when I looked closely at the return array given by NumPy, I noticed that it was real, when I was working with complex numbers. I said to myself "WTF?". Then a little more investigating revealed that NumPy was silently truncating my complex array to real.
I respectfully submit that silently chopping the imaginary part *is* a magnet for bugs, since many dumb NumPy users (like me) aren't necessarily aware of the typecasting rules. We're only thinking about doing math, after all!
Stuart Brorson Interactive Supercomputing, inc. 135 Beaver Street | Waltham | MA | 02452 | USA http://www.interactivesupercomputing.com/

I realize NumPy != Matlab, but I'd wager that most users would think that this is the natural behavior......
Well, that behavior won't happen. We won't mutate the dtype of the array because of assignment. Matlab has copy(-on-write) semantics for things like slices while we have view semantics. We can't safely do the reallocation of memory [1].
That's fair enough. But then I think NumPy should consistently typecheck all assignments and throw an exception if the user attempts an assignment which loses information.
There is a long history in numeric/numarray/numpy about this "feature". And for many of us, it really is a feature -- it prevents the automatic upcasting of arrays, which is very important if your arrays are huge (i.e. comparable in size to your system memory).
That's well and good. But NumPy should *never* automatically -- and silently -- chop the imaginary part off your complex array elements, particularly if you are just doing an innocent assignment! Doing something drastic like silently throwing half your data away can lead to all kinds of bugs in code written by somebody who is unaware of this behavior (i.e. most people)!
It sounds to me like the right thing is to throw an exception instead of "downcasting" a data object.
Stuart

Hello all,
That's well and good. But NumPy should *never* automatically -- and silently -- chop the imaginary part off your complex array elements, particularly if you are just doing an innocent assignment! Doing something drastic like silently throwing half your data away can lead to all kinds of bugs in code written by somebody who is unaware of this behavior (i.e. most people)!
It sounds to me like the right thing is to throw an exception instead of "downcasting" a data object.
I'm not sure that I agree! I'd rather not have to litter my code with "casting" operations every time I wanted to down-convert data types (creating yet more temporary arrays...) via assignment. e.g.:
A[i] = calculate(B).astype(A.dtype)
vs.
A[i] = calculate(B)
Further, writing functions to operate on generic user-provided output arrays (or arrays of user-provided dtype; both of which are common e.g. in scipy.ndimage) becomes more bug-prone, as every assignment would need to be modified as above.
This change would break a lot of the image-processing code I've written (where one often does computations in float or even double, and then re-scales and down-converts the result to integer for display), for example.
I guess that this could be rolled in via the geterr/seterr mechanism, and could by default just print warnings. I agree that silent truncation can be troublesome, but not having to spell every conversion out in explicit (and error-prone) detail is also pretty useful. (And arguably more pythonic.)
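To make that concrete, here is a rough sketch of the pattern I mean (the sizes and the toy "processing" step are made up, not taken from ndimage):

import numpy as np

img = (4095 * np.random.rand(512, 512)).astype(np.uint16)   # stand-in for a 12-bit camera frame

work = img.astype(np.float64)        # do the arithmetic in double precision
work = work - work.mean()            # some float-valued intermediate processing

display = np.zeros(img.shape, dtype=np.uint8)                 # 8-bit buffer for display
display[...] = 255.0 * (work - work.min()) / np.ptp(work)     # relies on the implicit float -> uint8 cast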
Zach

Zachary Pincus wrote:
Hello all,
That's well and good. But NumPy should *never* automatically -- and silently -- chop the imaginary part off your complex array elements, particularly if you are just doing an innocent assignment! Doing something drastic like silently throwing half your data away can lead to all kinds of bugs in code written by somebody who is unaware of this behavior (i.e. most people)!
It sounds to me like the right thing is to throw an exception instead of "downcasting" a data object.
I'm not sure that I agree! I'd rather not have to litter my code with "casting" operations every time I wanted to down-convert data types (creating yet more temporary arrays...) via assignment. e.g.:
A[i] = calculate(B).astype(A.dtype)
vs.
A[i] = calculate(B)
Further, writing functions to operate on generic user-provided output arrays (or arrays of user-provided dtype; both of which are common e.g. in scipy.ndimage) becomes more bug-prone, as every assignment would need to be modified as above.
This change would break a lot of the image-processing code I've written (where one often does computations in float or even double, and then re-scales and down-converts the result to integer for display), for example.
I guess that this could be rolled in via the geterr/seterr mechanism, and could by default just print warnings. I agree that silent truncation can be troublesome, but not having to spell every conversion out in explicit (and error-prone) detail is also pretty useful. (And arguably more pythonic.)
There's some confusion in the conversation here. Tim already identified it, but since it's continuing, I'm going to reiterate.
There are two related hierarchies of datatypes: different kinds (integer, floating point, complex floating point) and different precisions within a given kind (int8, int16, int32, int64). The term "downcasting" should probably be reserved for the latter only.
It seems to me that Zach and Scott are defending downcasting of precisions within a given kind. It does not necessarily follow that the behavior we choose for dtypes within a given kind should be the behavior when we are dealing with dtypes across different kinds. We can keep the precision downcasting behavior that you want while raising an error when one attempts to assign a complex number into a floating point array.
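For what it's worth, the distinction is visible in numpy itself via dtype.kind; a quick illustrative sketch, with a made-up helper name:

import numpy as np

# 'b' = bool, 'i'/'u' = integer, 'f' = float, 'c' = complex
print(np.dtype(np.int16).kind, np.dtype(np.int64).kind)          # i i  -- same kind, different precision
print(np.dtype(np.float32).kind, np.dtype(np.complex128).kind)   # f c  -- different kinds

def crosses_kind(src_dtype, dst_dtype):
    """Illustrative helper (not a numpy API): does this assignment change kind, not just precision?"""
    return np.dtype(src_dtype).kind != np.dtype(dst_dtype).kind

print(crosses_kind(np.float64, np.float32))     # False: precision-only downcast
print(crosses_kind(np.complex128, np.float64))  # True:  the complex -> float case at issue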

There are two related hierarchies of datatypes: different kinds (integer, floating point, complex floating point) and different precisions within a given kind (int8, int16, int32, int64). The term "downcasting" should probably be reserved for the latter only.
It seems to me that Zach and Scott are defending downcasting of precisions within a given kind. It does not necessarily follow that the behavior we choose for dtypes within a given kind should be the behavior when we are dealing with dtypes across different kinds. We can keep the precision downcasting behavior that you want while raising an error when one attempts to assign a complex number into a floating point array.
Actually, my points were explicitly about cross-kind conversions! A lot of image-processing code features heavy use of converting integer images to float for intermediate calculations, and then rescaling and down-converting back to the original type for display, etc.
In fact, scipy.ndimage makes use of this, allowing for operations with output into user-specified arrays, or arrays of user-specified dtype, while (I believe) carrying out some of the intermediate calculations at higher precision.
As such, pretty much any code that takes a user-specified array and assigns to it the result of a (potentially) mixed-mode calculation would need to be changed from:
A[i] = calculate(B)
to
A[i] = calculate(B).astype(A.dtype)
with the proliferation of temp arrays that that entails.
I think, but am not sure, that there is a lot of code out there that does this, intentionally, which would be broken by this change. Explicit is indeed better than implicit, but in this case, finding all of the places where mixed-mode conversions happen and tracking them down could be a pain on the same scale as const chasing in C++, where fixing the error in one place makes it reappear elsewhere in the chain of usage, leading to long, painful, and often totally pointless debugging sessions.
In the specific case of converting from complex to anything else, I can see the desire for a warning to prevent data loss. But converting from float to integer is a lot more routine and is used a lot in the kind of work that I do, at least.
Zach

Stuart Brorson wrote:
I realize NumPy != Matlab, but I'd wager that most users would think that this is the natural behavior......
Well, that behavior won't happen. We won't mutate the dtype of the array because of assignment. Matlab has copy(-on-write) semantics for things like slices while we have view semantics. We can't safely do the reallocation of memory [1].
That's fair enough. But then I think NumPy should consistently typecheck all assignments and throw an exception if the user attempts an assignment which loses information.
Yeah, there's no doubt in my mind that this is a bug, if for no other reason than this inconsistency:
import numpy as N
a = N.zeros((2,2))
a[0,0] = 3+4j
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: can't convert complex to float; use abs(z)
b = N.array([3+4j])
a[0,0] = b[0]
a
array([[ 3.,  0.],
       [ 0.,  0.]])
I've been bitten by this myself and would have very much appreciated an exception rather than incorrect answers. It would seem that this behavior violates "In the face of ambiguity, refuse the temptation to guess."
The question to me is how much do people rely on this "feature" of the API and does that dictate putting the change off until Numpy 1.1 or beyond?
Ryan

On Jan 7, 2008 8:47 AM, Ryan May rmay@ou.edu wrote:
Stuart Brorson wrote:
I realize NumPy != Matlab, but I'd wager that most users would think that this is the natural behavior......
Well, that behavior won't happen. We won't mutate the dtype of the array because of assignment. Matlab has copy(-on-write) semantics for things like slices while we have view semantics. We can't safely do the reallocation of memory [1].
That's fair enough. But then I think NumPy should consistently typecheck all assignments and throw an exception if the user attempts an assignment which loses information.
Yeah, there's no doubt in my mind that this is a bug, if for no other reason than this inconsistency:
One place where Numpy differs from MatLab is the way memory is handled. MatLab is always generating new arrays, so for efficiency it is worth preallocating arrays and then filling in the parts. This is not the case in Numpy, where lists can be used for things that grow and subarrays are views. Consequently, preallocating arrays in Numpy should be rare, and used only when the values have to be generated explicitly, which is what you see when using the indexes in your first example. As to assignment between arrays, it is a mixed question. The problem again is memory usage. For large arrays, it makes sense to do automatic conversions, as is also the case in functions taking output arrays, because the typecast can be pushed down into C where it is time and space efficient, whereas explicitly converting the array uses up temporary space. However, I can imagine an explicit typecast function, something like
a[...] = typecast(b)
that would replace the current behavior. I think the typecast function could be implemented by returning a view of b with a castable flag set to true, that should supply enough information for the assignment operator to do its job. This might be a good addition for Numpy 1.1.
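For comparison, later NumPy versions grew np.copyto, whose casting keyword expresses a similar intent at the call site without building a temporary the size of b; a quick sketch (not something that existed at the time of this thread):

import numpy as np

a = np.zeros(4)                  # float64 destination
b = np.arange(4.0) + 1j          # complex128 source

try:
    np.copyto(a, b)              # default casting='same_kind' refuses the complex -> float cast
except TypeError as err:
    print(err)

np.copyto(a, b, casting='unsafe')  # explicit opt-in to the lossy cast (NumPy may still emit a ComplexWarning)
print(a)                           # the imaginary parts were deliberately dropped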
Chuck

Charles R Harris wrote:
On Jan 7, 2008 8:47 AM, Ryan May <rmay@ou.edu> wrote:
Stuart Brorson wrote:
I realize NumPy != Matlab, but I'd wager that most users would think that this is the natural behavior......
Well, that behavior won't happen. We won't mutate the dtype of the array because of assignment. Matlab has copy(-on-write) semantics for things like slices while we have view semantics. We can't safely do the reallocation of memory [1].
That's fair enough. But then I think NumPy should consistently typecheck all assignments and throw an exception if the user attempts an assignment which loses information.
Yeah, there's no doubt in my mind that this is a bug, if for no other reason than this inconsistency:
One place where Numpy differs from MatLab is the way memory is handled. MatLab is always generating new arrays, so for efficiency it is worth preallocating arrays and then filling in the parts. This is not the case in Numpy, where lists can be used for things that grow and subarrays are views. Consequently, preallocating arrays in Numpy should be rare, and used only when the values have to be generated explicitly, which is what you see when using the indexes in your first example. As to assignment between arrays, it is a mixed question. The problem again is memory usage. For large arrays, it makes sense to do automatic conversions, as is also the case in functions taking output arrays, because the typecast can be pushed down into C where it is time and space efficient, whereas explicitly converting the array uses up temporary space. However, I can imagine an explicit typecast function, something like
a[...] = typecast(b)
that would replace the current behavior. I think the typecast function could be implemented by returning a view of b with a castable flag set to true, that should supply enough information for the assignment operator to do its job. This might be a good addition for Numpy 1.1.
While that seems like an ok idea, I'm still not sure what's wrong with raising an exception when there will be information loss. The exception is already raised with standard python complex objects. I can think of many times in my code where explicit looping is a necessity, so pre-allocating the array is the only way to go.

For large arrays, it makes sense to do automatic conversions, as is also the case in functions taking output arrays, because the typecast can be pushed down into C where it is time and space efficient, whereas explicitly converting the array uses up temporary space. However, I can imagine an explicit typecast function, something like
a[...] = typecast(b)
that would replace the current behavior. I think the typecast function could be implemented by returning a view of b with a castable flag set to true, that should supply enough information for the assignment operator to do its job. This might be a good addition for Numpy 1.1.
While that seems like an ok idea, I'm still not sure what's wrong with raising an exception when there will be information loss. The exception is already raised with standard python complex objects. I can think of many times in my code where explicit looping is a necessity, so pre-allocating the array is the only way to go.
The issue Charles is dealing with here is how to *suppress* the proposed exception in cases (as the several that I described) where the information loss is explicitly desired.
With what's in numpy now, you would have to do something like this: A[...] = B.astype(A.dtype) to set a portion of A to B, unless you are *certain* that A and B are of compatible types.
This is ugly and also bug-prone, seeing as how there's some violation of the don't-repeat-yourself principle. (I.e. A is mentioned twice, and to change the code to use a different array, you need to change the variable name A twice.)
Moreover, and worse, the phrase 'A[...] = B.astype(A.dtype)' creates and destroys a temporary array the same size as B. It's equivalent to:
temp = B.astype(A.dtype)
A[...] = temp
Which is no good if B is very large. Currently, the type conversion in 'A[...] = B' cases is handled implicitly, deep down in the C code where it is very very fast, and no temporary array is made.
Charles suggests a 'typecast' operator that would set a flag on the array B so that trying to convert it would *not* raise an exception, allowing for the fast, C-level conversion. (This assumes your proposed change in which by default such conversions would raise exceptions.) This 'typecast' operation wouldn't do anything but set a flag, so it doesn't create a temporary array or do any extra work.
But still, every time that you are not *certain* what the type of a result from a given operation is, any code that looks like: A[i] = calculate(...) will need to look like this instead: A[i] = typecast(calculate(...))
I agree with others that such assignments aren't highly common, but there will be broken code from this. And as Charles demonstrates, getting the details right of how to implement such a change is non-trivial.
Zach

Zachary Pincus wrote:
For large arrays, it makes sense to do automatic conversions, as is also the case in functions taking output arrays, because the typecast can be pushed down into C where it is time and space efficient, whereas explicitly converting the array uses up temporary space. However, I can imagine an explicit typecast function, something like
a[...] = typecast(b)
that would replace the current behavior. I think the typecast function could be implemented by returning a view of b with a castable flag set to true, that should supply enough information for the assignment operator to do its job. This might be a good addition for Numpy 1.1.
While that seems like an ok idea, I'm still not sure what's wrong with raising an exception when there will be information loss. The exception is already raised with standard python complex objects. I can think of many times in my code where explicit looping is a necessity, so pre-allocating the array is the only way to go.
The issue Charles is dealing with here is how to *suppress* the proposed exception in cases (as the several that I described) where the information loss is explicitly desired.
With what's in numpy now, you would have to do something like this: A[...] = B.astype(A.dtype) to set a portion of A to B, unless you are *certain* that A and B are of compatible types.
This is ugly and also bug-prone, seeing as how there's some violation of the don't-repeat-yourself principle. (I.e. A is mentioned twice, and to change the code to use a different array, you need to change the variable name A twice.)
Moreover, and worse, the phrase 'A[...] = B.astype(A.dtype)' creates and destroys a temporary array the same size as B. It's equivalent to:
temp = B.astype(A.dtype)
A[...] = temp
Which is no good if B is very large. Currently, the type conversion in 'A[...] = B' cases is handled implicitly, deep down in the C code where it is very very fast, and no temporary array is made.
Charles suggests a 'typecast' operator that would set a flag on the array B so that trying to convert it would *not* raise an exception, allowing for the fast, C-level conversion. (This assumes your proposed change in which by default such conversions would raise exceptions.) This 'typecast' operation wouldn't do anything but set a flag, so it doesn't create a temporary array or do any extra work.
But still, every time that you are not *certain* what the type of a result from a given operation is, any code that looks like: A[i] = calculate(...) will need to look like this instead: A[i] = typecast(calculate(...))
I agree with others that such assignments aren't highly common, but there will be broken code from this. And as Charles demonstrates, getting the details right of how to implement such a change is non-trivial.
I agree that some of the other options with typecast/astype look horrible, as well as that unnecessary temporaries are bad. Maybe I'm too focused on just the complex case, where all you really need is to use .real to get only the real part (which coincidentally also works with float and int arrays). The nice part about using .real is that it explicitly states what you're looking for. I'm not personally interested in anything other than this silent complex->float conversion.
Ryan

On 07/01/2008, Charles R Harris charlesr.harris@gmail.com wrote:
One place where Numpy differs from MatLab is the way memory is handled. MatLab is always generating new arrays, so for efficiency it is worth preallocating arrays and then filling in the parts. This is not the case in Numpy, where lists can be used for things that grow and subarrays are views. Consequently, preallocating arrays in Numpy should be rare, and used only when the values have to be generated explicitly, which is what you see when using the indexes in your first example. As to assignment between arrays, it is a mixed question. The problem again is memory usage. For large arrays, it makes sense to do automatic conversions, as is also the case in functions taking output arrays, because the typecast can be pushed down into C where it is time and space efficient, whereas explicitly converting the array uses up temporary space. However, I can imagine an explicit typecast function, something like
a[...] = typecast(b)
that would replace the current behavior. I think the typecast function could be implemented by returning a view of b with a castable flag set to true, that should supply enough information for the assignment operator to do its job. This might be a good addition for Numpy 1.1.
This is introducing a fairly complex mechanism to cover, as far as I can see, two cases: conversion of complex to real, and conversion of float to integer. Conversion of complex to real can already be done explicitly without a temporary:
a[...] = b.real
That leaves only conversion of float to integer.
Does this single case cause enough confusion to warrant an exception and a way around it?
Anne

On Jan 7, 2008 12:08 PM, Anne Archibald peridot.faceted@gmail.com wrote:
On 07/01/2008, Charles R Harris charlesr.harris@gmail.com wrote:
One place where Numpy differs from MatLab is the way memory is handled. MatLab is always generating new arrays, so for efficiency it is worth preallocating arrays and then filling in the parts. This is not the case in Numpy, where lists can be used for things that grow and subarrays are views. Consequently, preallocating arrays in Numpy should be rare, and used only when the values have to be generated explicitly, which is what you see when using the indexes in your first example. As to assignment between arrays, it is a mixed question. The problem again is memory usage. For large arrays, it makes sense to do automatic conversions, as is also the case in functions taking output arrays, because the typecast can be pushed down into C where it is time and space efficient, whereas explicitly converting the array uses up temporary space. However, I can imagine an explicit typecast function, something like
a[...] = typecast(b)
that would replace the current behavior. I think the typecast function could be implemented by returning a view of b with a castable flag set to true, that should supply enough information for the assignment operator to do its job. This might be a good addition for Numpy 1.1.
This is introducing a fairly complex mechanism to cover, as far as I can see, two cases: conversion of complex to real, and conversion of float to integer. Conversion of complex to real can already be done explicitly without a temporary:
a[...] = b.real
That leaves only conversion of float to integer.
Does this single case cause enough confusion to warrant an exception and a way around it?
I think it could be something like C++ where implicit downcasts in general are flagged. If we went that way, conversions like uint32 to int32 would also be flagged. The question is how much care we want to take with types. Since Numpy is much more type oriented than, say, MatLab, such type safety might be a useful addition. Python scalar types would remain castable, so you wouldn't have to typecast every darn thing.
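A quick sketch of the kind of check this implies, using np.can_cast to ask whether a cast preserves information (the default rule is 'safe'):

import numpy as np

print(np.can_cast(np.int32, np.int64))        # True:  widening within a kind
print(np.can_cast(np.uint32, np.int32))       # False: the uint32 -> int32 case above
print(np.can_cast(np.float64, np.int32))      # False: float -> int can lose information
print(np.can_cast(np.complex128, np.float64)) # False: cross-kind, drops the imaginary part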
Chuck

On Jan 4, 2008 2:37 PM, Stuart Brorson sdb@cloud9.net wrote:
On Fri, 4 Jan 2008, Stuart Brorson wrote:
<snip>
I realize NumPy != Matlab, but I'd wager that most users would think that this is the natural behavior......
Matlab support for different types was sort of kludged on. Matlab was intended for computational convenience, not control of data types, and started out as pretty much all doubles and double complex; it didn't (doesn't?) even have integers. NumPy is a bit closer to the metal, and support for different data types has been there from the start, but that extra power means you have to devote a bit more thought to what you are doing.
Chuck

On Fri, Jan 04, 2008 at 03:53:41PM -0700, Charles R Harris wrote:
Matlab support for different types was sort of kludged on. Matlab was intended for computational convenience, not control of data types, and started out as pretty much all doubles and double complex, it didn't (doesn't?) even have integers.
It has integer arrays, and does not upcast them to double arrays. My experience is that you are better off avoiding these (unless you talk to hardware, which is why I use them) because many things won't work with them.
Cheers,
Gaël
participants (12)
- Alan G Isaac
- Anne Archibald
- Charles R Harris
- Chris Barker
- Gael Varoquaux
- Pierre GM
- Robert Kern
- Ryan May
- Scott Ransom
- Stuart Brorson
- Timothy Hochberg
- Zachary Pincus