Casting Bug or a "Feature"?
Greetings,

I spent a couple of hours today tracking down a bug in one of my programs. I was getting different answers depending on whether I passed in a numpy array or a single number. Ultimately, I tracked it down to something I would consider a bug, but I'm not sure if others do. The case comes from taking a numpy integer array and adding a float to it. When doing var = np.array(ints) + float, var is cast to an array of floats, which is what I would expect. However, if I do np.array(ints) += float, the result is an array of integers. I can understand why this happens -- you are shoving the sum back into an integer array -- but without thinking it through I would expect the two additions to behave the same way...or at least be consistent with what occurs with plain numbers instead of arrays. Here's a trivial example demonstrating this:

import numpy as np
a = np.arange(10)
print a.dtype
b = a + 0.5
print b.dtype
a += 0.5
print a.dtype
int64
float64
int64
<type 'int'>
<type 'float'>
<type 'float'>
An implication of this arises in a simple function that "does math". The function returns different values depending on whether a number or an array was passed in:

def add_n_multiply(var):
    var += 0.5
    var *= 10
    return var

aaa = np.arange(5)
print aaa
print add_n_multiply(aaa.copy())
print [add_n_multiply(x) for x in aaa.copy()]
[0 1 2 3 4]
[ 0 10 20 30 40]
[5.0, 15.0, 25.0, 35.0, 45.0]
Am I alone in thinking this is a bug? Or is this the behavior that others would have expected?

Cheers,
Patrick

---
Patrick Marsh
Ph.D. Candidate / Liaison to the HWT
School of Meteorology / University of Oklahoma
Cooperative Institute for Mesoscale Meteorological Studies
National Severe Storms Laboratory
http://www.patricktmarsh.com
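[Editorial note: on recent NumPy releases the in-place `a += 0.5` on an integer array raises an error instead of silently truncating, so a modernized (Python 3) sketch of the same discrepancy has to spell out the old in-place behavior explicitly with `np.add` and `casting="unsafe"`:]

```python
import numpy as np

# Out-of-place addition upcasts, as in the original example.
a = np.arange(10, dtype=np.int64)
b = a + 0.5
print(b.dtype)    # float64

# In-place addition on an int array is an error in modern NumPy;
# casting="unsafe" reproduces the old silent-truncation behavior.
np.add(a, 0.5, out=a, casting="unsafe")
print(a.dtype)    # int64
print(a[:3])      # [0 1 2] -- each sum was truncated back to an int
```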
Hi Patrick: I think it is the behavior I have come to expect. The only "gotcha" here might be the difference between "var = var + 0.5" and "var += 0.5" For example:
import numpy as np
>>> x = np.arange(5); x += 0.5; x
array([0, 1, 2, 3, 4])
>>> x = np.arange(5); x = x + 0.5; x
array([ 0.5,  1.5,  2.5,  3.5,  4.5])
The first line is definitely what I expect. The second -- the automatic casting from int64 -> double -- is documented and generally desirable. It's hard to avoid these casting issues without making code unnecessarily complex or allowing only one data type (e.g., as MATLAB does).

If you worry about standardizing behavior, you can always use `var = np.array(var, dtype=np.double, copy=True)` or similar at the start of your function.

-Brad
_______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
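[Editorial note: Brad's suggestion -- standardizing the dtype at the top of the function -- can be sketched like this in modern Python 3 / NumPy syntax, reusing the function from Patrick's example:]

```python
import numpy as np

def add_n_multiply(var):
    # Coerce to a float64 copy up front so the in-place operations
    # below can never silently truncate integer input.
    var = np.array(var, dtype=np.double, copy=True)
    var += 0.5
    var *= 10
    return var

print(add_n_multiply(np.arange(5)))  # [ 5. 15. 25. 35. 45.]
print(add_n_multiply(2))             # 25.0 -- scalars now agree with arrays
```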
Patrick,

Not a bug, but is it a mis-feature? See the recent thread: "Do we want scalar casting to behave as it does at the moment"

In short, this is a complex issue with no easy answer...

-Chris

--
Christopher Barker, Ph.D.
Oceanographer
Emergency Response Division
NOAA/NOS/OR&R (206) 526-6959 voice
7600 Sand Point Way NE (206) 526-6329 fax
Seattle, WA 98115 (206) 526-6317 main reception
Chris.Barker@noaa.gov
This is separate from the scalar casting thing. This is a disguised version of the discussion about what we should do with implicit casts caused by assignment:

into_array[i] = 0.5

Traditionally numpy just happily casts this stuff, possibly mangling data in the process, and this has caused many actual bugs in user code. In 1.6 some of these assignments caused errors, but we reverted this in 1.7 because it was also breaking things. Supposedly we also deprecated these at the same time, with an eye towards making them errors eventually, but I'm not sure we did this properly, and our casting rules need revisiting in any case.

(Sorry for the lack of links to earlier discussion; traveling and on my phone.)

-n
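[Editorial note: the assignment cast Nathaniel describes can be seen directly -- a Python 3 sketch of the traditional silent-truncation behavior (whether this eventually warns or errors depends on the NumPy version):]

```python
import numpy as np

# Storing a float into an integer array silently truncates it --
# the implicit cast on assignment that this part of the thread is about.
into_array = np.zeros(3, dtype=np.int64)
into_array[0] = 0.5
print(into_array)        # [0 0 0] -- the 0.5 was truncated to 0
```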
Thanks, everyone, for chiming in. Now that I know this behavior exists, I can explicitly prevent it in my code. However, it would be nice if a warning or something were generated to alert users about the inconsistency between var += ... and var = var + ...

Patrick

---
Patrick Marsh
Ph.D. Candidate / Liaison to the HWT
School of Meteorology / University of Oklahoma
Cooperative Institute for Mesoscale Meteorological Studies
National Severe Storms Laboratory
http://www.patricktmarsh.com
On Wed, Jan 16, 2013 at 10:43 PM, Patrick Marsh <patrickmarshwx@gmail.com> wrote:
Thanks, everyone for chiming in. Now that I know this behavior exists, I can explicitly prevent it in my code. However, it would be nice if a warning or something was generated to alert users about the inconsistency between var += ... and var = var + ...
Since I also got bitten by this recently in my own code, I fully agree. I could live with an exception for lossy downcasting in this case.

Josef
2013/1/16 <josef.pktd@gmail.com>:
On Wed, Jan 16, 2013 at 10:43 PM, Patrick Marsh <patrickmarshwx@gmail.com> wrote:
Thanks, everyone for chiming in. Now that I know this behavior exists, I can explicitly prevent it in my code. However, it would be nice if a warning or something was generated to alert users about the inconsistency between var += ... and var = var + ...
Since I also got bitten by this recently in my code, I fully agree. I could live with an exception for lossy down casting in this case.
About exceptions: someone mentioned in another thread about casting that having exceptions can make it difficult to write code. I've thought a bit more about this issue and I tend to agree, especially for code that used to "work" (in the sense of doing something -- not necessarily what you'd want -- without complaining).

Don't get me wrong: when I write code I love it when a library crashes and forces me to be more explicit about what I want, saving me the trouble of hunting down a tricky overflow / casting bug. However, in a production environment, such an unexpected crash could have much worse consequences than an incorrect output. And although you may blame the programmer for not being careful enough about types, they couldn't have expected it might crash the application back when the code was written.

Long story short: +1 for a warning, -1 for an exception, and +1 for a config flag that allows one to change to exceptions by default, if desired.

-=- Olivier
On Thu, Jan 17, 2013 at 5:19 PM, Olivier Delalleau <shish@keba.be> wrote:
2013/1/16 <josef.pktd@gmail.com>:
On Wed, Jan 16, 2013 at 10:43 PM, Patrick Marsh <patrickmarshwx@gmail.com> wrote:
I could live with an exception for lossy down casting in this case.
I'm not sure what the idea here is -- would you only get an exception if the value was such that the downcast would be lossy? If so, a major -1.

The other option would be to always raise an exception if the types would cause a downcast, i.e.:

arr = np.zeros(shape, dtype=np.uint8)
arr2 = arr + 30           # this would raise an exception
arr2 = arr + np.uint8(30) # you'd have to do this

That sure would be clear and result in few errors of this type, but it sure seems verbose and "static language like" to me.
Long story short, +1 for warning, -1 for exception, and +1 for a config flag that allows one to change to exceptions by default, if desired.
Is this for value-dependent casting, or for any casting of this sort?

-Chris

--
Christopher Barker, Ph.D.
Oceanographer
Emergency Response Division
NOAA/NOS/OR&R (206) 526-6959 voice
7600 Sand Point Way NE (206) 526-6329 fax
Seattle, WA 98115 (206) 526-6317 main reception
Chris.Barker@noaa.gov
On Friday, 18 January 2013, Chris Barker - NOAA Federal wrote:
Long story short, +1 for warning, -1 for exception, and +1 for a config flag that allows one to change to exceptions by default, if desired.
is this for value-dependent or any casting of this sort?
What I had in mind here is the situation where the scalar's dtype is fundamentally different from the array's dtype (i.e. float vs int, complex vs float) and can't be cast exactly into the array's dtype (so, value-dependent), which is the situation that originated this thread. I don't mind removing the second part ("and can't be cast exactly...") to have it value-independent. Other tricky situations with integer arrays are to some extent related to how regular (not in-place) additions are handled, something that should probably be settled first. -=- Olivier
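[Editorial note: the "value-dependent" criterion Olivier describes could be sketched as a small helper. This is hypothetical -- not NumPy's actual rule, just an illustration of the check: report lossiness only when the scalar cannot be represented exactly in the array's dtype.]

```python
import numpy as np

def is_lossy(scalar, dtype):
    # Hypothetical value-dependent check: cast the scalar to the
    # target dtype and see whether the round trip changed its value.
    return bool(np.array(scalar).astype(dtype) != np.array(scalar))

print(is_lossy(0.5, np.int64))  # True  -- 0.5 cannot be stored exactly
print(is_lossy(4.0, np.int64))  # False -- 4.0 casts exactly
```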
On 17.01.2013 04:43, Patrick Marsh wrote:
Thanks, everyone for chiming in. Now that I know this behavior exists, I can explicitly prevent it in my code. However, it would be nice if a warning or something was generated to alert users about the inconsistency between var += ... and var = var + ...
Patrick
I agree wholeheartedly. For a long time I actually believed that Python would translate a += b to a = a + b, and I was bitten several times by this bug. A warning (which can be silenced if you desperately want to) would be really nice, imho.

Keep up the good work,
Paul
Hi,

Actually, this behavior is already present in other languages, so I'm -1 on additional verbosity.

Of course a += b is not the same as a = a + b. The first one modifies the object a; the second one creates a new object and binds the name a to it. The behavior IS consistent.

Cheers,
Matthieu
--
Information System Engineer, Ph.D.
Blog: http://matt.eifelle.com
LinkedIn: http://www.linkedin.com/in/matthieubrucher
Music band: http://liliejay.com/
On Thu, Jan 17, 2013 at 2:34 AM, Matthieu Brucher <matthieu.brucher@gmail.com> wrote:
Hi,
Actually, this behavior is already present in other languages, so I'm -1 on additional verbosity. Of course a += b is not the same as a = a + b. The first one modifies the object a, the second one creates a new object and puts it inside a. The behavior IS consistent.
The in-place operation is standard, but my guess is that the silent downcasting is not. In Python:

>>> a = 1
>>> a += 5.3
>>> a
6.2999999999999998
>>> a = 1
>>> a *= 1j
>>> a
1j

I have no idea about other languages.

Josef
On 01/17/2013 01:27 PM, josef.pktd@gmail.com wrote:
On Thu, Jan 17, 2013 at 2:34 AM, Matthieu Brucher <matthieu.brucher@gmail.com> wrote:
Hi,
Actually, this behavior is already present in other languages, so I'm -1 on additional verbosity. Of course a += b is not the same as a = a + b. The first one modifies the object a, the second one creates a new object and puts it inside a. The behavior IS consistent.
The inplace operation is standard, but my guess is that the silent downcasting is not.
I don't think the comparison with Python scalars is relevant, since they are immutable:

In [9]: a = 1
In [10]: b = a
In [11]: a *= 1j
In [12]: b
Out[12]: 1

In-place operators exist for lists, but I don't know what the equivalent of a down-cast would be...

In [3]: a = [0, 1]
In [4]: b = a
In [5]: a *= 2
In [6]: b
Out[6]: [0, 1, 0, 1]

Dag Sverre
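[Editorial note: the mutability point is easiest to see with aliasing, where the two spellings genuinely diverge for arrays. A Python 3 sketch; a float dtype is used so the in-place form is allowed on modern NumPy:]

```python
import numpy as np

# "+=" writes into the buffer that other names may share...
a = np.arange(3, dtype=np.float64)
alias = a
a += 0.5
print(alias)    # [0.5 1.5 2.5] -- the alias sees the in-place change

# ...while "x = x + ..." creates a new array and rebinds the name.
c = np.arange(3, dtype=np.float64)
alias2 = c
c = c + 0.5
print(alias2)   # [0. 1. 2.] -- the alias still holds the old data
```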
On Wed, Jan 16, 2013 at 11:34 PM, Matthieu Brucher
Of course a += b is not the same as a = a + b. The first one modifies the object a, the second one creates a new object and puts it inside a. The behavior IS consistent.
Exactly -- if you ask me, the bug is that Python allows "in-place" operators for immutable objects; they should be more than syntactic sugar. Of course, the temptation of += for regular numbers was just too much to resist.

-Chris

--
Christopher Barker, Ph.D.
Oceanographer
Emergency Response Division
NOAA/NOS/OR&R (206) 526-6959 voice
7600 Sand Point Way NE (206) 526-6329 fax
Seattle, WA 98115 (206) 526-6317 main reception
Chris.Barker@noaa.gov
On 17.01.2013 17:21, Chris Barker - NOAA Federal wrote:
On Wed, Jan 16, 2013 at 11:34 PM, Matthieu Brucher
Of course a += b is not the same as a = a + b. The first one modifies the object a, the second one creates a new object and puts it inside a. The behavior IS consistent.
Exactly -- if you ask me, the bug is that Python allows "in_place" operators for immutable objects -- they should be more than syntactic sugar.
They are not -- the "+=" translation is well defined. The equivalents are:

a += b
a = a.__iadd__(b)

Now __iadd__ can choose to return self (for mutable objects) or a new object (for immutable objects). The confusion about immutables is simply the usual confusion about "=" assigning names, not variable storage.
Of course, the temptation for += on regular numbers was just too much to resist.
And probably 95% of the uses of +=/-= *are* with regular numbers.

Georg
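[Editorial note: the `__iadd__` mechanics Georg describes can be demonstrated with two hypothetical minimal classes (Python 3) -- one mutating in place, one behaving like an immutable:]

```python
class MutableBox:
    """Hypothetical mutable container: __iadd__ mutates and returns self."""
    def __init__(self, value):
        self.value = value
    def __iadd__(self, other):
        self.value += other
        return self

class FrozenBox:
    """Hypothetical immutable-style container: __iadd__ returns a new object."""
    def __init__(self, value):
        self.value = value
    def __iadd__(self, other):
        return FrozenBox(self.value + other)

m = MutableBox(1); m_alias = m
m += 5
print(m is m_alias)   # True  -- the same object was modified in place

f = FrozenBox(1); f_alias = f
f += 5
print(f is f_alias)   # False -- "f" was rebound to a fresh object
```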
participants (10)
- Bradley M. Froehle
- Chris Barker - NOAA Federal
- Dag Sverre Seljebotn
- Georg Brandl
- josef.pktd@gmail.com
- Matthieu Brucher
- Nathaniel Smith
- Olivier Delalleau
- Patrick Marsh
- Paul Anton Letnes