
On Fri, Jan 4, 2013 at 12:11 AM, Dag Sverre Seljebotn <d.s.seljebotn@astro.uio.no> wrote:
On 01/04/2013 12:39 AM, Andrew Collette wrote:
Nathaniel Smith wrote:
Consensus in that bug report seems to be that for array/scalar operations like:

    np.array([1], dtype=np.int8) + 1000  # can't be represented as an int8!

we should raise an error, rather than either silently upcasting the result (as in 1.6 and 1.7) or silently downcasting the scalar (as in 1.5 and earlier).
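The three behaviours being compared can be made concrete with a minimal sketch that spells each one out through explicit casts, so its output should not depend on which NumPy version's implicit casting rules are installed. The helper names add_wrap, add_promote and add_raise are hypothetical (not NumPy API). The sketch assumes a signed integer array and a Python int scalar. add_promote only approximates the 1.6/1.7 value-based promotion for cases like this one, and raising ValueError in add_raise is an assumption based on the later mention of ValueError in this thread.

```python
import numpy as np

# Hypothetical helpers (not NumPy API) spelling out the three behaviours
# discussed in the thread with explicit casts. Assumes `arr` is a signed
# integer array and `scalar` is a Python int.


def add_wrap(arr, scalar):
    """Downcast the scalar to arr.dtype first (the 1.5-and-earlier style)."""
    # astype() defaults to casting='unsafe', which keeps only the low-order
    # bits, so 1000 becomes -24 when squeezed into an int8.
    s = np.array(scalar, dtype=np.int64).astype(arr.dtype)
    return arr + s


def add_promote(arr, scalar):
    """Upcast to the smallest signed type that holds the scalar (roughly 1.6/1.7)."""
    for dt in (np.int8, np.int16, np.int32, np.int64):
        info = np.iinfo(dt)
        if np.dtype(dt).itemsize >= arr.dtype.itemsize and info.min <= scalar <= info.max:
            return arr.astype(dt) + dt(scalar)
    raise OverflowError(f"{scalar} does not fit in int64")


def add_raise(arr, scalar):
    """Refuse the operation if the scalar can't be represented in arr.dtype."""
    info = np.iinfo(arr.dtype)
    if not info.min <= scalar <= info.max:
        raise ValueError(f"{scalar} cannot be represented as {arr.dtype}")
    return arr + arr.dtype.type(scalar)


a = np.array([1], dtype=np.int8)
print(add_wrap(a, 1000))     # [-23], still int8
print(add_promote(a, 1000))  # [1001], now int16
try:
    add_raise(a, 1000)
except ValueError as err:
    print(err)               # 1000 cannot be represented as int8
```

This add_raise only checks whether the scalar fits in the array's dtype; the sum itself can still wrap (for example 100 + 100 in int8).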
I have run into this a few times as a NumPy user, and I just wanted to comment that, in my opinion, having this case generate an error is the worst of both worlds. The reason people can't decide between rollover and promotion is that neither is objectively better. One avoids memory inflation, and the other avoids losing precision.
If neither is objectively better, I think that is a very good reason to kick it down to the user. "Explicit is better than implicit".
You just need to pick one and document it. Kicking the can down the road to the user, and making him/her explicitly test for this condition, is not a very good solution.
It's a good solution to encourage bug-free code. It may not be a good solution to avoid typing.
What does this mean in practical terms for NumPy users? I personally don't relish the choice of always using numpy.add, or always wrapping my additions in checks for ValueError.
I think you usually have a bug in your program when this happens, since either the dtype is wrong or the value one is trying to store is wrong. I know that's true for myself, though I don't claim to know everybody else's use cases.
I agree with Dag rather than Andrew: "Explicit is better than implicit", i.e. what Nathaniel described earlier as the apparent consensus.

Since I've actually used NumPy arrays with specific low-memory types, I thought I should comment on my use case in case it is helpful. I've only used low-precision types like np.uint8 (unsigned) where I needed to limit my memory usage. In this case, the topology of a graph allowing multiple edges was held as an integer adjacency matrix, A. I would calculate things like A^n for paths of length n, and also make changes to A directly (e.g. adding edges). So an overflow was always possible, and neither the old behaviour (type-preserving, but wrapping on overflow and corrupting the data) nor the current behaviour (type promotion overriding my deliberate memory management) is nice. My preference here would be for an exception, so I would know right away.

The other use case that comes to mind is dealing with low-level libraries and/or file formats, and here automagic type promotion would probably be unwelcome.

Regards,
Peter
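The adjacency-matrix case can be sketched as follows, assuming a square matrix of an unsigned integer dtype such as uint8. checked_matrix_power is a hypothetical helper, not a NumPy function: it accumulates the product in uint64 and raises OverflowError as soon as an entry of A^k no longer fits in the original dtype, whereas np.linalg.matrix_power on a uint8 matrix keeps uint8 and wraps silently.

```python
import numpy as np


def checked_matrix_power(A, n):
    """Hypothetical helper: A**n (matrix power) kept in A's unsigned dtype.

    The product is accumulated in uint64 and checked against A's dtype range
    after every multiplication, raising OverflowError instead of wrapping.
    Assumes A is a square matrix of an unsigned integer dtype such as uint8.
    """
    limit = np.iinfo(A.dtype).max
    base = A.astype(np.uint64)
    result = np.eye(A.shape[0], dtype=np.uint64)
    for k in range(1, n + 1):
        result = result @ base
        if result.max() > limit:
            raise OverflowError(f"A^{k} does not fit in {A.dtype}")
    return result.astype(A.dtype)


# Two parallel edges each way between nodes 0 and 1, so the nonzero
# entries of A^k are 2^k.
A = np.array([[0, 2], [2, 0]], dtype=np.uint8)

print(checked_matrix_power(A, 7))    # entries of 128 still fit in uint8
print(np.linalg.matrix_power(A, 8))  # plain uint8 arithmetic wraps 256 to 0
try:
    checked_matrix_power(A, 8)
except OverflowError as err:
    print(err)                       # A^8 does not fit in uint8
```

The uint64 working copy costs eight times the memory of a uint8 matrix while it exists, which is the very cost the low-precision dtype was chosen to avoid; what it buys is an immediate, explicit error instead of either silent wraparound or a permanently promoted result.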