Strange behavior in setting masked array values in Numpy 1.1.0

Great job getting numpy 1.1.0 out and thanks for including the old API of masked arrays. I've been playing around with some software using numpy 1.0.4 and took a crack at upgrading it to numpy 1.1.0, but I ran into some strange behavior when assigning to slices of a masked array. I made the simplest example I could think of to show this weird behavior. Basically, reordering the masked array and assigning back to itself *on the same line* seems to work for part of the array, but other parts are left unchanged. In the example below, half of the array is assigned "properly" and the other half isn't. This problem is eliminated if the assignment is done with a copy of the array. Alternatively, this problem is eliminated if I using numpy.oldnumeric.ma.masked_array instead of the new masked array implementation. Is this just a problem on my setup? Thanks in advance for your help. -Tony Yu Example: ======== In [1]: import numpy In [2]: masked = numpy.ma.masked_array([[1, 2, 3, 4, 5]], mask=False) In [3]: masked[:] = numpy.fliplr(masked.copy()) In [4]: print masked [[5 4 3 2 1]] In [5]: masked[:] = numpy.fliplr(masked) In [6]: print masked [[1 2 3 2 1]] Specs: ====== Numpy 1.1.0 Python 2.5.1 OS X Leopard 10.5.3

Hi, This is to be expected. You are trying to modify and read the same array at the same time, which should never be done. Matthieu 2008/5/31 Tony Yu <tsyu80@gmail.com>:
Great job getting numpy 1.1.0 out and thanks for including the old API of masked arrays.
I've been playing around with some software using numpy 1.0.4 and took a crack at upgrading it to numpy 1.1.0, but I ran into some strange behavior when assigning to slices of a masked array.
I made the simplest example I could think of to show this weird behavior. Basically, reordering the masked array and assigning back to itself *on the same line* seems to work for part of the array, but other parts are left unchanged. In the example below, half of the array is assigned "properly" and the other half isn't. This problem is eliminated if the assignment is done with a copy of the array. Alternatively, this problem is eliminated if I using numpy.oldnumeric.ma.masked_array instead of the new masked array implementation.
Is this just a problem on my setup?
Thanks in advance for your help. -Tony Yu
Example: ======== In [1]: import numpy
In [2]: masked = numpy.ma.masked_array([[1, 2, 3, 4, 5]], mask=False)
In [3]: masked[:] = numpy.fliplr(masked.copy())
In [4]: print masked [[5 4 3 2 1]]
In [5]: masked[:] = numpy.fliplr(masked)
In [6]: print masked [[1 2 3 2 1]]
Specs: ====== Numpy 1.1.0 Python 2.5.1 OS X Leopard 10.5.3
_______________________________________________ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
-- French PhD student Website : http://matthieu-brucher.developpez.com/ Blogs : http://matt.eifelle.com and http://blog.developpez.com/?blog=92 LinkedIn : http://www.linkedin.com/in/matthieubrucher

On May 31, 2008, at 6:04 PM, Matthieu Brucher wrote:
Hi,
This is to be expected. You are trying to modify and read the same array at the same time, which should never be done.
Thanks, I'll have to keep this in mind next time. So, what's the best way to rearrange a subarray of an array. Copying seems inefficient. -Tony
Matthieu
2008/5/31 Tony Yu <tsyu80@gmail.com>: Great job getting numpy 1.1.0 out and thanks for including the old API of masked arrays.
I've been playing around with some software using numpy 1.0.4 and took a crack at upgrading it to numpy 1.1.0, but I ran into some strange behavior when assigning to slices of a masked array.
I made the simplest example I could think of to show this weird behavior. Basically, reordering the masked array and assigning back to itself *on the same line* seems to work for part of the array, but other parts are left unchanged. In the example below, half of the array is assigned "properly" and the other half isn't. This problem is eliminated if the assignment is done with a copy of the array. Alternatively, this problem is eliminated if I using numpy.oldnumeric.ma.masked_array instead of the new masked array implementation.
Is this just a problem on my setup?
Thanks in advance for your help. -Tony Yu
Example: ======== In [1]: import numpy
In [2]: masked = numpy.ma.masked_array([[1, 2, 3, 4, 5]], mask=False)
In [3]: masked[:] = numpy.fliplr(masked.copy())
In [4]: print masked [[5 4 3 2 1]]
In [5]: masked[:] = numpy.fliplr(masked)
In [6]: print masked [[1 2 3 2 1]]
Specs: ====== Numpy 1.1.0 Python 2.5.1 OS X Leopard 10.5.3
_______________________________________________ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
-- French PhD student Website : http://matthieu-brucher.developpez.com/ Blogs : http://matt.eifelle.com and http://blog.developpez.com/? blog=92 LinkedIn : http://www.linkedin.com/in/matthieubrucher _______________________________________________ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion

On Saturday 31 May 2008 18:35:32 Tony Yu wrote:
On May 31, 2008, at 6:04 PM, Matthieu Brucher wrote:
Hi,
This is to be expected. You are trying to modify and read the same array at the same time, which should never be done.
Thanks, I'll have to keep this in mind next time.
And the reason why it works with oldnumeric is that the right term is copied while setting the slice.
So, what's the best way to rearrange a subarray of an array. Copying seems inefficient.
But likely the only solution for your data to be contiguous

la, 2008-05-31 kello 17:56 -0400, Tony Yu kirjoitti: [clip]
I've been playing around with some software using numpy 1.0.4 and took a crack at upgrading it to numpy 1.1.0, but I ran into some strange behavior when assigning to slices of a masked array. [clip] In [1]: import numpy
In [2]: masked = numpy.ma.masked_array([[1, 2, 3, 4, 5]], mask=False)
In [3]: masked[:] = numpy.fliplr(masked.copy())
In [4]: print masked [[5 4 3 2 1]]
In [5]: masked[:] = numpy.fliplr(masked)
In [6]: print masked [[1 2 3 2 1]]
Note that
numpy.fliplr(masked).base.base.base.base is masked.base True
The reason for the strange behavior of slice assignment is that when the left and right sides in a slice assignment are overlapping views of the same array, the result is currently effectively undefined. Same is true for ndarrays:
import numpy a = numpy.array([1, 2, 3, 4, 5]) a[::-1] array([5, 4, 3, 2, 1]) a[:] = a[::-1] a array([5, 4, 3, 4, 5])
This is a known issue. I'm not sure how easy would it be to arrange the assignment loops so that the overlapping data would be handled correctly. I think that numpy should at least raise a warning if not an error in __setitem__, when the two arrays have the same ancestor (needs walking up the .base links). What's the general opinion on this? Pauli

On Sat, May 31, 2008 at 3:09 PM, Pauli Virtanen <pav@iki.fi> wrote:
The reason for the strange behavior of slice assignment is that when the left and right sides in a slice assignment are overlapping views of the same array, the result is currently effectively undefined. Same is true for ndarrays:
import numpy a = numpy.array([1, 2, 3, 4, 5]) a[::-1] array([5, 4, 3, 2, 1]) a[:] = a[::-1] a array([5, 4, 3, 4, 5])
Here's a fun one:
x = np.random.rand(2,5) x.round()
array([[ 0., 1., 0., 0., 0.], [ 0., 0., 0., 0., 1.]])
x.round(out=x[::-1])
array([[ 0., 1., 0., 0., 0.], [ 0., 1., 0., 0., 0.]]) Looks like the top row of x is rounded first and the result is placed in the bottom row. Then the bottom row is evaluated (taking the round of the already rounded top row) and placed in the top row. So the top and bottom will always be the same. That must be useful somewhere :)

2008/5/31 Pauli Virtanen <pav@iki.fi>:
The reason for the strange behavior of slice assignment is that when the left and right sides in a slice assignment are overlapping views of the same array, the result is currently effectively undefined. Same is true for ndarrays:
import numpy a = numpy.array([1, 2, 3, 4, 5]) a[::-1] array([5, 4, 3, 2, 1]) a[:] = a[::-1] a array([5, 4, 3, 4, 5])
I think that the current rule is, slices are walked from low index to high index. This doesn't help with multidimensional arrays, where the order of the axes is (and should be) determined by efficiency considerations. Unfortunately there's really no good way to warn about overlapping copies. Remember that this is a frequent operation, so it has to be fast for small arrays. I think changing base so that it points to the real base and not the parent would help (and clear up a memory leak: try "while True: A = A[::-1]" some time) eliminate some cases where overlap cannot occur, but what about the following cases? A[:5] = A[-5:] A[::2] = A[1::2] A[1:] = A[-1:] The last is actually fairly common (I've needed it), and relies on numpy's ordering of copies. The middle one is very common, and the first one would be a royal pain to code around if the slices were not allowed to overlap. I think I at one point wrote an algorithm that would detect many cases where overlap could not occur (maybe in the smarter reshape code?) but I came to the conclusion that detecting all the cases was infeasible. It's a number-theoretic problem - "can you write the same number as the sum of multiples of these two lists of numbers with nonnegative integer coefficients less than these other two lists of numbers?" - and I suspect it's NP-complete. (Ah, yes, you can make a knapsack out of it - take an array with strides a_1, ... a_n and shape (2,...,2), and a single-element array starting at position N.) In any case it's infeasible to solve it every time somebody tries to do a slice assignment. In any case, many users need nearly-overlapping slices, and some need really-overlapping slices. Preventing problems is going to have to happen at a higher level. Anne

su, 2008-06-01 kello 03:37 -0400, Anne Archibald kirjoitti:
2008/5/31 Pauli Virtanen <pav@iki.fi>:
The reason for the strange behavior of slice assignment is that when the left and right sides in a slice assignment are overlapping views of the same array, the result is currently effectively undefined. Same is true for ndarrays:
import numpy a = numpy.array([1, 2, 3, 4, 5]) a[::-1] array([5, 4, 3, 2, 1]) a[:] = a[::-1] a array([5, 4, 3, 4, 5])
I think that the current rule is, slices are walked from low index to high index. This doesn't help with multidimensional arrays, where the order of the axes is (and should be) determined by efficiency considerations.
Yes, I figured that it is not possible to do what the user usually means in all cases without using at least one scalar temporary. [clip]
Unfortunately there's really no good way to warn about overlapping copies. Remember that this is a frequent operation, so it has to be fast for small arrays.
I didn't actually mean detecting overlaps, but detecting when assigning to a view: while (a->base) a = a->base; while (b->base) b = b->base; if (a == b) { raise warning, but continue } The leading loops can be removed if the bases are walked upwards on array creation, and in this case an added single branch cannot be so bad of an performance hit. If the branch is taken, then there's a performance hit of course.
I think changing base so that it points to the real base and not the parent would help (and clear up a memory leak: try "while True: A = A[::-1]" some time) eliminate some cases where overlap cannot occur, but what about the following cases?
A[:5] = A[-5:] A[::2] = A[1::2] A[1:] = A[-1:]
The last is actually fairly common (I've needed it), and relies on numpy's ordering of copies. The middle one is very common, and the first one would be a royal pain to code around if the slices were not allowed to overlap.
But maybe the warning is still too much hand-holding, and we can just add a warning about undefined behavior in the documentation and leave it at that. I can't argue much against your examples. If the behavior in 1d for overlapping parts is well-defined and relied upon, we should add a unit test that ensures it, if there isn't one yet (didn't check yet). [clip]
In any case, many users need nearly-overlapping slices, and some need really-overlapping slices. Preventing problems is going to have to happen at a higher level.
"Higher level" means "Documentation" here, right? Pauli

On Sun, Jun 1, 2008 at 1:37 AM, Anne Archibald <peridot.faceted@gmail.com> wrote:
2008/5/31 Pauli Virtanen <pav@iki.fi>:
The reason for the strange behavior of slice assignment is that when the left and right sides in a slice assignment are overlapping views of the same array, the result is currently effectively undefined. Same is true for ndarrays:
import numpy a = numpy.array([1, 2, 3, 4, 5]) a[::-1] array([5, 4, 3, 2, 1]) a[:] = a[::-1] a array([5, 4, 3, 4, 5])
I think that the current rule is, slices are walked from low index to high index. This doesn't help with multidimensional arrays, where the order of the axes is (and should be) determined by efficiency considerations.
Unfortunately there's really no good way to warn about overlapping copies. Remember that this is a frequent operation, so it has to be fast for small arrays.
I think changing base so that it points to the real base and not the parent would help (and clear up a memory leak: try "while True: A = A[::-1]" some time) eliminate some cases where overlap cannot occur, but what about the following cases?
A[:5] = A[-5:] A[::2] = A[1::2] A[1:] = A[-1:]
The last is actually fairly common (I've needed it), and relies on numpy's ordering of copies. The middle one is very common, and the first one would be a royal pain to code around if the slices were not allowed to overlap.
I think I at one point wrote an algorithm that would detect many cases where overlap could not occur (maybe in the smarter reshape code?) but I came to the conclusion that detecting all the cases was infeasible. It's a number-theoretic problem - "can you write the same number as the sum of multiples of these two lists of numbers with nonnegative integer coefficients less than these other two lists of numbers?" - and I suspect it's NP-complete. (Ah, yes, you can make a knapsack out of it - take an array with strides a_1, ... a_n and shape (2,...,2), and a single-element array starting at position N.) In any case it's infeasible to solve it every time somebody tries to do a slice assignment.
Since one could compute all the addresses and check for duplicates, I would guess it is O(n), maybe O(n*log(n)) worst case, but I can't convince myself that it would help much to know about overlaps. As you point out, a lot of times one needs overlapping ranges, it is when the user has designed an algorithm without taking the possibility into account that problems arise. Chuck
participants (7)
-
Anne Archibald
-
Charles R Harris
-
Keith Goodman
-
Matthieu Brucher
-
Pauli Virtanen
-
Pierre GM
-
Tony Yu