[Numpy-discussion] Fix masked arrays to properly edit views

John Kirkham jakirkham at gmail.com
Sat Apr 4 11:52:19 EDT 2015


Hey Eric,

That's a good point. I remember seeing this behavior before and thought it was a bit odd.

Best,
John

> On Mar 16, 2015, at 2:20 AM, numpy-discussion-request at scipy.org wrote:
> 
> Today's Topics:
> 
>   1. Re: Fix masked arrays to properly edit views (Eric Firing)
>   2. Rewrite np.histogram in c? (Robert McGibbon)
>   3. numpy.stack -- which function, if any,    deserves the name?
>      (Stephan Hoyer)
>   4. Re: Rewrite np.histogram in c? (Jaime Fernández del Río)
>   5. Re: Rewrite np.histogram in c? (Robert McGibbon)
>   6. Re: Rewrite np.histogram in c? (Robert McGibbon)
> 
> 
> ----------------------------------------------------------------------
> 
> Message: 1
> Date: Sat, 14 Mar 2015 14:01:04 -1000
> From: Eric Firing <efiring at hawaii.edu>
> Subject: Re: [Numpy-discussion] Fix masked arrays to properly edit
>    views
> To: numpy-discussion at scipy.org
> Message-ID: <5504CBC0.1080502 at hawaii.edu>
> Content-Type: text/plain; charset=windows-1252; format=flowed
> 
>> On 2015/03/14 1:02 PM, John Kirkham wrote:
>> The sample case of the issue (
>> https://github.com/numpy/numpy/issues/5558 ) is shown below. A proposal
>> to address this behavior can be found here (
>> https://github.com/numpy/numpy/pull/5580 ). Please give me your feedback.
>> 
>> 
>> I tried to change the mask of `a` through a subindexed view, but was
>> unable to. The setup below reproduces this in NumPy 1.9.1.
>> 
>>     import numpy as np
>> 
>>     a = np.arange(6).reshape(2,3)
>>     a = np.ma.masked_array(a, mask=np.ma.getmaskarray(a), shrink=False)
>> 
>>     b = a[1:2,1:2]
>> 
>>     c = np.zeros(b.shape, b.dtype)
>>     c = np.ma.masked_array(c, mask=np.ma.getmaskarray(c), shrink=False)
>>     c[:] = np.ma.masked
>> 
>> This yields what one would expect for `a`, `b`, and `c` (seen below).
>> 
>>      masked_array(data =
>>        [[0 1 2]
>>         [3 4 5]],
>>                   mask =
>>        [[False False False]
>>         [False False False]],
>>              fill_value = 999999)
>> 
>>      masked_array(data =
>>        [[4]],
>>                   mask =
>>        [[False]],
>>              fill_value = 999999)
>> 
>>      masked_array(data =
>>        [[--]],
>>                   mask =
>>        [[ True]],
>>              fill_value = 999999)
>> 
>> Now, it would seem reasonable that to copy data into `b` from `c` one
>> can use `__setitem__` (seen below).
>> 
>>      b[:] = c
>> 
>> This results in new data and mask for `b`.
>> 
>>      masked_array(data =
>>        [[--]],
>>                   mask =
>>        [[ True]],
>>              fill_value = 999999)
>> 
>> This should, in turn, change `a`. However, the mask of `a` remains
>> unchanged (seen below).
>> 
>>      masked_array(data =
>>        [[0 1 2]
>>         [3 0 5]],
>>                   mask =
>>        [[False False False]
>>         [False False False]],
>>              fill_value = 999999)
> 
> I agree that this behavior is wrong.  A related oddity is this:
> 
> In [24]: a = np.arange(6).reshape(2,3)
> In [25]: a = np.ma.array(a, mask=np.ma.getmaskarray(a), shrink=False)
> In [27]: a.sharedmask
> True
> In [28]: a.unshare_mask()
> In [30]: b = a[1:2, 1:2]
> In [31]: b[:] = np.ma.masked
> In [32]: b.sharedmask
> False
> In [33]: a
> masked_array(data =
>  [[0 1 2]
>  [3 -- 5]],
>              mask =
>  [[False False False]
>  [False  True False]],
>        fill_value = 999999)
> 
> It looks like the sharedmask property simply is not being set and 
> interpreted correctly--a freshly initialized array has sharedmask True; 
> and after setting it to False, changing the mask of a new view *does* 
> change the mask in the original.
> 
> Eric
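
Going by Eric's session above, one possible workaround is to call `unshare_mask()` on the parent before taking the view. A minimal sketch, assuming the view then writes through to the parent's mask as it does in his output (not checked across NumPy versions):

```python
import numpy as np

# Same setup as in the issue: a 2x3 masked array with a full (unshrunk) mask.
a = np.arange(6).reshape(2, 3)
a = np.ma.array(a, mask=np.ma.getmaskarray(a), shrink=False)

# Give `a` its own private copy of the mask before slicing.
a.unshare_mask()

# Mask one element through a sub-indexed view of `a`.
b = a[1:2, 1:2]
b[:] = np.ma.masked

print(a.sharedmask)
print(a.mask)
```

This only sidesteps the problem; it does not explain why a freshly built array reports `sharedmask` as True in the first place.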
> 
>> 
>> Best,
>> John
>> 
>> 
>> _______________________________________________
>> NumPy-Discussion mailing list
>> NumPy-Discussion at scipy.org
>> http://mail.scipy.org/mailman/listinfo/numpy-discussion
> 
> 
> 
> ------------------------------
> 
> Message: 2
> Date: Sun, 15 Mar 2015 21:32:49 -0700
> From: Robert McGibbon <rmcgibbo at gmail.com>
> Subject: [Numpy-discussion] Rewrite np.histogram in c?
> To: Discussion of Numerical Python <numpy-discussion at scipy.org>
> Message-ID:
>    <CAN4+E8Ff_Ck-9GBRCbSTq6qPiuGxgKeiX3+kKrXn4NM-Lnn6rg at mail.gmail.com>
> Content-Type: text/plain; charset="utf-8"
> 
> Hi,
> 
> numpy.histogram is implemented in Python, and is a little sluggish. This
> has been discussed previously on the mailing list [1, 2]. It came up in a
> project that I maintain, where a new feature is bottlenecked by
> numpy.histogram, and one developer suggested a faster implementation in
> Cython [3].
> 
> Would it make sense to reimplement this function in C or Cython? Is moving
> functions like this from Python to C to improve performance within the
> scope of the development roadmap for NumPy? I started implementing this a
> little bit in C [4], but I figured I should check in here first.
> 
> -Robert
> 
> [1]
> http://scipy-user.10969.n7.nabble.com/numpy-histogram-is-slow-td17208.html
> [2] http://numpy-discussion.10968.n7.nabble.com/Fast-histogram-td9359.html
> [3] https://github.com/mdtraj/mdtraj/pull/734
> [4] https://github.com/rmcgibbo/numpy/tree/histogram
> 
> ------------------------------
> 
> Message: 3
> Date: Sun, 15 Mar 2015 22:12:40 -0700
> From: Stephan Hoyer <shoyer at gmail.com>
> Subject: [Numpy-discussion] numpy.stack -- which function, if any,
>    deserves the name?
> To: Discussion of Numerical Python <numpy-discussion at scipy.org>
> Message-ID:
>    <CAEQ_TvdQwV52_NKnLpM9+cp681NhV5cUEiigmLMtyBkTnzyOcA at mail.gmail.com>
> Content-Type: text/plain; charset="utf-8"
> 
> In the past months there have been two proposals for new numpy functions
> using the name "stack":
> 
> 1. np.stack for stacking like np.asarray(np.bmat(...))
> http://thread.gmane.org/gmane.comp.python.numeric.general/58748/
> https://github.com/numpy/numpy/pull/5057
> 
> 2. np.stack for stacking along an arbitrary new axis (this was my proposal)
> http://thread.gmane.org/gmane.comp.python.numeric.general/59850/
> https://github.com/numpy/numpy/pull/5605
> 
> Both functions generalize the notion of stacking arrays from the existing
> hstack, vstack and dstack, but in two very different ways. Both could be
> useful -- but we can only call one "stack". Which one deserves that name?
> 
> The existing *stack functions use the word "stack" to refer to combining
> arrays in two similarly different ways:
> a. For ND -> ND stacking along an existing dimension (like
> numpy.concatenate and proposal 1)
> b. For ND -> (N+1)D stacking along new dimensions (like proposal 2).
> 
> I think it would be much cleaner API design if we had different words to
> denote these two different operations. Concatenate for "combine along an
> existing dimension" already exists, so my thought (when I wrote proposal
> 2), was that the verb "stack" could be reserved (going forward) for
> "combine along a new dimension." This also has the advantage of suggesting
> that "concatenate" and "stack" are the two fundamental operations for
> combining N-dimensional arrays. The documentation on this is currently
> quite confusing, mostly because no function like that in proposal 2
> currently exists.
> 
> Of course, the *stack functions have existed for quite some time, and in
> many cases vstack and hstack are indeed used for concatenate-like
> functionality (e.g., whenever they are used for 2D arrays/matrices). So the
> case is not entirely clear-cut. (We'll never be able to remove this
> functionality from NumPy.)
> 
> In any case, I would appreciate your thoughts.
> 
> Best,
> Stephan
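
To illustrate the distinction Stephan draws between (a) and (b), here is a small sketch. `stack_new_axis` is a hypothetical name, not the function from either PR; it simply builds the "new dimension" behaviour out of `np.expand_dims` and `np.concatenate`, which already exist.

```python
import numpy as np

def stack_new_axis(arrays, axis=0):
    """Hypothetical helper: join equal-shaped arrays along a NEW axis."""
    # Give every array a length-1 axis at the requested position ...
    expanded = [np.expand_dims(np.asarray(a), axis) for a in arrays]
    # ... then concatenate along that freshly created axis.
    return np.concatenate(expanded, axis=axis)

a = np.zeros((2, 3))
b = np.ones((2, 3))

# (a) ND -> ND: combine along an existing dimension.
joined = np.concatenate([a, b], axis=0)

# (b) ND -> (N+1)D: combine along a new dimension.
stacked = stack_new_axis([a, b], axis=0)

print(joined.shape)
print(stacked.shape)
```

The input arrays keep their dimensionality under `np.concatenate` and gain one under the helper, which is the whole difference between the two readings of "stack".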
> 
> ------------------------------
> 
> Message: 4
> Date: Sun, 15 Mar 2015 23:00:33 -0700
> From: Jaime Fernández del Río <jaime.frio at gmail.com>
> Subject: Re: [Numpy-discussion] Rewrite np.histogram in c?
> To: Discussion of Numerical Python <numpy-discussion at scipy.org>
> Message-ID:
>    <CAPOWHWmFckwXLcGy+5tSEyQE8VTOrBg0ubKdYeJ8DZywJL_w3g at mail.gmail.com>
> Content-Type: text/plain; charset="utf-8"
> 
>> On Sun, Mar 15, 2015 at 9:32 PM, Robert McGibbon <rmcgibbo at gmail.com> wrote:
>> 
>> Hi,
>> 
>> numpy.histogram is implemented in Python, and is a little sluggish. This
>> has been discussed previously on the mailing list [1, 2]. It came up in a
>> project that I maintain, where a new feature is bottlenecked by
>> numpy.histogram, and one developer suggested a faster implementation in
>> Cython [3].
>> 
>> Would it make sense to reimplement this function in C or Cython? Is
>> moving functions like this from Python to C to improve performance within
>> the scope of the development roadmap for NumPy? I started implementing this
>> a little bit in C [4], but I figured I should check in here first.
> 
> Where do you think the performance gains will come from? The PR in your
> project that claims a 10x speed-up uses a method that is only fit for
> equally spaced bins. I want to think that implementing that exact same
> algorithm in Python with NumPy would be comparably fast, say within 2x.
> 
> For the general case, NumPy is already doing most of the heavy lifting (the
> sorting and the searching) in C: simply replicating the same algorithmic
> approach entirely in C is unlikely to provide any major speed-up. And if
> the change is to the algorithm, then we should first try it out in Python.
> 
> That said, if you can speed things up 10x, I don't think there is going to
> be much opposition to moving it to C!
> 
> Jaime
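
To make Jaime's first point concrete, here is a rough sketch of the equally-spaced-bins approach written with plain NumPy calls. It is not the code from the mdtraj PR; the function name `histogram_uniform` and its arguments are made up for illustration, and values that land within rounding error of a bin edge may be counted differently than by `np.histogram`.

```python
import numpy as np

def histogram_uniform(x, bins, lo, hi):
    """Count the values of x in `bins` equal-width bins spanning [lo, hi].

    Hypothetical helper: the bin index comes from arithmetic rather than
    from sorting or searching.
    """
    x = np.asarray(x, dtype=float)
    x = x[(x >= lo) & (x <= hi)]              # drop out-of-range values
    idx = ((x - lo) * (bins / (hi - lo))).astype(np.intp)
    idx[idx == bins] = bins - 1               # x == hi goes in the last bin
    return np.bincount(idx, minlength=bins)

x = np.arange(100) + 0.5                      # values well away from bin edges
counts = histogram_uniform(x, bins=10, lo=0.0, hi=100.0)
reference, _ = np.histogram(x, bins=10, range=(0.0, 100.0))

print(counts)
print(np.array_equal(counts, reference))
```

Timing this against `np.histogram` on realistic data would be the way to check the "within 2x" guess; the sketch only shows that the algorithm itself needs no C.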
> 
> -- 
> (\__/)
> ( O.o)
> ( > <) This is Conejo. Copy Conejo into your signature and help him with his
> plans for world domination.
> 
> ------------------------------
> 
> Message: 5
> Date: Sun, 15 Mar 2015 23:06:43 -0700
> From: Robert McGibbon <rmcgibbo at gmail.com>
> Subject: Re: [Numpy-discussion] Rewrite np.histogram in c?
> To: Discussion of Numerical Python <numpy-discussion at scipy.org>
> Message-ID:
>    <CAN4+E8GXECy8yaJRfN_NA_V8wdOZeBTLiFM0EJKtfuoONZoMvw at mail.gmail.com>
> Content-Type: text/plain; charset="utf-8"
> 
> It might make sense to dispatch to different C implements if the bins are
> equally spaced (as created by using an integer for the np.histogram bins
> argument), vs. non-equally-spaced bins.
> 
> In that case, getting the bigger speedup may be easier, at least for one
> common use case.
> 
> -Robert
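
A sketch of what that dispatch could look like, in Python rather than C so it is easy to try. The name `histogram_dispatch` and the uniform-spacing check via `np.allclose` are assumptions for illustration, not anything from Robert's branch: uniformly spaced edges take an arithmetic path, and anything else falls back to a binary search with `np.searchsorted`.

```python
import numpy as np

def histogram_dispatch(x, edges):
    """Hypothetical dispatcher: count x into the bins defined by sorted `edges`."""
    x = np.asarray(x, dtype=float)
    edges = np.asarray(edges, dtype=float)
    nbins = len(edges) - 1
    x = x[(x >= edges[0]) & (x <= edges[-1])]   # ignore out-of-range values

    widths = np.diff(edges)
    if np.allclose(widths, widths[0]):
        # Equal-width bins: compute the bin index directly.
        idx = ((x - edges[0]) / widths[0]).astype(np.intp)
    else:
        # General bins: binary search for the bin each value falls in.
        idx = np.searchsorted(edges, x, side="right") - 1
    idx = np.minimum(idx, nbins - 1)            # right-most edge is inclusive
    return np.bincount(idx, minlength=nbins)

x = np.array([0.5, 1.5, 1.7, 2.5, 6.0, 9.0])
uniform = histogram_dispatch(x, [0.0, 2.0, 4.0, 6.0, 8.0, 10.0])
general = histogram_dispatch(x, [0.0, 1.0, 2.0, 5.0, 10.0])

print(uniform)
print(general)
```

Whether the uniformity check is cheap enough to run on every call is a separate question; when `bins` is given as an integer the edges are uniform by construction and the check could be skipped.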
> 
> 
> ------------------------------
> 
> Message: 6
> Date: Sun, 15 Mar 2015 23:19:59 -0700
> From: Robert McGibbon <rmcgibbo at gmail.com>
> Subject: Re: [Numpy-discussion] Rewrite np.histogram in c?
> To: Discussion of Numerical Python <numpy-discussion at scipy.org>
> Message-ID:
>    <CAN4+E8Ewn+tPpZBo866qH9p=1=1vA8i6kLFvrX8XKHWwazv44A at mail.gmail.com>
> Content-Type: text/plain; charset="utf-8"
> 
> My apologies for the typo: 'implements' -> 'implementations'
> 
> -Robert
> 
> 
> ------------------------------
> 
> 
> 
> End of NumPy-Discussion Digest, Vol 102, Issue 21
> *************************************************


