
Hello all,
As part of the ongoing scipy.stats improvements we are pondering the deprecation of `stats.threshold` (and its masked array counterpart: `mstats.threshold`) for the following reasons.
- The functionality it provides is nearly identical to `np.clip`.
- Its usage does not seem to be common (Ralf made a search with searchcode <https://searchcode.com/>; it is not used in scipy as a helper function either).
Of course, before we deprecate anything, we would like to know if anyone in the community is a regular user of this function and/or if you guys may have a use case where it may be preferable to use `stats.threshold` over `np.clip`.
Please reply if you have any objections to this deprecation.
You can find the corresponding PR here: gh-4976 <https://github.com/scipy/scipy/pull/4976>
Regards, Abraham.
PS. For reference, both `np.clip` and `stats.threshold` replace the values outside a threshold from an array_like input. The difference is that `stats.threshold` replaces all values below the minimum or above the maximum with the same new value whereas `np.clip` uses the minimum to replace those below and the maximum for those above. Example:
>>> a = np.arange(10)
>>> a
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
>>> np.clip(a, 3, 7)
array([3, 3, 3, 3, 4, 5, 6, 7, 7, 7])
>>> stats.threshold(a, 3, 7, -1)
array([-1, -1, -1, 3, 4, 5, 6, 7, -1, -1])
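For comparison, here is a minimal NumPy-only sketch of the two calls in the example. It assumes, as the output above shows, that `stats.threshold` keeps values inside the closed interval [threshmin, threshmax] and substitutes one value everywhere else; it does not call SciPy at all.

```python
import numpy as np

a = np.arange(10)

# np.clip: values below 3 become 3, values above 7 become 7.
clipped = np.clip(a, 3, 7)

# Equivalent of stats.threshold(a, 3, 7, -1): keep values in the closed
# interval [3, 7] and replace everything else with the single value -1.
thresholded = np.where((a >= 3) & (a <= 7), a, -1)

# The same result via boolean-mask assignment on a copy.
masked = a.copy()
masked[(a < 3) | (a > 7)] = -1

print(clipped)
print(thresholded)
print(masked)
```

The `np.where` form returns a new array and promotes the dtype if needed (e.g. a NaN replacement turns an integer input into floats), whereas the mask-assignment form writes into an existing array of fixed dtype.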

On 17 June 2015 at 22:44, Abraham Escalante <aeklant@gmail.com> wrote:
Of course, before we deprecate anything, we would like to know if anyone in the community is a regular user of this function and/or if you guys may have a use case where it may be preferable to use `stats.threshold` over `np.clip`.
I wasn't aware of this function, so I am not using it, but I can see one use case. In astronomical data (magnitudes), missing data is sometimes represented by 99, -99, 999, 99.99... or variations; and more often than one would wish, the same file contains different fill values. So, what I do:
arr[np.abs(arr) > 50] = np.nan
can be replaced by:
stats.threshold(arr, -50, 50, np.nan)
That said, I don't think Scipy's version adds anything new or enhances readability. After all, all it does is a straightforward application of a mask; the user can do the same by hand in a more flexible way.
So, in summary, I am for the deprecation.
/David.
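A minimal sketch of the fill-value cleanup described above, assuming (as in the message) that any magnitude with an absolute value above 50 is a placeholder rather than a measurement. The helper name `mask_fill_values`, its `limit` parameter and the sample data are hypothetical.

```python
import numpy as np


def mask_fill_values(arr, limit=50.0):
    """Hypothetical helper: return a float copy of `arr` in which every
    value with |value| > limit (99, -99, 999, 99.99, ...) is set to NaN."""
    out = np.array(arr, dtype=float)  # NaN needs a floating-point dtype; this copies
    out[np.abs(out) > limit] = np.nan
    return out


mags = [12.3, 99.0, 15.1, -99.0, 999.0, 99.99, 14.7]  # made-up magnitudes
clean = mask_fill_values(mags)
print(clean)
print(np.nanmean(clean))  # mean of the real measurements only
```

Converting to float first matters: assigning NaN into an integer array is not possible, which is one reason the explicit mask tends to be more flexible than a one-size helper.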

On Wed, Jun 17, 2015 at 10:44 PM, Abraham Escalante <aeklant@gmail.com> wrote:
Hello all,
As part of the ongoing scipy.stats improvements we are pondering the deprecation of `stats.threshold` (and its masked array counterpart: `mstats.threshold`) for the following reasons.
- The functionality it provides is nearly identical to `np.clip`.
- Its usage does not seem to be common (Ralf made a search with searchcode; it is not used in scipy as a helper function either).
I don't think those are sufficient reasons for deprecation. It does fulfil a purpose, as it's not exactly the same as np.clip; the implementation is simple and maintainable, and it's well documented. There has to be something bad or dangerous about the function to warrant issuing warnings on usage.

On Thu, Jun 18, 2015 at 6:16 AM, Julian Taylor <jtaylor.debian@googlemail.com> wrote:
I don't think those are sufficient reasons for deprecation. It does fulfil a purpose, as it's not exactly the same as np.clip; the implementation is simple and maintainable, and it's well documented. There has to be something bad or dangerous about the function to warrant issuing warnings on usage.
I pretty much share the view of David; it has interesting use cases but it's not worth it.
The use case I was thinking of is to calculate trimmed statistics with nan-aware functions. Similar to David's example, we can set outliers, i.e. points beyond the thresholds, to nan, and then use nanmean and nanstd to calculate the trimmed statistics.
Trimming is dropping the outliers, while np.clip is "winsorizing" the outliers, i.e. shrinking them to the thresholds. For this, np.clip is not a replacement for stats.threshold.
However:
My guess is that this was used as a helper function for the trimmed statistics in scipy.stats but lost its use during some refactoring.
As a public function it would belong in numpy. I didn't remember stats.threshold, and it's easy to "inline" by masked indexing. I don't think users would think of looking for it in scipy.stats (as indicated by the lack of usage in Ralf's search). Even if I remembered the threshold function, I wouldn't use it, because then I would need to import scipy.stats and large parts of scipy (which is slow on a cold start) just for a one-liner.
Josef
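To make the trimming vs. winsorizing distinction concrete, a small sketch with made-up data and limits (`lo`, `hi` are hypothetical): trimming turns out-of-range points into NaN so NaN-aware reductions ignore them, while `np.clip` keeps them but pulls them in to the limits.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 100.0])
lo, hi = 1.5, 5.0  # hypothetical lower/upper limits

# Trimming: set values outside [lo, hi] to NaN, then use NaN-aware reductions,
# so the outliers are simply left out of the statistics.
trimmed = np.where((x >= lo) & (x <= hi), x, np.nan)
trimmed_mean = np.nanmean(trimmed)  # mean of 2, 3, 4, 5 -> 3.5
trimmed_std = np.nanstd(trimmed)

# Winsorizing: np.clip shrinks the outliers to the limits instead, so they
# still count, just with smaller values (1.0 -> 1.5, 100.0 -> 5.0).
winsorized = np.clip(x, lo, hi)
winsorized_mean = winsorized.mean()

print(trimmed_mean, trimmed_std)
print(winsorized_mean)
```

For these particular statistics, scipy.stats also has `tmean` and `tstd`, which accept a `limits` argument directly (note that their defaults, such as the degrees-of-freedom correction in `tstd`, may differ from `np.nanstd`).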
_______________________________________________ SciPy-Dev mailing list SciPy-Dev@scipy.org http://mail.scipy.org/mailman/listinfo/scipy-dev

On Thu, Jun 18, 2015 at 8:27 AM, <josef.pktd@gmail.com> wrote:
I pretty much share the view of David; it has interesting use cases but it's not worth it.
The use case I was thinking of is to calculate trimmed statistics with nan-aware functions. Similar to David's example, we can set outliers, i.e. points beyond the thresholds, to nan, and then use nanmean and nanstd to calculate the trimmed statistics.
Trimming is dropping the outliers, while np.clip is "winsorizing" the outliers, i.e. shrinking them to the thresholds. For this, np.clip is not a replacement for stats.threshold.
However:
My guess is that this was used as a helper function for the trimmed statistics in scipy.stats but lost its use during some refactoring.
As a public function it would belong in numpy. I didn't remember stats.threshold, and it's easy to "inline" by masked indexing. I don't think users would think of looking for it in scipy.stats (as indicated by the lack of usage in Ralf's search). Even if I remembered the threshold function, I wouldn't use it, because then I would need to import scipy.stats and large parts of scipy (which is slow on a cold start) just for a one-liner.
To add to the last point: functions like np.clip and threshold are only useful as quick helper functions. Most of the time, when I do more serious work with trimming or clipping, I want to get hold of the mask, either to know what the outliers are or for further processing. (statsmodels uses np.clip quite often to clip arrays to the domain of a function.) Josef
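A sketch of keeping the mask around instead of discarding it, using made-up data and hypothetical limits. The explicit boolean mask and `np.ma.masked_outside` (NumPy's masked-array route, closer in spirit to what `mstats` works with) carry the same information.

```python
import numpy as np

x = np.array([0.2, 1.1, -3.5, 0.7, 4.2, -0.4])
lo, hi = -2.0, 2.0  # hypothetical limits

# Build the outlier mask explicitly so it can be reused afterwards.
outliers = (x < lo) | (x > hi)
print("outlier positions:", np.flatnonzero(outliers))
print("outlier values:", x[outliers])

# Trimmed statistics computed on the inliers only.
inliers = x[~outliers]
print("trimmed mean:", inliers.mean())

# The same selection as a masked array; masked_outside masks values
# outside the closed interval [lo, hi].
mx = np.ma.masked_outside(x, lo, hi)
print("masked mean:", mx.mean())
print("mask:", np.ma.getmaskarray(mx))
```

Either form keeps the "which points were dropped" information that a plain clip or threshold call throws away.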

On 18.06.2015 14:27, josef.pktd@gmail.com wrote:
I pretty much share the view of David; it has interesting use cases but it's not worth it.
I don't see the cost in keeping it, but the cost of removing it is unknown. Just because we can't find any users does not mean they don't exist.

On Fri, Jun 19, 2015 at 9:30 AM, Julian Taylor <jtaylor.debian@googlemail.com> wrote:
On 18.06.2015 14:27, josef.pktd@gmail.com wrote:
On Thu, Jun 18, 2015 at 6:16 AM, Julian Taylor <jtaylor.debian@googlemail.com> wrote:
I don't think those are sufficient reasons for deprecation. It does fulfil a purpose, as it's not exactly the same as np.clip; the implementation is simple and maintainable, and it's well documented. There has to be something bad or dangerous about the function to warrant issuing warnings on usage.
Those are not the only possible reasons for deprecation. In this case it's a function that doesn't really fit in scipy.stats and seems to have become a public function only by accident. The goal here, like for multiple other recent deprecations, is to make scipy.stats a coherent package of statistics functions that are well documented and tested. In this case docs/tests are OK but the function simply doesn't belong in Scipy.
I pretty much share the view of David; it has interesting use cases but it's not worth it.
I don't see the cost in keeping it, but the cost of removing it is unknown. Just because we can't find any users does not mean they don't exist.
You could make that argument about any deprecation.
While the Scipy deprecation policy is similar to Numpy's, this kind of case is the main difference in my opinion. There's a reason Scipy still has a 0.xx version number.
Ralf

I think we should move forward with the deprecation, since `np.clip` pretty much covers this and, as Ralf points out, the function doesn't seem to fit in `scipy.stats`.
It would make more sense for `np.clip` to be enhanced with the option to use the same value to substitute anything below and above the limits, although that would be outside the scope of this project. It may be a nice addition.
Regards, Abraham.
2015-06-21 9:52 GMT-05:00 Ralf Gommers <ralf.gommers@gmail.com>:

Hello all,
As per the reasons and discussion held within this thread, we will be moving forward with the deprecation of `stats.threshold` unless we can find a compelling argument against it.
So please, if you have an opinion to share on this matter, feel free to respond here. Otherwise we will most likely merge gh-4976 <https://github.com/scipy/scipy/pull/4976> with the deprecation in the next three or four days.
Kind regards, Abraham.
2015-06-25 17:51 GMT-05:00 Abraham Escalante <aeklant@gmail.com>:

Hello all,
gh-4976 <https://github.com/scipy/scipy/pull/4976> has been merged, and thus `stats.threshold` is now deprecated. Thanks to all who contributed to the discussion/feedback.
Cheers, Abraham.
2015-06-29 18:36 GMT-05:00 Abraham Escalante <aeklant@gmail.com>:

On Wed, Jun 17, 2015 at 10:44 PM, Abraham Escalante <aeklant@gmail.com> wrote:
Hello all,
As part of the ongoing scipy.stats improvements we are pondering the deprecation of `stats.threshold` (and its masked array counterpart: `mstats.threshold`) for the following reasons.
- The functionality it provides is nearly identical to `np.clip`.
- Its usage does not seem to be common (Ralf made a search with searchcode <https://searchcode.com/>; it is not used in scipy as a helper function either).
Of course, before we deprecate anything, we would like to know if anyone in the community is a regular user of this function and/or if you guys may have a use case where it may be preferable to use `stats.threshold` over `np.clip`.
Please reply if you have any objections to this deprecation.
You can find the corresponding PR here: gh-4976 <https://github.com/scipy/scipy/pull/4976>
Regards, Abraham.
PS. For reference, both `np.clip` and `stats.threshold` replace the values outside a threshold from an array_like input. The difference is that `stats.threshold` replaces all values below the minimum or above the maximum with the same new value whereas `np.clip` uses the minimum to replace those below and the maximum for those above.
Would it be possible to add an optional argument to `np.clip` to allow it to support the `stats.threshold` use-case?

On Thu, Jun 18, 2015 at 3:04 PM, Todd <toddrjen@gmail.com> wrote:
Would it be possible to add an optional argument to `np.clip` to allow it to support the `stats.threshold` use-case?
Adding a keyword after out=None in np.clip would be slightly ugly, but it's possible and may make sense. Ralf
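To illustrate what such a keyword might look like, a sketch written as a plain wrapper around `np.clip`. The function name `clip_with_fill` and the keyword-only `fill` parameter are hypothetical; NumPy has no such option. With `fill=None` it defers to `np.clip` unchanged; otherwise it reproduces the single-replacement-value behaviour from the `stats.threshold` example.

```python
import numpy as np


def clip_with_fill(a, a_min, a_max, out=None, *, fill=None):
    """Hypothetical np.clip variant; NOT an existing NumPy function.

    fill=None  -> behaves like np.clip (out-of-range values are shrunk
                  to a_min / a_max).
    fill=value -> every element outside [a_min, a_max] is replaced by
                  `value`, mirroring the stats.threshold behaviour.
    """
    if fill is None:
        return np.clip(a, a_min, a_max, out=out)
    a = np.asarray(a)
    result = np.where((a >= a_min) & (a <= a_max), a, fill)
    if out is None:
        return result
    # copyto uses "same_kind" casting by default, so it raises instead of
    # silently writing e.g. NaN into an integer output array.
    np.copyto(out, result)
    return out


a = np.arange(10)
print(clip_with_fill(a, 3, 7))           # same as np.clip(a, 3, 7)
print(clip_with_fill(a, 3, 7, fill=-1))  # same as the stats.threshold example
```

Making `fill` keyword-only after `out` keeps every existing positional call to `np.clip` valid, which is the compatibility concern behind placing a new keyword after `out=None`.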
participants (6)
- Abraham Escalante
- Daπid
- josef.pktd@gmail.com
- Julian Taylor
- Ralf Gommers
- Todd