
On Thu, Jun 18, 2015 at 8:27 AM, <josef.pktd@gmail.com> wrote:
On Thu, Jun 18, 2015 at 6:16 AM, Julian Taylor <jtaylor.debian@googlemail.com> wrote:
On Wed, Jun 17, 2015 at 10:44 PM, Abraham Escalante <aeklant@gmail.com> wrote:
Hello all,
As part of the ongoing scipy.stats improvements we are pondering the deprecation of `stats.threshold` (and its masked array counterpart: `mstats.threshold`) for the following reasons.
The functionality it provides is nearly identical to `np.clip`. Its usage does not seem to be common (Ralf made a search with searchcode; it is not used in scipy as a helper function either).
I don't think those are sufficient reasons for deprecation. It does fulfill a purpose, as it's not exactly the same as np.clip; the implementation is simple and maintainable, and it's well documented. There has to be something bad or dangerous about the function to warrant issuing warnings on usage.
I pretty much share David's view: it has interesting use cases, but it's not worth it.
The use case I was thinking of is to calculate trimmed statistics with nan-aware functions. Similar to David's example, we can set outliers, i.e. points beyond the threshold, to nan, and then use nanmean and nanstd to calculate the trimmed statistics.
Trimming is dropping the outliers, while np.clip is "winsorizing" the outliers, i.e. shrinking them to the thresholds. For this, np.clip is not a replacement for stats.threshold.
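A minimal sketch of the difference between trimming and winsorizing described above, using only NumPy. The data and the limits `lo` and `hi` are made-up values for illustration and are not taken from the thread.

```python
import numpy as np

# Illustrative data with one obvious outlier (assumed values).
x = np.array([1.0, 2.0, 3.0, 4.0, 100.0])
lo, hi = 0.0, 10.0  # assumed thresholds

# Winsorizing: np.clip shrinks the outlier to the upper threshold,
# so it still contributes to the statistics.
winsorized = np.clip(x, lo, hi)
winsorized_mean = winsorized.mean()

# Trimming: set points beyond the thresholds to nan, then use the
# nan-aware functions so the outlier is dropped entirely.
trimmed = np.where((x < lo) | (x > hi), np.nan, x)
trimmed_mean = np.nanmean(trimmed)
trimmed_std = np.nanstd(trimmed)

print(winsorized_mean, trimmed_mean, trimmed_std)
```

With these values the winsorized mean is still pulled upward by the clipped point, while the trimmed mean ignores it.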
However:
My guess is that this was used as a helper function for the trimmed statistics in scipy.stats but lost its use during some refactoring.
As a public function it would belong in numpy. I didn't remember stats.threshold, and it's easy to "inline" by masked indexing. I don't think users would think of looking for it in scipy.stats (as indicated by the lack of usage in Ralf's search). Even if I remembered the threshold function, I wouldn't use it, because then I would need to import scipy.stats and large parts of scipy (which is slow on a cold start) just for a one-liner.
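For illustration, the "inline by masked indexing" approach might look like the sketch below. The names `threshmin`, `threshmax`, and `newval` follow the parameter names of `stats.threshold` as best I recall them; treat them, and the sample data, as assumptions rather than a statement of the function's exact behavior.

```python
import numpy as np

# Illustrative data and limits (assumed values).
a = np.array([-5.0, 0.5, 2.0, 7.0, 42.0])
threshmin, threshmax, newval = 0.0, 10.0, 0.0

# Boolean mask of the points that fall outside the limits.
outside = (a < threshmin) | (a > threshmax)

# Work on a copy so the original array is left untouched,
# then overwrite the masked entries with the replacement value.
out = a.copy()
out[outside] = newval

print(out)
print(outside)
```

Keeping `outside` in its own variable means the same mask can be reused afterwards, for example to count or inspect the replaced points.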
To add to the last point: functions like np.clip and threshold are only useful as quick helper functions. Most times when I do more serious work with trimming or clipping, I want to get hold of the mask, either to know what the outliers are or for further processing. (statsmodels uses np.clip quite often to clip arrays to the domain of a function.)
Josef
_______________________________________________ SciPy-Dev mailing list SciPy-Dev@scipy.org http://mail.scipy.org/mailman/listinfo/scipy-dev