[Numpy-discussion] Median again

Eric Firing efiring at hawaii.edu
Tue Jan 29 17:22:52 EST 2008


Andrew Straw wrote:
> Considering that many of the statistical functions (mean, std, median)
> must iterate over all the data and that people (or at least myself)
> typically call them sequentially on the same data, it may make sense to
> make a super-function with less repetition.

http://currents.soest.hawaii.edu/hg/hgwebdir.cgi/pycurrents/file/df129ff36f68/num/stats.py?style=gitweb

I have something like that at the link above (if the mailer does not
break the line).  I think it is quite flexible and efficient: it
calculates only as much as necessary, so, for example, it computes the
median only if you ask for it.

The file that the link points to has one external dependency; you can
remove it by importing numpy.ma as MA instead.
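
To make the "calculate only as much as necessary" idea concrete, here is a
minimal sketch. It is not the code in stats.py; the function name
descriptive_stats and its signature are made up for illustration, loosely
following the interface Andrew proposes below. It assumes plain (unmasked)
float data and a population standard deviation, as np.std uses by default.

```python
import numpy as np

def descriptive_stats(x, stats=("mean", "std"), axis=-1):
    """Hypothetical helper: return a dict holding only the requested statistics."""
    x = np.asarray(x, dtype=float)
    wanted = set(stats)
    unknown = wanted - {"mean", "median", "std", "min", "max"}
    if unknown:
        raise ValueError("unknown statistics: %s" % sorted(unknown))

    out = {}
    if wanted & {"mean", "std"}:
        # The mean is computed once and reused for the standard deviation.
        mean_k = x.mean(axis=axis, keepdims=True)
        if "mean" in wanted:
            out["mean"] = np.squeeze(mean_k, axis=axis)
        if "std" in wanted:
            out["std"] = np.sqrt(((x - mean_k) ** 2).mean(axis=axis))
    if "median" in wanted:
        # The median needs a sort or partition, so skip it unless requested.
        out["median"] = np.median(x, axis=axis)
    if "min" in wanted:
        out["min"] = x.min(axis=axis)
    if "max" in wanted:
        out["max"] = x.max(axis=axis)
    return out
```

Calling descriptive_stats(x, stats=['mean', 'std', 'max']) returns a
dictionary with exactly those three keys, and the median is never computed.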

Eric

> 
> Instead of:
> x_mean = np.mean(x)
> x_median = np.median(x)
> x_std = np.std(x)
> x_min = np.min(x)
> x_max = np.max(x)
> 
> We do:
> x_stats = np.get_descriptive_stats(x,
> stats=['mean','median','std','min','max'],axis=-1)
> And x_stats is a dictionary with 'mean', 'median', 'std', 'min', 'max' keys.
> 
> The implementation could reduce the number of iterations over the data
> in this case. The implementation wouldn't have to be optimized
> initially, but could be gradually sped up once the interface is in
> place. I bring this up now to suggest such an idea as a more-general
> alternative to the "medianwithaxis" function proposed. What do you
> think? (Perhaps something like this already exists?) And, finally, this
> all surely belongs in scipy, but we already have stuff in numpy that
> can't be removed without seriously breaking backwards compatibility...
> 
> -Andrew
> 
> Matthew Brett wrote:
>> Hi,
>>   
>>>>> median moved to mediandim0
>>>>> implementation of medianwithaxis or similar, with same call
>>>>> signature as mean.
>>>>>
>>>>> Deprecation warning for use of median, and return of mediandim0 for
>>>>> now.  Eventual move of median to return medianwithaxis.
>>>>>         
>>>> This would confuse people even more, I'm afraid. First they're told
>>>> that median() is deprecated, and then later on it becomes the standard
>>>> function to use. I would actually prefer a short pain rather than a
>>>> long one.
>>>>       
>> I was thinking the warning could be something like:
>>
>> "The current and previous version of numpy use a version of median
>> that is not consistent with other summary functions such as mean.  The
>> calling convention of median will change in a future version of numpy
>> to match that of the other summary functions.  This compatible future
>> version is implemented as medianwithaxis, and will become the default
>> implementation of median.  Please change any code using median to call
>> medianwithaxis specifically, to maintain compatibility with future
>> numpy APIs."
>>
>>   
>>> I would certainly like median to take the axis keyword. The axis
>>> keyword (and its friends) could be added to 1.0.5 with the default
>>> being 1 instead of None, so that it keeps compatibility with the 1.0
>>> API. Then, with 1.1 (an API-breaking release) the default can be
>>> changed to None to restore consistency with mean, etc.
>>>     
>> But that would be very surprising to a new user, and might lead to
>> some hard to track down silent bugs at a later date.
>>
>> Matthew
>> _______________________________________________
>> Numpy-discussion mailing list
>> Numpy-discussion at scipy.org
>> http://projects.scipy.org/mailman/listinfo/numpy-discussion
>>   
> 
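
One way to picture the transition discussed in the quoted messages is a
wrapper that keeps the old default for now but warns whenever a caller
relies on it. This is only a sketch of the general technique, not anything
proposed verbatim in the thread: the name median_transitional and the
sentinel are hypothetical, and it assumes the legacy behaviour is a median
over axis 0 (as the name mediandim0 suggests).

```python
import warnings
import numpy as np

_NOT_GIVEN = object()  # sentinel: tells us whether the caller passed axis at all

def median_transitional(a, axis=_NOT_GIVEN):
    """Hypothetical transitional median: legacy default plus a warning."""
    if axis is _NOT_GIVEN:
        # Assumed legacy behaviour: median along the first axis.
        warnings.warn(
            "median() called without axis: the default is axis=0 for now "
            "but will change to axis=None in a future release; pass axis "
            "explicitly to keep the current behaviour.",
            DeprecationWarning,
            stacklevel=2,
        )
        axis = 0
    return np.median(np.asarray(a), axis=axis)
```

Callers who pass axis explicitly see no warning and are unaffected when the
default later changes, which is meant to address the concern about silent
bugs.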



