Actually, I wouldn't mind passing a key function to _median(), but that is far too advanced for beginners to have to think about. So maybe median() could call _median() internally where needed, while the underscore version also stays available.

On Sun, Dec 29, 2019 at 8:14 PM Andrew Barnert <abarnert@yahoo.com> wrote:
On Dec 29, 2019, at 16:08, David Mertz <mertz@gnosis.cx> wrote:
* There is absolutely no need to lose any efficiency by making the statistics functions more friendly. All we need is an optional parameter whose spelling I've suggested as `on_nan` (but bikeshed freely). Under at least one value of that parameter, we can keep EXACTLY the current implementation, with all its warts and virtues as-is. Maybe a spelling for that option could be 'unsafe' or 'fast'?
This seems like the right way to go to me.
However, rather than coming up with appropriately-general implementations of each of these things, wouldn’t taking a key function to pass through to sorted be simpler for some? In particular, coming up with a total_order function that works for all valid number-like types is difficult; letting the user pass key=math.total_order or decimal.Decimal.compare_total or partial(decimal.Decimal.compare_total, context=my_context) or whatever is appropriate is a lot simpler and a lot more flexible. Anyone who knows that’s what they want should know how to pass it.
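A rough sketch of the "pass your own ordering to sorted" idea, for Decimal only. Note that Decimal.compare_total is a two-argument comparison, not a key, so this sketch assumes it has to go through functools.cmp_to_key first. As far as I know there is no math.total_order in the standard library, so that name should be read as hypothetical. The helper name decimal_total_order_key is made up for this illustration.

```python
from decimal import Decimal
from functools import cmp_to_key

# Hypothetical helper: turn Decimal's total-ordering comparison into a sort key.
# compare_total returns Decimal(-1), Decimal(0) or Decimal(1); int() makes the
# result a plain integer, which is what cmp_to_key expects.
decimal_total_order_key = cmp_to_key(lambda a, b: int(a.compare_total(b)))

values = [Decimal("3"), Decimal("NaN"), Decimal("1"), Decimal("-Infinity")]

# Under the total ordering a positive quiet NaN sorts above every number,
# so the sort is well defined even with a NaN present.
ordered = sorted(values, key=decimal_total_order_key)
print([str(d) for d in ordered])
```

With this ordering the NaN ends up at the high end of the list instead of leaving the result dependent on input order.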
Plus, finding the median_low or median_high with a key function actually seems useful even without NaNs. “Find the median employee by salary” doesn’t seem like a meaningless operation.
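The "median employee by salary" case could look something like the sketch below. statistics.median_low does not take a key argument today, so median_low_by is a hypothetical stand-in, and the Employee class and the salary figures are invented for the example.

```python
from dataclasses import dataclass


@dataclass
class Employee:
    name: str
    salary: int


def median_low_by(data, key=None):
    """Hypothetical variant of statistics.median_low that orders by `key`."""
    ordered = sorted(data, key=key)
    if not ordered:
        raise ValueError("no median for empty data")
    # Same element choice as median_low: the middle item for odd lengths,
    # the lower of the two middle items for even lengths.
    return ordered[(len(ordered) - 1) // 2]


staff = [
    Employee("Ann", 50_000),
    Employee("Bob", 72_000),
    Employee("Cy", 61_000),
    Employee("Di", 90_000),
]
median_employee = median_low_by(staff, key=lambda e: e.salary)
print(median_employee.name)
```

Because median_low returns an actual element of the data rather than an average, the result is a whole Employee object, which is why the key version makes sense for the low/high variants in particular.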
A key function could also take care of raise, but not ignore or poison, and at least ignore seems like it’s needed. So your API still makes sense otherwise. (But, while we’re painting the shed, maybe enum values instead of bare strings? They could be StrEnum values where FAST.value == 'fast' for people who are used to Pandas, I suppose.)
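For the enum suggestion, one possible shape is below. The class name OnNan and the exact member set are assumptions pieced together from the options mentioned in this thread. It mixes in str rather than using enum.StrEnum so that it also runs on Python versions that predate StrEnum.

```python
from enum import Enum


class OnNan(str, Enum):
    """Hypothetical policy values for an `on_nan` parameter."""
    FAST = "fast"      # keep the current implementation unchanged
    RAISE = "raise"    # raise an exception if a NaN is seen
    IGNORE = "ignore"  # drop NaNs before computing
    POISON = "poison"  # return NaN if any NaN is present


def normalize_on_nan(value):
    """Accept either an OnNan member or its bare string spelling."""
    return OnNan(value)


print(normalize_on_nan("fast") is OnNan.FAST)
print(OnNan.IGNORE == "ignore")
```

The str mixin means callers used to bare strings can keep passing 'fast' or 'ignore', while a misspelled value fails with a ValueError at the call instead of being silently accepted.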
Maybe the is_nan function could also be a parameter, like the key function. By default it’s just the method with a fallback to math or cmath (or it’s just the method, and float and complex add those methods, or it’s a new function that calls a new protocol method, or whatever). That doesn’t work for every possible type that might otherwise work with statistics, but if you have some other type, or want some other unusual but sensible behavior (e.g., you’re the one guy who actually needs to ignore qNaNs but raise early on sNaNs), you can write it and pass it. I’m still not convinced anyone will ever want anything other than the method/math/cmath version, but if they do, I think they’d know it and be fine with passing it in explicitly.
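A sketch of what a pluggable is_nan might look like for the "ignore" case. None of these names exist in the statistics module: default_is_nan, median_ignoring_nan and ignore_qnan_raise_snan are all hypothetical, and the default only approximates the "method, then math or cmath" fallback described above.

```python
import cmath
import math
import statistics
from decimal import Decimal


def default_is_nan(x):
    """Hypothetical default: use an is_nan() method if present, else math/cmath."""
    method = getattr(x, "is_nan", None)
    if callable(method):
        return method()
    if isinstance(x, complex):
        return cmath.isnan(x)
    try:
        return math.isnan(x)
    except (TypeError, OverflowError):
        # Not convertible to float (or too large): treat as "not a NaN".
        return False


def median_ignoring_nan(data, is_nan=default_is_nan):
    """Hypothetical 'ignore' mode: filter NaNs, then defer to statistics.median."""
    return statistics.median([x for x in data if not is_nan(x)])


def ignore_qnan_raise_snan(x):
    """The unusual policy from the text: skip quiet NaNs, fail on signaling ones."""
    if isinstance(x, Decimal) and x.is_snan():
        raise ValueError("signaling NaN in data")
    return default_is_nan(x)


print(median_ignoring_nan([1.0, float("nan"), 3.0]))
print(median_ignoring_nan([Decimal(1), Decimal("NaN"), Decimal(2), Decimal(5)]))
```

Passing is_nan=ignore_qnan_raise_snan keeps the filtering behaviour for quiet NaNs but raises as soon as a signaling NaN is examined.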
As for your implementation, I don’t think anything but ignore needs to preprocess things. Raise can just pass a key function that raises on NaN to sorted. Poison can do the same but handle the exception by returning NaN. Who cares that it might take slightly longer to hit the first NaN that way than by doing an extra pass, if it’s simpler and slightly faster for the non-exceptional case?
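The raise and poison behaviour described here might be sketched as follows, assuming float-like data that math.isnan accepts. The function median_on_nan, the NaNFound exception and the string values for on_nan are illustrative only and not part of the statistics module.

```python
import math
import statistics


class NaNFound(ValueError):
    """Hypothetical exception raised by the key function on the first NaN."""


def _raise_on_nan(x):
    # Used as the key for sorted(): no separate pre-pass over the data.
    if math.isnan(x):
        raise NaNFound("NaN in data")
    return x


def median_on_nan(data, on_nan="raise"):
    """Sketch of 'raise' and 'poison' built on a key function that raises."""
    try:
        ordered = sorted(data, key=_raise_on_nan)
    except NaNFound:
        if on_nan == "poison":
            return math.nan  # poison: any NaN makes the result NaN
        raise                # raise: let the exception propagate
    n = len(ordered)
    if n == 0:
        raise statistics.StatisticsError("no median for empty data")
    mid = n // 2
    if n % 2:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median_on_nan([3.0, 1.0, 2.0]))
print(median_on_nan([1.0, float("nan"), 3.0], on_nan="poison"))
```

The NaN check runs while sorted computes the keys, so clean data pays for one isnan call per element and nothing else.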
-- Keeping medicines from the bloodstreams of the sick; food from the bellies of the hungry; books from the hands of the uneducated; technology from the underdeveloped; and putting advocates of freedom in prisons. Intellectual property is to the 21st century what the slave trade was to the 16th.