Re: [Python-ideas] NAN handling in the statistics module

On Tue, Jan 08, 2019 at 04:25:17PM +0900, Stephen J. Turnbull wrote:
Steven D'Aprano writes:
By definition, data containing Not A Number values isn't numeric :-)
Unfortunately, that's just a joke, because in fact numeric functions produce NaNs.
I'm not sure if you're agreeing with me or disagreeing, so I'll assume you're agreeing and move on :-)
I agree that this can easily be resolved by documenting that it is the caller's responsibility to remove NaNs from numeric data, but I prefer your proposed flags.
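As a rough illustration of the "caller's responsibility" option, here is a minimal sketch of stripping NaNs before calling median(). The helper name median_ignoring_nans is hypothetical and is not part of the statistics module; it only handles float NaNs, which is an assumption made for brevity.

```python
import math
from statistics import median

def median_ignoring_nans(data):
    """Hypothetical helper: drop float NaNs, then take the median."""
    # Only float NaNs are filtered here; Decimal NaNs would need separate handling.
    cleaned = [x for x in data if not (isinstance(x, float) and math.isnan(x))]
    # median() raises StatisticsError if nothing is left after filtering.
    return median(cleaned)

readings = [1.0, float("nan"), 3.0, 2.0, float("nan")]
result = median_ignoring_nans(readings)
```

With the two NaNs removed, the median is taken over 1.0, 2.0 and 3.0.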
The only reason why I don't call it a bug is that median() makes no promises about NANs at all, any more than it makes promises about the median of a list of sets or any other values which don't define a total order.
Pedantically, I would prefer that the promise that ordinal data (vs. specifically numerical) has a median be made explicit, as there are many cases where statistical data is ordinal.
I think that is reasonable. Provided the data defines a total order, the median is well-defined when there are an odd number of data points, or you can use median_low and median_high regardless of the number of data points.
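To make that concrete, a small sketch with non-numeric but totally ordered data. The letter grades are made-up sample values, and the ordering used is just Python's ordinary string comparison.

```python
from statistics import median, median_low, median_high

# Strings define a total order (lexicographic), but support no averaging.
odd_grades = ["C", "A", "B"]
even_grades = ["D", "A", "C", "B"]

# Odd number of data points: the middle element is returned as-is.
middle = median(odd_grades)

# Even number of data points: pick one of the two middle elements
# instead of averaging them.
low = median_low(even_grades)
high = median_high(even_grades)
```

Calling median(even_grades) would fail, because it tries to average the two middle strings.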
This may be a moot point, as in most cases ordinal data is represented numerically in computation (Likert scales, for example, are rarely coded as "hate", "dislike", "indifferent", "like", "love", but instead as 1, 2, 3, 4, 5), and from the point of view of UI presentation, IntEnums do the right thing here (print as identifiers, sort as integers).
Perhaps a better way to document this would be to suggest that ordinal data be represented using IntEnums? (Again to be pedantic, one might want OrderedEnums that can be compared but don't allow other arithmetic operations.)
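A minimal sketch of the IntEnum suggestion, using the Likert labels from the example above. The class name Likert and the sample responses are illustrative assumptions only.

```python
from enum import IntEnum
from statistics import median_low, median_high

class Likert(IntEnum):
    # Members sort as integers but keep a readable name.
    HATE = 1
    DISLIKE = 2
    INDIFFERENT = 3
    LIKE = 4
    LOVE = 5

responses = [Likert.LIKE, Likert.HATE, Likert.LOVE, Likert.DISLIKE]

# median_low/median_high return one of the original data points,
# so the result is still a Likert member with a usable .name.
low = median_low(responses)
high = median_high(responses)
```

Because IntEnum members are integers, plain median() on an even number of responses would average the two middle values and return a bare number rather than a member, which is the kind of arithmetic an order-only enum would rule out.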
That's a nice solution.

--
Steve (the other one)