[Python-ideas] NAN handling in the statistics module

Steven D'Aprano steve at pearwood.info
Tue Jan 8 05:56:20 EST 2019

```
On Tue, Jan 08, 2019 at 04:25:17PM +0900, Stephen J. Turnbull wrote:
> Steven D'Aprano writes:
>
>  > By definition, data containing Not A Number values isn't numeric :-)
>
> Unfortunately, that's just a joke, because in fact numeric functions
> produce NaNs.

I'm not sure if you're agreeing with me or disagreeing, so I'll assume
you're agreeing and move on :-)

> I agree that this can easily be resolved by documenting that it is the
> caller's responsibility to remove NaNs from numeric data, but I prefer
>
>  > The only reason why I don't call it a bug is that median() makes no
>  > promises about NANs at all, any more than it makes promises about the
>  > median of a list of sets or any other values which don't define a total
>  > order.
>
> Pedantically, I would prefer that the promise that ordinal data
> (vs. specifically numerical) has a median be made explicit, as there
> are many cases where statistical data is ordinal.

I think that is reasonable.

Provided the data defines a total order, the median is well-defined when
there is an odd number of data points, or you can use median_low and
median_high regardless of the number of data points.

> This may be a moot
> point, as in most cases ordinal data is represented numerically in
> computation (Likert scales, for example, are rarely coded as "hate",
> "dislike", "indifferent", "like", "love", but instead as 1, 2, 3, 4,
> 5), and from the point of view of UI presentation, IntEnums do the
> right thing here (print as identifiers, sort as integers).
>
> Perhaps a better way to document this would be to suggest that ordinal
> data be represented using IntEnums?  (Again to be pedantic, one might
> want OrderedEnums that can be compared but don't allow other
> arithmetic operations.)

That's a nice solution.

--
Steve (the other one)
```
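
A minimal sketch of the points discussed in the message above, not anything the statistics module itself provides. It assumes a five-point Likert scale coded as an IntEnum, an OrderedEnum base class written along the lines of the recipe in the enum documentation, and a hypothetical helper drop_nans() for callers who strip NaNs themselves. The names Likert, OrderedLikert and drop_nans are made up for illustration.

```python
import math
import statistics
from enum import Enum, IntEnum


class Likert(IntEnum):
    """Ordinal responses coded 1..5; members sort as integers."""
    HATE = 1
    DISLIKE = 2
    INDIFFERENT = 3
    LIKE = 4
    LOVE = 5


class OrderedEnum(Enum):
    """Enum whose members compare by value but support no arithmetic."""

    def __lt__(self, other):
        if self.__class__ is other.__class__:
            return self.value < other.value
        return NotImplemented

    def __le__(self, other):
        if self.__class__ is other.__class__:
            return self.value <= other.value
        return NotImplemented

    def __gt__(self, other):
        if self.__class__ is other.__class__:
            return self.value > other.value
        return NotImplemented

    def __ge__(self, other):
        if self.__class__ is other.__class__:
            return self.value >= other.value
        return NotImplemented


class OrderedLikert(OrderedEnum):
    HATE = 1
    DISLIKE = 2
    INDIFFERENT = 3
    LIKE = 4
    LOVE = 5


def drop_nans(data):
    """Hypothetical helper: return the data with float NaNs removed."""
    return [x for x in data if not (isinstance(x, float) and math.isnan(x))]


# Even number of IntEnum responses: median_low and median_high return
# actual members of the data, so the result is still a Likert value.
responses = [Likert.LIKE, Likert.HATE, Likert.LOVE, Likert.DISLIKE]
low = statistics.median_low(responses)    # low.name == "DISLIKE"
high = statistics.median_high(responses)  # high.name == "LIKE"

# Odd number of OrderedEnum responses: median() only needs to sort,
# so it works without any arithmetic on the members.
ordered = [OrderedLikert.LIKE, OrderedLikert.HATE, OrderedLikert.LOVE]
middle = statistics.median(ordered)       # middle.name == "LIKE"

# Caller-side NaN removal before taking the median.
clean_median = statistics.median(drop_nans([1.0, float("nan"), 3.0, 2.0]))
```

With an even number of IntEnum responses, median() averages the two middle values and returns a plain float (3.0 for the list above) rather than a Likert member. With an even number of OrderedLikert responses that averaging step raises TypeError, which is why median_low and median_high are the ones to reach for with purely ordinal data.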