
On 24.08.2021 05:53, Steven D'Aprano wrote:
At the moment, the handling of NANs in the statistics module is implementation dependent. In practice, that *usually* means that if your data has a NAN in it, the result you get will probably be a NAN.
>>> statistics.mean([1, 2, float('nan'), 4])
nan
But there are unfortunate exceptions to this:
>>> statistics.median([1, 2, float('nan'), 4])
nan
>>> statistics.median([float('nan'), 1, 2, 4])
1.5
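The thread does not spell out the cause, but a likely explanation is that median() sorts its input, and every ordering comparison involving a NAN is false, so the sort leaves the NAN wherever it started. A minimal sketch, assuming CPython's list sort:

```python
import math

nan = float('nan')

# Every ordering comparison against a NAN is False, and a NAN is
# not even equal to itself.
print(nan < 1, nan > 1, nan == nan)   # False False False

# Because no comparison ever reports "out of order", sorted() treats
# both inputs as already sorted and leaves the NAN in place.
middle = sorted([1, 2, nan, 4])
front = sorted([nan, 1, 2, 4])
print(middle)   # [1, 2, nan, 4]
print(front)    # [nan, 1, 2, 4]

# The median of four values averages the two middle ones, so the
# result depends on whether the NAN landed in a middle slot.
print(math.isnan((middle[1] + middle[2]) / 2))   # True
print((front[1] + front[2]) / 2)                 # 1.5
```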
I've spoken to users of other statistics packages and languages, such as R, and I cannot find any consensus on what the "right" behaviour should be for NANs except "not that!".
So I propose that statistics functions gain a keyword only parameter to specify the desired behaviour when a NAN is found:
- raise an exception
- return NAN
- ignore it (filter out NANs)
which seem to be the three most common preferences. (Opinion seems to be split roughly equally between the three.)
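To make the three options concrete, here is a rough sketch of how such a keyword-only parameter might behave. The wrapper name median_with_policy, the parameter name nan_policy, and its values "raise", "return" and "ignore" are all hypothetical and not part of the statistics module; only float NANs are detected.

```python
import math
import statistics


def _is_nan(x):
    # Only float NANs are handled in this sketch (not Decimal NANs).
    return isinstance(x, float) and math.isnan(x)


def median_with_policy(data, *, nan_policy="raise"):
    """Hypothetical wrapper around statistics.median()."""
    if nan_policy not in ("raise", "return", "ignore"):
        raise ValueError(f"unknown nan_policy: {nan_policy!r}")
    values = list(data)
    if any(_is_nan(x) for x in values):
        if nan_policy == "raise":
            raise ValueError("NAN found in data")
        if nan_policy == "return":
            return math.nan
        # nan_policy == "ignore": filter the NANs out before computing
        values = [x for x in values if not _is_nan(x)]
    return statistics.median(values)


data = [float('nan'), 1, 2, 4]
print(median_with_policy(data, nan_policy="ignore"))   # 2
print(median_with_policy(data, nan_policy="return"))   # nan
```

With any of the three settings the result no longer depends on where the NAN sits in the input.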
Thoughts? Objections?
Sounds good. This is similar to the errors argument we have for codecs where users can determine what the behavior should be in case of an error in processing.
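For comparison, this is roughly how the errors argument works when decoding bytes; the sample data is made up for illustration:

```python
raw = b"caf\xe9"   # Latin-1 encoded bytes, not valid UTF-8

# errors="strict" is the default: invalid input raises an exception.
try:
    raw.decode("utf-8")
    strict_result = "decoded"
except UnicodeDecodeError:
    strict_result = "UnicodeDecodeError"

# errors="replace" substitutes U+FFFD for the undecodable byte.
replaced = raw.decode("utf-8", errors="replace")

# errors="ignore" silently drops the undecodable byte.
ignored = raw.decode("utf-8", errors="ignore")

print(strict_result)    # UnicodeDecodeError
print(repr(replaced))   # 'caf\ufffd'
print(repr(ignored))    # 'caf'
```

The three handlers map loosely onto the proposed options: raise, substitute a marker value, or drop the offending item.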
Does anyone have any strong feelings about what should be the default?
No strong preference, but if the objective is to continue calculations as much as possible even in the face of missing values, returning NAN is the better choice.

Second best would be an exception, IMO, to signal: please be explicit about what to do about NANs in the calculation. It helps reduce the backtracking needed when the end result of a calculation turns out to be NAN.

Filtering out NANs should always be an explicit choice. Ideally, such filtering should happen *before* any calculations are applied. In some cases, it's better to replace NANs with use-case-specific default values. In others, removing them is the right thing to do.

Note that e.g. SQL defaults to ignoring NULLs in aggregate functions such as AVG(), so there are standard precedents for ignoring NAN values by default as well. And yes, that default can lead to wrong results in reports, which are hard to detect.

--
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Experts (#1, Aug 24 2021)