
On 26.08.2021 02:36, Finn Mason wrote:
> Perhaps a warning could be raised but the NaNs are ignored. For example:
>
>     Input:  statistics.mean([4, 2, float('nan')])
>     Output: [warning blah blah blah] 3
>
> Or the NaNs could be treated as zeros and a warning raised:
>
>     Input:  statistics.mean([4, 2, float('nan')])
>     Output: [warning blah blah blah] 2
>
> I do feel there should be a catchable warning but not an outright exception, and a non-NaN value should still be returned. This allows calculations to still quickly and easily be made with or without NaNs, but an alternative course of action can be taken in the presence of a NaN value if desired.
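For illustration, the warn-and-ignore idea could be sketched as follows. This is only a sketch under assumptions: `NaNWarning` and `mean_skipping_nans` are hypothetical names, not part of the statistics module, and the data is assumed to consist of ordinary real numbers.

```python
import math
import statistics
import warnings


class NaNWarning(UserWarning):
    """Hypothetical warning category, not part of the statistics module."""


def mean_skipping_nans(data):
    """Return the mean of the non-NaN values, warning if any NaN was dropped."""
    values = list(data)
    clean = [x for x in values if not math.isnan(x)]
    if len(clean) != len(values):
        warnings.warn("NaN values were ignored", NaNWarning, stacklevel=2)
    return statistics.mean(clean)


# A caller who wants a different course of action can turn the
# warning into an exception and catch it.
handled = False
with warnings.catch_warnings():
    warnings.simplefilter("error", NaNWarning)
    try:
        mean_skipping_nans([4, 2, float("nan")])
    except NaNWarning:
        handled = True
```

Because the warning has its own category, callers can silence it, log it, or escalate it with the standard warnings filters without touching the function itself.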
With the keyword argument, you can decide what to do. As for the default: for codecs we made raising an exception the default, simply because this highlights the need to make an explicit decision. For long-running calculations this may not be desirable, but then getting NAN as the end result isn't the best compromise either.

In practice it's better to check for NANs before entering a calculation and then apply case-specific handling, e.g. replace NANs with fixed default values, remove them, use a different heuristic for the calculation, or stop the calculation and ask for better input. There are many ways to process things in the face of NANs. In Python you can use a simple test for this:
    >>> nan = float('nan')
    >>> l = [1,2,3,nan]
    >>> d = {nan:1, 2:3, 4:5, 5:nan}
    >>> s = set(l)
    >>> nan in l
    True
    >>> nan in d
    True
    >>> nan in s
    True
but this really only makes sense for smaller data sets. If you have a large data set where you rarely get NANs, using the keyword argument may indeed be a better way to go about this.
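One caveat with the containment test: it finds the NAN only because the very same object is stored in the container, and containment checks identity before equality. A NAN produced elsewhere is a different object and compares unequal to every NAN, so it is missed. A minimal sketch of a more robust check, where `has_nan` is a hypothetical helper name:

```python
import math

nan = float("nan")
data = [1, 2, 3, nan]

# Found, because the list holds this exact object.
found_same_object = nan in data

# Not found: a freshly created NaN is a different object, and nan != nan.
found_other_object = float("nan") in data


def has_nan(values):
    """Return True if any value is a NaN, regardless of object identity."""
    return any(math.isnan(x) for x in values)
```

Like the containment test, this scans the whole data set, so the same cost consideration applies to large inputs.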
> In any case, the current behavior should definitely be changed.
Indeed. The NAN handling in median() looks like a bug, more than anything else:
    >>> import statistics
    >>> statistics.mean(l)
    nan
    >>> statistics.mean(d)
    nan
    >>> statistics.mean(s)
    nan

    >>> l1 = [1,2,nan,4]
    >>> statistics.mean(l1)
    nan
    >>> l2 = [nan,1,2,4]
    >>> statistics.mean(l2)
    nan

    >>> statistics.median(l)
    2.5
    >>> statistics.median(l1)
    nan
    >>> statistics.median(l2)
    1.5
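The position dependence can be reproduced with a small sort-based median. This is a sketch of the general approach, assuming non-empty input, and not a copy of the module's code; `naive_median` is a hypothetical name. Every ordering comparison against a NAN is false, so sorting cannot place it and the input order leaks into the result.

```python
import math


def naive_median(data):
    """Sort the data and return the middle value (or mean of the middle two)."""
    values = sorted(data)
    n = len(values)
    mid = n // 2
    if n % 2:
        return values[mid]
    return (values[mid - 1] + values[mid]) / 2


nan = float("nan")

# The NaN ends up in the middle pair, so the result is NaN.
a = naive_median([1, 2, nan, 4])

# The NaN stays at the front, outside the middle pair, so the
# result looks like an ordinary number.
b = naive_median([nan, 1, 2, 4])
```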
On Tue, Aug 24, 2021, 1:46 AM Marc-Andre Lemburg <mal@egenix.com> wrote:
On 24.08.2021 05:53, Steven D'Aprano wrote:
> At the moment, the handling of NANs in the statistics module is
> implementation dependent. In practice, that *usually* means that if your
> data has a NAN in it, the result you get will probably be a NAN.
>
>     >>> statistics.mean([1, 2, float('nan'), 4])
>     nan
>
> But there are unfortunate exceptions to this:
>
>     >>> statistics.median([1, 2, float('nan'), 4])
>     nan
>     >>> statistics.median([float('nan'), 1, 2, 4])
>     1.5
>
> I've spoken to users of other statistics packages and languages, such as
> R, and I cannot find any consensus on what the "right" behaviour should
> be for NANs except "not that!".
>
> So I propose that statistics functions gain a keyword only parameter to
> specify the desired behaviour when a NAN is found:
>
> - raise an exception
>
> - return NAN
>
> - ignore it (filter out NANs)
>
> which seem to be the three most common preference. (It seems to be
> split roughly equally between the three.)
>
> Thoughts? Objections?
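To make the three options concrete, here is a rough sketch of what such a keyword-only parameter might look like. The parameter name `nan_policy`, its values, and the wrapper `mean_with_policy` are assumptions for illustration only; the proposal does not name them.

```python
import math
import statistics

_POLICIES = ("raise", "return", "ignore")  # hypothetical policy names


def mean_with_policy(data, *, nan_policy="raise"):
    """Mean with explicit NaN handling chosen by the caller."""
    if nan_policy not in _POLICIES:
        raise ValueError(f"unknown nan_policy: {nan_policy!r}")
    values = list(data)
    if any(math.isnan(x) for x in values):
        if nan_policy == "raise":
            raise ValueError("data contains NaN")
        if nan_policy == "return":
            return math.nan
        # nan_policy == "ignore": filter the NaNs out before averaging
        values = [x for x in values if not math.isnan(x)]
    return statistics.mean(values)
```

The default chosen here ("raise") is arbitrary; which default is best is exactly the open question in this thread.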
Sounds good. This is similar to the errors argument we have for codecs where users can determine what the behavior should be in case of an error in processing.
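For readers less familiar with the codec precedent, this is how the errors argument behaves when decoding bytes that are not valid UTF-8:

```python
raw = b"ab\xffc"  # the byte 0xff can never occur in valid UTF-8

strict_failed = False
try:
    raw.decode("utf-8")  # errors="strict" is the default
except UnicodeDecodeError:
    strict_failed = True

# Drop the offending byte silently.
ignored = raw.decode("utf-8", errors="ignore")

# Substitute U+FFFD REPLACEMENT CHARACTER for the offending byte.
replaced = raw.decode("utf-8", errors="replace")
```

The strict default forces the caller to make a decision, which is the property being discussed for NAN handling.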
> Does anyone have any strong feelings about what should be the default?
No strong preference, but if the objective is to continue calculations as much as possible even in the face of missing values, returning NAN is the better choice.
Second best would be an exception, IMO, to signal: please be explicit about what to do about NANs in the calculation. It helps reduce the needed backtracking when the end result of a calculation turns out to be NAN.
Filtering out NANs should always be an explicit choice to make. Ideally such filtering should happen *before* any calculations get applied. In some cases, it's better to replace NANs with use case specific default values. In others, removing them is the right thing to do.
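A minimal sketch of doing this preprocessing explicitly, before the calculation. The helper names `drop_nans` and `replace_nans` are hypothetical, and the default value of 0.0 is just an example of a use case specific choice:

```python
import math
import statistics


def drop_nans(values):
    """Return a new list without the NaN entries."""
    return [x for x in values if not math.isnan(x)]


def replace_nans(values, default):
    """Return a new list with each NaN replaced by the given default."""
    return [default if math.isnan(x) else x for x in values]


readings = [4.0, 2.0, float("nan")]

mean_dropped = statistics.mean(drop_nans(readings))
mean_replaced = statistics.mean(replace_nans(readings, 0.0))
```

The two strategies give different means for the same input (3.0 versus 2.0 here), which is why the choice should be visible in the calling code.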
Note that e.g. SQL defaults to ignoring NULLs in aggregate functions such as AVG(), so there are standard precedents for ignoring NAN values by default as well. And yes, that default can lead to wrong results in reports which are hard to detect.
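The SQL behavior is easy to observe with the sqlite3 module from the standard library. The table and column names below are made up for the example:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE samples (x REAL)")
conn.executemany(
    "INSERT INTO samples (x) VALUES (?)",
    [(4.0,), (2.0,), (None,)],  # None is stored as NULL
)

# AVG(x) and COUNT(x) skip the NULL row; COUNT(*) does not.
avg, n_rows, n_values = conn.execute(
    "SELECT AVG(x), COUNT(*), COUNT(x) FROM samples"
).fetchone()
conn.close()
```

Here the average is computed over two values although the table has three rows, and nothing in the result of AVG() alone reveals that a row was skipped.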
-- 
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Experts (#1, Aug 24 2021)
>>> Python Projects, Coaching and Support ...    https://www.egenix.com/
>>> Python Product Development ...        https://consulting.egenix.com/
________________________________________________________________________

::: We implement business ideas - efficiently in both time and costs :::

eGenix.com Software, Skills and Services GmbH  Pastor-Loeh-Str.48
D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg
Registered at Amtsgericht Duesseldorf: HRB 46611
https://www.egenix.com/company/contact/
https://www.malemburg.com/
_______________________________________________
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-leave@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/L5QB4G...
Code of Conduct: http://python.org/psf/codeofconduct/
-- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Experts (#1, Aug 26 2021)