Should the sum of an all-NaN Series return 0 or NaN?
For those who are not aware but potentially interested: I want to point out that there is a (renewed) discussion going on about the question in the title, see https://github.com/pandas-dev/pandas/issues/9422.

As a summary of what this is about: the question is what the sum of a Series of all NaNs should return (which is equivalent to the sum of an empty Series after skipping the NaNs): NaN or 0?

    In [1]: s = Series([np.nan])

    In [2]: s.sum(skipna=True)  # skipping NaNs is the default
    Out[2]: nan or 0     <---- DISCUSSION POINT

    In [3]: s.sum(skipna=False)
    Out[3]: nan

The reason this is a discussion point is the following: the internal nansum implementation of pandas returns NaN. But when bottleneck is installed, pandas uses bottleneck's implementation of nansum, which returns 0 (for versions >= 1.0). Bottleneck changed its behaviour from returning NaN to returning 0 to model it after numpy's nansum function.

This has the very annoying consequence that you get different behaviour depending on whether bottleneck (which is only an optional dependency) is installed or not.

This inconsistency is bad and should be resolved, so we have to choose between the two behaviours. I am not going to list the pros and cons here; for that you can read the comments in the GitHub issue :-) (you can start with today's comments: https://github.com/pandas-dev/pandas/issues/9422#issuecomment-283502828)

Regards,
Joris
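To make the two candidate behaviours concrete, here is a minimal pure-Python sketch. The helper `nansum_sketch` and its `all_nan_result` parameter are hypothetical names made up for illustration; this is neither the pandas nor the bottleneck implementation. It only mimics the two conventions described above for the case where nothing is left after skipping NaNs.

```python
import math


def nansum_sketch(values, all_nan_result):
    """Sum values while skipping NaNs (hypothetical helper, not pandas API).

    all_nan_result is what gets returned when no non-NaN value remains,
    which is exactly the point under discussion.
    """
    kept = [v for v in values if not math.isnan(v)]
    if not kept:
        return all_nan_result
    return sum(kept)


data = [float("nan")]

# Convention of numpy's nansum and bottleneck >= 1.0: nothing left sums to 0.
as_zero = nansum_sketch(data, 0.0)

# Convention of pandas' internal nansum: nothing left gives NaN.
as_nan = nansum_sketch(data, float("nan"))

# When at least one non-NaN value remains, both conventions agree.
mixed = nansum_sketch([1.0, float("nan"), 2.0], float("nan"))

print(as_zero, as_nan, mixed)
```

The two conventions differ only in the all-NaN (or empty) case, which is why the result currently depends on whether bottleneck happens to be installed.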
participants (1)
-
Joris Van den Bossche