<div dir="ltr"><div dir="ltr"><div dir="ltr"><div dir="ltr"><div dir="ltr">On Mon, Jan 7, 2019 at 6:50 AM Steven D'Aprano &lt;<a href="mailto:steve@pearwood.info">steve@pearwood.info</a>&gt; wrote:<br></div><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">> I'll provide a suggested patch on the bug. It will simply be a wholly<br>
> different implementation of median and friends.<br>
<br>
I ask for a documentation patch and you start talking about a whole new <br>
implementation. Huh.<br>A new implementation with precisely the same behaviour is a waste of <br>
time, so I presume you're planning to change the behaviour. How about if <br>
you start off by explaining what the new semantics are?<br></blockquote><div><br></div><div>I think it would be counterproductive to document the bug (as something other than a bug). Picking a completely arbitrary element in the face of a non-total order can never be "correct" behavior, and is never worth preserving for compatibility. I think the use of statistics.median on partially ordered elements is simply rare enough that no one tripped over it, or at least no one reported it before.</div><div><br></div><div>Notice that the code itself pretty much recognizes the bug in this comment:</div><div><br></div></div></div><blockquote style="margin:0px 0px 0px 40px;border:none;padding:0px"><div dir="ltr"><div class="gmail_quote"><div><font face="monospace, monospace"># FIXME: investigate ways to calculate medians without sorting? Quickselect?</font></div></div></div></blockquote><div dir="ltr"><div class="gmail_quote"><div> </div><div>So it seems like the original author knew the implementation was wrong. But you're right, the new behavior needs to be decided. Propagating NaNs is reasonable. Filtering out NaNs is reasonable. Those are the default behaviors of NumPy and Pandas, respectively:</div><div><br></div></div></div><blockquote style="margin:0px 0px 0px 40px;border:none;padding:0px"><div dir="ltr"><div class="gmail_quote"><div><font face="monospace, monospace">import numpy as np<br>import pandas as pd<br></font></div><div><font face="monospace, monospace">np.median([1, 2, 3, np.nan]) # -> nan<br></font></div><div><font face="monospace, monospace">pd.Series([1, 2, 3, np.nan]).median() # -> 2.0</font><br></div><div><br></div></div></div></blockquote><div dir="ltr"><div class="gmail_quote"><div>(Yes, of course each has ways to get the other behavior.) Other non-Python tools similarly offer one of those two behaviors, but really nothing else.</div><div><br></div><div>So yeah, what I was suggesting as a patch was an implementation that had PROPAGATE and IGNORE semantics. 
I don't have a real opinion about which should be the default, but the current behavior should simply not exist at all. As I think about it, warnings and exceptions are really too complex an API for this module. It's not hard to check for NaNs manually and raise a warning or exception in your own code.</div><div><br></div></div>-- <br><div dir="ltr" class="gmail_signature">Keeping medicines from the bloodstreams of the sick; food <br>from the bellies of the hungry; books from the hands of the <br>uneducated; technology from the underdeveloped; and putting <br>advocates of freedom in prisons. Intellectual property is<br>to the 21st century what the slave trade was to the 16th.<br></div></div></div></div></div>
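<div>A minimal sketch of the PROPAGATE and IGNORE semantics discussed in this message, written as a hypothetical wrapper around statistics.median. The function name median_with_policy and the policy strings "propagate" and "ignore" are illustrative assumptions, not an API proposed in this thread or provided by the standard library. It also assumes numeric inputs (int, float, Fraction, Decimal) that math.isnan accepts.</div>

```python
import math
import statistics


def median_with_policy(data, policy="propagate"):
    """Hypothetical median with explicit NaN handling (illustrative only).

    "propagate": any NaN in the input makes the result NaN (NumPy's default).
    "ignore":    NaNs are dropped before the median is taken (pandas' default).
    """
    values = list(data)
    if policy == "propagate":
        # Assumes numeric inputs that math.isnan can check.
        if any(math.isnan(x) for x in values):
            return math.nan
    elif policy == "ignore":
        values = [x for x in values if not math.isnan(x)]
    else:
        raise ValueError(f"unknown NaN policy: {policy!r}")
    # With NaNs handled, sorting no longer depends on input order,
    # so statistics.median picks a well-defined middle element.
    return statistics.median(values)


if __name__ == "__main__":
    data = [1, 2, 3, math.nan]
    print(median_with_policy(data))            # nan
    print(median_with_policy(data, "ignore"))  # 2
```

<div>With "ignore", an input made only of NaNs leaves nothing to take the median of, so statistics.median raises StatisticsError; an actual patch would still need to decide how that case should behave.</div>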