[New-bugs-announce] [issue36095] Better NaN sorting.

Brandt Bucher report at bugs.python.org
Sat Feb 23 14:39:17 EST 2019


New submission from Brandt Bucher <brandtbucher at gmail.com>:

Sorting sequences containing NaN values produces an incompletely sorted result. Further, because of the complexity of the timsort, this incomplete sort often silently produces unintuitive, unstable-seeming results that are extremely sensitive to the ordering of the inputs:

>>> sorted([3, 1, 2, float('nan'), 2.0, 2, 2.0])
[1, 2, 2.0, 2.0, 3, nan, 2]
>>> sorted(reversed([3, 1, 2, float('nan'), 2.0, 2, 2.0]))
[1, 2.0, 2, 2.0, nan, 2, 3]

The patch I have provided addresses these issues, including for lists containing nested lists/tuples with NaN values. Specifically, it stably sorts NaNs to the end of the list with no changes to the timsort itself (just the element-wise comparison functions):

>>> sorted([3, 1, 2, float('nan'), 2.0, 2, 2.0])
[1, 2, 2.0, 2, 2.0, 3, nan]
>>> sorted([[3], [1], [2], [float('nan')], [2.0], [2], [2.0]])
[[1], [2], [2.0], [2], [2.0], [3], [nan]]

It also includes a new regression test for this behavior.

Some other benefits to this patch:

* These changes generally result in a sorting performance improvement across data types. The largest increases here are for nested lists, since we add a new unsafe_list_compare function. Other speed increases are due to safe_object_compare's delegation to unsafe comparison functions for objects of the same type. Specifically, the speed impact (positive is faster, negative is slower) is between:

    * -3% and +3% (10 elements, no PGO)
    * 0% and +4% (10 elements, PGO)
    * 0% and +9% (1000 elements, no PGO)
    * -1% and +9% (1000 elements, PGO)

* The current weird NaN-sorting behavior is not documented, so this is not a breaking change.

* IEEE754 compliance is maintained. The result is still a stable (arguably, more stable), nondecreasing ordering of the original list.

----------
components: Interpreter Core
messages: 336401
nosy: brandtbucher
priority: normal
severity: normal
status: open
title: Better NaN sorting.
type: behavior
versions: Python 3.8

_______________________________________
Python tracker <report at bugs.python.org>
<https://bugs.python.org/issue36095>
_______________________________________


More information about the New-bugs-announce mailing list