[Python-Dev] Re: Having Sorted Containers in stdlib?

11 Nov 2021

      Serhiy Storchaka wrote:
...
From my C++ and Java experience, hashtable-based containers are much
more useful than tree-based containers. They are more compact and fast.
In most cases the only reason of using sorted container is supporting
deterministic iteration order, but often it is enough to sort data only
for output.
The concern is not constant speed factor but asymptotic complexity.
Inserting or deleting in a tree-based sorted collection is O(log n),
but lists are O(n): the bisect is O(log n) but moving trailing list elements
is O(n). It's memmove fast but dominates as n gets large.

There are many algorithms that are hard to implement as fast as they
can be without a tree. Something like rolling median would be a good
test case, e.g. "for a list N containing numbers, produce a list M of
length len(N) - 1 where M[i] == statistics.median(N[:i+1])". I think the
best you can do with the standard library and without writing a
balanced binary tree from scratch is O(n²) (if you can beat this I'd like
to know how!). With a tree-based sorted list it's O(n log n).

Dicts and sets cannot maintain an arbitrary order, except that
OrderedDict.move_to_end() very occasionally turns out to be all you
need (and even then, I find it tricky to reason about and apply
correctly in an algorithm: it requires you to maintain an invariant,
where SortedList maintains the sorted invariant for you).
...
There is no such need
of tree-based implementation. It is not needed in Python core, and is
not needed for most users. So it is good to keep it in third-party
libraries. Maintaining it in the stdlib has a cost and the benefit/cost
value would be low.
The situation where this bites my firm the most is in interviews. This is
exactly the situation where we're asking candidates to demonstrate
their CS knowledge and produce an algorithm that has low asymptotic
complexity.

We use various services including HackerRank, which provide a
sandboxed Python environment that cannot use PyPI. There are quite
a few services in this kind of Educational+Challenge space including
some that execute Python in the browser using Brython etc.

My team believes that candidates with a Java background find the
challenge easier than candidates with a Python background, because
they just use NavigableSet and move on. I find that kind of embarrassing.
It requires us to apologise for Python's deficiency and fudge how we
score the question, which makes it hard to know we're providing a level
playing field.

(I would argue that we should not ask those kinds of questions or not
use these sandboxed interview environments but institutional friction
makes that magnitude of change very difficult.)

Putting something in the standard library means that it will, in due
course, appear in HackerRank and other services we use, and we can
expect candidates to be familiar with it and incorporate it in their
solutions.

[Python-Dev] Re: Having Sorted Containers in stdlib?

Daniel Pope