[Python-Dev] Guarantee ordered dict literals in v3.7?

Nick Coghlan ncoghlan at gmail.com
Tue Nov 7 02:28:24 EST 2017


On 7 November 2017 at 16:21, Steven D'Aprano <steve at pearwood.info> wrote:
> On Mon, Nov 06, 2017 at 08:05:07PM -0800, David Mertz wrote:
>> Maybe OrderedDict can be
>> rewritten to use the dict implementation. But the evidence that all
>> implementations will always be fine with this restraint feels poor,
>
> I think you have a different definition of "poor" to me :-)

While I think "poor" is understating the case, I think "excellent"
(which you use later on) is overstating it. My own characterisation
would be "at least arguably good enough".

> Nick has already done a survey of PyPy (which already has insertion-
> order preserving dicts), Jython, VOC, and Batavia, and they don't have
> any problem with this.

For these, my research only showed that their respective platforms
have an order-preserving hashmap implementation available.

What's entirely unclear at this point is how switching wholesale to
that may impact the *performance* of these implementations (both in
terms of speed and memory usage), and how much code churn would be
involved in actually making the change.

Making builtin dict() order-preserving may also still impose an
ongoing complexity cost on these implementations if they end up having
to split "the mapping we use for code execution namespaces" away from
"the mapping we provide as the builtin dict".

(That said, I think at least Jython already makes that distinction - I
believe their string-only namespace dict is a separate type, whereas
CPython plays dynamic optimisation games inside the regular dict
type.)
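
For a sense of how "the namespace code executes in" and "the builtin
dict" are already separable concepts even at the Python level, the
PEP 3115 __prepare__ hook lets a metaclass supply an arbitrary mapping
for class-body execution. A minimal illustrative sketch (the class and
names here are made up for the example):

    from collections import OrderedDict

    class LoggingNamespace(OrderedDict):
        # Records every name bound while the class body executes.
        def __setitem__(self, key, value):
            print("binding", key)
            super().__setitem__(key, value)

    class Meta(type):
        @classmethod
        def __prepare__(mcls, name, bases, **kwds):
            # The class body below runs against this mapping rather
            # than against a plain builtin dict.
            return LoggingNamespace()

    class Example(metaclass=Meta):
        x = 1
        y = 2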

So Barry's suggestion of providing an explicit
"collections.UnorderedDict" as a consistent spelling for "an
associative mapping without any ordering guarantees" is a reasonable
one, even if its primary use in CPython ends up being to ensure
algorithms remain compatible with collections that don't provide an
inherently stable iteration order, with any associated performance
benefits mostly being seen on other implementations. (As Paul S
notes, such a data type would also serve a pedagogical purpose when
teaching computer science principles.)
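
Purely as an illustration (nothing like this exists in the stdlib
today), a minimal pure-Python sketch of such a type might look like
the following; it deliberately shuffles keys on iteration so that
code which silently depends on insertion order fails fast rather than
passing by accident:

    import random
    from collections.abc import MutableMapping

    class UnorderedDict(MutableMapping):
        """Illustrative sketch only: a mapping that promises no order."""

        def __init__(self, *args, **kwargs):
            self._data = dict(*args, **kwargs)

        def __getitem__(self, key):
            return self._data[key]

        def __setitem__(self, key, value):
            self._data[key] = value

        def __delitem__(self, key):
            del self._data[key]

        def __len__(self):
            return len(self._data)

        def __iter__(self):
            # Deliberately randomise the iteration order.
            keys = list(self._data)
            random.shuffle(keys)
            return iter(keys)

With that definition, list(UnorderedDict(a=1, b=2, c=3)) may come back
in any order, which is exactly the property an order-independent
algorithm should be able to cope with.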

> IronPython is built on C#, which has order-
> preserving mappings.

I couldn't actually find a clearly suitable existing collection type
in the .NET CLR - the one I linked was just the one commonly
referenced as "good enough for most purposes". It had some constraints
that meant it may not be suitable as a builtin dict type in a Python
implementation (e.g. it looked to have a 32-bit length limit).

> Nuitka is built on C++, and if C++ can't implement
> an order-preserving mapping, there is something terribly wrong with the
> world. Cython (I believe) uses CPython's implementation, as does
> Stackless.

Right, the other C/C++ implementations that also target environments
with at least 128 MiB of RAM (and typically more) can reasonably be
expected to tolerate similar space/speed trade-offs to those that
CPython itself makes (and that's assuming they aren't just using
CPython's data type implementations in the first place).

> The only well-known implementation that may have trouble with this is
> MicroPython, but it already changes the functionality of a lot of
> builtins and core language features, e.g. it uses a different method
> resolution order (so multiple inheritance won't work right), some
> builtins don't support slicing with three arguments, etc.
>
> I think the evidence is excellent that other implementations shouldn't
> have a problem with this, unless (like MicroPython) they are targeting
> machines with tiny memory resources. µPy runs on the PyBoard, which I
> believe has under 200K of memory. I think we can all forgive µPy if it
> only *approximately* matches Python semantics.

It runs on the ESP8266 as well, and that only has 96 KiB of data memory.
This means we're already talking 3-4 orders of magnitude difference in
memory capacity and typical data set sizes between CPython and
MicroPython use cases, and that's only accounting for the *low* end of
CPython's use cases - once you expand out to multi-terabyte data sets
(the very upper end of what a single x86-64 server can handle if you
can afford the RAM for it), then we're talking 9-10 orders of
magnitude between CPython's high end and MicroPython's low end.

So for CPython's target use cases, algorithmic efficiency dominates
performance, and we can afford to invest extra memory usage and
startup overhead in service of more efficient data access algorithms.
MicroPython's the opposite - you're going to run out of memory for
data storage long before algorithmic inefficiency becomes your biggest
problem, but wasted bookkeeping memory and startup overhead can cause
problems even with small data sets.
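
As a rough, CPython-only illustration of that bookkeeping cost (exact
numbers vary by version and platform, and the dict contents below are
invented for the example):

    import sys

    # Even an empty dict carries a fixed per-container overhead, and a
    # three-entry dict costs noticeably more than the raw data it
    # holds.  Negligible on a desktop, but significant against ~96 KiB
    # of data RAM.
    empty = {}
    config = {"ssid": "net", "password": "hunter2", "retries": 3}

    print(sys.getsizeof(empty))    # container overhead with no entries
    print(sys.getsizeof(config))   # container cost; keys/values are extra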

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

