[Python-Dev] Guarantee ordered dict literals in v3.7?

Steven D'Aprano steve at pearwood.info
Wed Dec 20 05:31:20 EST 2017


On Mon, Dec 18, 2017 at 08:49:54PM -0800, Nathaniel Smith wrote:
> On Mon, Dec 18, 2017 at 7:58 PM, Steven D'Aprano <steve at pearwood.info> wrote:

> > I have a script which today prints data like so:
[...]
> To make sure I understand, do you actually have a script like this, or
> is this hypothetical?

The details are much simplified, and my users probably won't literally 
yell at me, but basically, yes I do.

But does it matter? The thing about backwards-compatibility guarantees 
is that we have to proceed as if somebody does have such a script. We 
don't know who, we don't know why, but we have to assume that they are 
relying on whatever guarantees we've given and will be greatly 
inconvenienced by any change without sufficient notice.


> > Now, maybe that's my own damn fault for using
> > pprint
[...]
> > so I think I can be excused having relied on that feature.
> 
> No need to get aggro -- I asked a question, it wasn't a personal attack.

I didn't interpret it as an attack. Sorry for any confusion, I was 
trying to be funny -- at least, it sounded funny in my own head.


> At a high-level, pprint's job is to "pretty-print arbitrary Python data
> structures in a form which can be used as input to the interpreter"
> (quoting the first sentence of its documentation), i.e., like repr()

The *high* level purpose of pprint is to *pretty-print* values, like the 
name says. If all we wanted was something that outputs an eval()'able 
representation, we already had that: repr().

But even the requirement that the output can be used as input to the 
interpreter is a non-core promise. There are plenty of exceptions: 
recursive data structures, functions, any object with the default repr, 
etc. Even when it works, the guarantee is quite weak. For instance, even 
the object type is not preserved:

py> class MyDict(dict):
...     pass
...
py> d = MyDict()
py> x = eval(repr(d))
py> assert d == x
py> assert type(d) == type(x)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AssertionError


So the "promise" that eval(repr(obj)) will round-trip needs to be 
understood as being one of those Nice To Have non-core promises, not an 
actual guaranteed feature. (The bold print giveth, and the fine print 
taketh away.)

So the fact that the output of pprint doesn't preserve the order of the 
dict won't be breaking any documented language guarantees. (It is 
probably worth documenting explicitly though, rather than just letting 
it be implied by the sorted keys guarantee.)
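
To illustrate the difference between what the dict remembers and what 
pprint shows, here is a minimal sketch. The dict contents are made up 
for the example, and it assumes an interpreter where dicts preserve 
insertion order (CPython 3.6, or any 3.7+) and pprint's default 
behaviour of sorting keys:

```python
from pprint import pformat

# Keys deliberately inserted in non-alphabetical order (made-up data).
d = {'hammer': 5, 'screwdriver': 3, 'ladder': 116}

# The order the dict itself remembers: insertion order.
insertion_order = list(d)

# The order pprint shows by default: keys sorted.
shown = pformat(d)

print(insertion_order)
print(shown)
```

The two orders differ, yet nothing documented is violated: the sorted 
output is exactly what pprint has promised.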


> it's fundamentally intended as a debugging tool that's supposed to
> match how Python works, not any particular externally imposed output
> format.

The point of pprint is not merely to duplicate what repr() already does, 
but to output an aesthetically pleasing view of the data structure. 
There is no reason to think that is only for the purposes of debugging. 
pprint is listed in the docs under Data Types, not Debugging:

https://docs.python.org/3/library/datatypes.html

https://docs.python.org/3/library/debug.html


> Now, how Python works has changed. Previously dict order was
> arbitrary, so picking the arbitrary order that happened to be sorted
> was a nice convenience.

Beware of promising a feature for convenience, because people will come 
to rely on it.

In any case, lexicographic (the default sorting) order is in some ways 
the very opposite of "arbitrary order".


> Now, dict order isn't arbitrary, 

No, we can't say that. Dicts *preserve insertion order*, that is all. 
There is no requirement that the insertion order be meaningful or 
significant in any way: it may be completely arbitrary. If I build 
a mapping of (say) product to price:

    d = {'hammer': 5, 'screwdriver': 3, 'ladder': 116}

the order the items are inserted is arbitrary, probably representing the 
historical accident of when they were added to the database/catalog or 
when I thought of them while typing in the dict.

The most we can say is that for *some* cases, dict order *may* be meaningful.

We're under no obligation to break backwards-compatibility guarantees in 
order for pretty printing to reflect a feature of dicts which may or may 
not be of any significance to the user.


> and sorting dicts both obscures the actual structure of the 
> Python objects,

You can't see the actual structure of Python objects via pprint. For 
example, you can't see whether the dict is a split table (shared keys) 
or combined table. You can only see the parts of the public interface 
which the repr, or pprint, chooses to show.

That's always been the case so nothing changes here.

If pprint were new to 3.7, I daresay there would be a good argument 
to have it display keys in insertion order, but given backwards 
compatibility, that's not tenable without either an opt-in switch, or a 
period of deprecation.


> and also breaks round-tripping through pprint.

Round-tripping need not promise to preserve order, since dicts 
don't care about order for the purposes of equality.
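
A short sketch of that point, using made-up data: two dicts built in 
different orders compare equal, and a dict that goes through pprint and 
back still equals the original even though its key order has changed.

```python
from pprint import pformat

# Same items, inserted in a different order (made-up data).
a = {'screwdriver': 3, 'hammer': 5}
b = {'hammer': 5, 'screwdriver': 3}

same_contents = (a == b)             # equality ignores order
same_order = (list(a) == list(b))    # iteration order does not

# eval() is acceptable here only because the text is a literal we
# produced ourselves a moment ago.
round_tripped = eval(pformat(a))

print(same_contents, same_order)
print(round_tripped == a, list(round_tripped))
```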

Round-tripping already is a lossy operation: 

- object identity is always lost (apart from a few cached objects 
  like small ints and singletons like None);

- in some cases, the type of objects can be lost;

- any attribute of the object which is not both reflected in its 
  repr and set by its constructor will be lost;

  (e.g. x = something(); x.extra_attribute = 'spam')

- many objects don't round-trip at all, e.g. functions and
  recursive data structures.

So the failure of pprint to preserve insertion order by default is 
just one more example.
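
A few of the losses listed above can be sketched in a handful of lines. 
The class Something here is hypothetical, invented purely for the 
illustration:

```python
# Identity is lost: the round-tripped list is equal but is a new object.
x = [1, 2, 3]
y = eval(repr(x))
equal_but_distinct = (y == x) and (y is not x)

class Something:
    """Hypothetical class whose repr ignores extra attributes."""
    def __repr__(self):
        return 'Something()'

# An attribute not reflected in the repr does not survive.
s = Something()
s.extra_attribute = 'spam'
t = eval(repr(s))
attribute_survived = hasattr(t, 'extra_attribute')

# Functions don't round-trip at all: the repr isn't valid syntax.
try:
    eval(repr(len))
except SyntaxError:
    function_round_trips = False
else:
    function_round_trips = True

print(equal_but_distinct, attribute_survived, function_round_trips)
```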


> Given that pprint's
> overarching documented contract of "represent Python objects" now
> conflicts with the more-specific documented contract of "sort dict
> keys", something has to give.

I believe the overarching contract is to pretty print. Anything else is 
a Nice To Have.


[...]
> But I would be in favor of adding a kwarg to let people opt-in to the
> old behavior like:
> 
>     from pprint import PrettyPrinter
>     pprint = PrettyPrinter(sortdict=True).pprint

It would have to be the other way: opt-out of the current behaviour.


-- 
Steve

