[Python-3000] Dict literal bytecode

Alexander Belopolsky alexander.belopolsky at gmail.com
Tue Mar 25 19:35:14 CET 2008


Python 3.0 set and dict literals produce very different bytecode:

>>> dis(lambda:{1,2,3})
  1           0 LOAD_CONST               0 (1)
              3 LOAD_CONST               1 (2)
              6 LOAD_CONST               2 (3)
              9 BUILD_SET                3
             12 RETURN_VALUE

>>> dis(lambda:{1:1,2:2,3:3})
  1           0 BUILD_MAP                3
              3 LOAD_CONST               0 (1)
              6 LOAD_CONST               0 (1)
              9 STORE_MAP
             10 LOAD_CONST               1 (2)
             13 LOAD_CONST               1 (2)
             16 STORE_MAP
             17 LOAD_CONST               2 (3)
             20 LOAD_CONST               2 (3)
             23 STORE_MAP
             24 RETURN_VALUE

I have recently proposed a patch that would make dict bytecode similar
to set.  (See http://bugs.python.org/issue2197).  I have later
realized that a similar idea has been proposed before
(http://bugs.python.org/issue402227) and rejected because it slowed
down pybench dict creation benchmark.  However, in my tests (on Linux
and Mac OS X) my patch gives a slight improvement 1-2% on the same
benchmark.

Two other issues have been raised with respect to my proposal:

1. The patch "changes semantics by executing all of the stores after the inputs
are generated."

2. All of the pending item pairs get built-up on an ever-growing stack
instead of
being consumed pairwise.

I have promised to think some more about these issues and here are my thoughts:

1. The only case I can think of when the change of semantics would be
noticeable is when keys have custom comparison/hash methods with
side-effects.  I doubt this can be an issue in real life and I don't
think python makes any guarantees in this respect.  If, however, it is
desirable to have this order specified (note that we are not talking
about the order of insertions, which is preserved, but whether keys
are created between insertions or before the first insertion) than it
is hard to justify the difference between set and dict literals
semantics.

2.  If I understand correctly how the code works, frame stack does not
grow during code execution.  The required stack size is precomputed
during calculation and the necessary array is allocated during
execution.  So the growing stack should not slow things down.  With
respect to memory usage, note that the frame stack allocation is
transient  while bytecode is loaded permanently.  In some applications
bytecode size reduction may be more important (in terms of peak memory
usage) than stack size increase for some frames.

There are further optimizations that I would like to try once this
issue is resolved.  (These optimizations are not strictly speaking
dependent on the acceptance of my patch, but a unification of dict and
set literals would make the coding easier.)   I understand that the
idea of making set literal produce a frozenset was accepted
<http://mail.python.org/pipermail/python-3000/2008-January/011814.html>.
 Once that is implemented, it will be natural to fold constant set
literals in the peephole optimizer.   While a similar optimization
will not be immediately available for (mutable) dictionaries, the
following approach is conceivable: for the dicts with constant keys,
push a frozen set of keys constant on the stack preceded by the values
in the order that they will be stored in the hash table.  Indicate
this situation with a -1 argument to BUILD_MAP (the size of the
dictionary will be inferred from the size of the frozenset at the top
of the stack).  Now as long as the hash table implementation is
compatible between dicts and sets, dictionary creation will be a
simple copy operation with all the hard work done during compilation.

Do you think any of these ideas are worth pursuing?


More information about the Python-3000 mailing list