[Python-3000] Dict literal bytecode
Adam Olsen
rhamph at gmail.com
Tue Mar 25 20:32:43 CET 2008
On Tue, Mar 25, 2008 at 12:35 PM, Alexander Belopolsky
<alexander.belopolsky at gmail.com> wrote:
> Python 3.0 set and dict literals produce very different bytecode:
>
> >>> dis(lambda:{1,2,3})
> 1 0 LOAD_CONST 0 (1)
> 3 LOAD_CONST 1 (2)
> 6 LOAD_CONST 2 (3)
> 9 BUILD_SET 3
> 12 RETURN_VALUE
>
> >>> dis(lambda:{1:1,2:2,3:3})
> 1 0 BUILD_MAP 3
> 3 LOAD_CONST 0 (1)
> 6 LOAD_CONST 0 (1)
> 9 STORE_MAP
> 10 LOAD_CONST 1 (2)
> 13 LOAD_CONST 1 (2)
> 16 STORE_MAP
> 17 LOAD_CONST 2 (3)
> 20 LOAD_CONST 2 (3)
> 23 STORE_MAP
> 24 RETURN_VALUE
>
> I have recently proposed a patch that would make the dict bytecode similar
> to the set bytecode (see http://bugs.python.org/issue2197).  I later
> realized that a similar idea had been proposed before
> (http://bugs.python.org/issue402227) and rejected because it slowed
> down the pybench dict creation benchmark.  However, in my tests (on
> Linux and Mac OS X) my patch gives a slight (1-2%) improvement on the
> same benchmark.
>
> Two other issues have been raised with respect to my proposal:
>
> 1. The patch "changes semantics by executing all of the stores after the inputs
> are generated."
>
> 2. All of the pending item pairs get built up on an ever-growing stack
> instead of being consumed pairwise.
>
> I have promised to think some more about these issues and here are my thoughts:
>
> 1. The only case I can think of where the change of semantics would be
> noticeable is when keys have custom comparison/hash methods with
> side effects.  I doubt this can be an issue in real life, and I don't
> think Python makes any guarantees in this respect.  If, however, it is
> desirable to have this order specified (note that we are not talking
> about the order of insertions, which is preserved, but about whether
> keys are created between insertions or before the first insertion),
> then it is hard to justify the difference between set and dict literal
> semantics.
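As an aside, the key-creation order in question can be made visible with a key type whose construction and hashing have side effects.  The class and names below are my own, purely for illustration:

```python
calls = []

class LoudKey:
    # Hypothetical key type that records when it is constructed and
    # hashed, so the evaluation order of a dict display becomes
    # observable.
    def __init__(self, name):
        self.name = name
        calls.append("init:" + name)

    def __hash__(self):
        calls.append("hash:" + self.name)
        return hash(self.name)

    def __eq__(self, other):
        return isinstance(other, LoudKey) and self.name == other.name

d = {LoudKey("a"): 1, LoudKey("b"): 2}
# Under the current STORE_MAP bytecode, "hash:a" (the store) happens
# before "init:b" is even executed; with the proposed BUILD_SET-style
# bytecode, both inits precede both hashes.  Either way the insertion
# order (a before b) is preserved.
print(calls)
```

The relative order of init and hash entries in `calls` is the only observable difference between the two bytecode shapes.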
>
> 2. If I understand correctly how the code works, the frame stack does
> not grow during code execution.  The required stack size is precomputed
> at compile time and the necessary array is allocated when the frame is
> created, so the larger stack should not slow things down.  With
> respect to memory usage, note that the frame stack allocation is
> transient, while bytecode stays loaded permanently.  In some
> applications a bytecode size reduction may matter more (in terms of
> peak memory usage) than a stack size increase for some frames.
I worry that there might be generated code using disgustingly large
literals. Does batch creation still work for that?
> There are further optimizations that I would like to try once this
> issue is resolved. (These optimizations are not strictly speaking
> dependent on the acceptance of my patch, but a unification of dict and
> set literals would make the coding easier.) I understand that the
> idea of making set literal produce a frozenset was accepted
> <http://mail.python.org/pipermail/python-3000/2008-January/011814.html>.
> Once that is implemented, it will be natural to fold constant set
> literals in the peephole optimizer.  While a similar optimization
> is not immediately available for (mutable) dictionaries, the
> following approach is conceivable: for dicts with constant keys,
> push a frozenset constant of the keys on the stack, preceded by the
> values in the order in which they will be stored in the hash table.
> Indicate this situation with a -1 argument to BUILD_MAP (the size of
> the dictionary can be inferred from the size of the frozenset at the
> top of the stack).  Then, as long as the hash table implementation is
> compatible between dicts and sets, dictionary creation becomes a
> simple copy operation, with all the hard work done at compile time.
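A rough pure-Python analogy of that protocol, purely illustrative (the real win would come from a C-level table copy): iterating the frozenset yields the keys in hash-table order, and the values are supplied in that same order.

```python
# Hypothetical sketch of the proposed BUILD_MAP(-1) convention: the
# keys arrive as a precomputed frozenset constant, and the values in
# the order the keys occupy the hash table (i.e. the frozenset's own
# iteration order).
frozen_keys = frozenset({"a", "b", "c"})

# The compiler would have emitted the values in table order; here we
# fake that by deriving each value from its key.
values = [k.upper() for k in frozen_keys]

d = dict(zip(frozen_keys, values))
assert d["a"] == "A" and d["b"] == "B" and d["c"] == "C"
```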
Guido's support for frozenset was later withdrawn:
http://mail.python.org/pipermail/python-3000/2008-January/011868.html
In certain contexts, such as "if i in {1, 2, 3}:", you can reuse the
literal regardless of whether it's a set or a frozenset. I don't know
if anybody has begun implementing that though.
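One way to check whether a given interpreter performs that reuse is to look for a frozenset among the code object's constants (a quick probe; what you see depends on the version you run it on):

```python
f = lambda i: i in {1, 2, 3}
# If the optimization is in place, the set literal has been folded into
# a single frozenset constant instead of three per-element LOAD_CONSTs.
print(f.__code__.co_consts)
```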
Playing around a bit, I've found that "_x = {1, 2, 3}; x = set(_x)"
(assuming _x were precomputed) is faster for larger literals, with the
break-even point at about 10 elements.  Perhaps you could investigate
that as well?
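A self-contained version of that measurement, in case anyone wants to reproduce it (the eval trick just manufactures a literal of a given size; which variant wins is machine- and version-dependent):

```python
import timeit

N = 20  # literal size; vary this to look for the break-even point
src = "{" + ", ".join(map(str, range(N))) + "}"
f_literal = eval("lambda: " + src)   # builds the set element by element
prebuilt = f_literal()               # stands in for a precomputed _x
f_copy = lambda: set(prebuilt)       # the "x = set(_x)" variant

t_literal = timeit.timeit(f_literal, number=50_000)
t_copy = timeit.timeit(f_copy, number=50_000)
print(t_literal, t_copy)  # which is smaller depends on N and the machine
```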
--
Adam Olsen, aka Rhamphoryncus