[Python-ideas] Re: Revisiting a frozenset display literal

Jan. 16, 2022

      Summary: Further information is provided, which suggests that it may be
best to amend Python so that "frozenset({1, 2, 3})" is the literal for
eval("frozenset({1, 2, 3})").

Steve D'Aprano correctly notes that the bytecode generated by the expression
    x in {1, 2 ,3}
is apparently not optimal. He then argues that introducing a frozenset
literal would allow
   x in f{1, 2, 3} # New syntax, giving a frozenset literal
would allow better bytecode to be generated.

However, the following has the same semantics as "x in {1, 2, 3}" and
perhaps gives optimal bytecode.
...
...
...
import dis
dis.dis("x in (1, 2, 3)")
  1           0 LOAD_NAME                0 (x)
              2 LOAD_CONST               3 ((1, 2, 3))
              4 COMPARE_OP               6 (in)
              6 RETURN_VALUE
For comparison, here's the bytecode Steve correctly notes is apparently not
optimal.
...
...
...
dis.dis("x in {1, 2, 3}")
  1           0 LOAD_NAME                0 (x)
              2 LOAD_CONST               3 (frozenset({1, 2, 3}))
              4 COMPARE_OP               6 (in)
              6 RETURN_VALUE
Steve states that "x in {1, 2, 3}" when executed calls "frozenset({1, 2,
3})", and in particular looks up "frozenset" in builtins and literals.

I can see why he says that, but I've done an experiment that suggests
otherwise.
...
...
...
bc = compile("x in {1, 2, 3}", "filename", "eval")
eval(bc, dict(x=1))
True
eval(bc, dict(x=1, frozenset="dne"))
True
I suspect that if you look up in the C-source for Python, you'll find that
dis.dis ends up using
    frozenset({1, 2, 3})
as the literal for representing the result of evaluating frozenset({1, 2,
3}).

The following is evidence for this hypothesis:
...
...
...
frozenset({1, 2, 3})
frozenset({1, 2, 3})
set({1, 2, 3})
{1, 2, 3}
set([1, 2, 3])
{1, 2, 3}
set([])
set()
To conclude, I find it plausible that:

1. The bytecode generated by "x in {1, 2, 3}" is already optimal.
2. Python already uses "frozenset({1, 2, 3})" as the literal representation
of a frozenset.

Steve in his original post mentioned the issue
https://bugs.python.org/issue46393, authored by Terry Reedy. Steve rightly
comments on that issue that "may have been shadowed, or builtins
monkey-patched, so we cannot know what frozenset({1, 2, 3}) will return
until runtime."

Steve's quite right about this shadowing problem. In light of my plausible
conclusions I suggest his goal of a frozenset literal might be better
achieved by making 'frozenset' a keyword, much as None and True and False
are already keywords.
...
...
...
True = False
  File "<stdin>", line 1
SyntaxError: can't assign to keyword
Once this is done we can then use
   frozenset({1, 2, 3})
as the literal for a frozenset, not only in dis.dis and repr and elsewhere,
but also in source code.

As a rough suggestion, something like
    from __future__ import literal_constructors_as_keywords
would prevent monkey-patching of set, frozenset, int and so forth (just as
True cannot be monkeypatched).

I thank Steve for bringing this interesting question to our attention, for
his earlier work on the issue, and for sharing his current thoughts on this
matter. It's also worth looking at the message for Gregory Smith that Steve
referenced in his original post.
https://mail.python.org/pipermail/python-ideas/2018-July/051902.html

Gregory wrote: frozenset is not the only base type that lacks a literals
leading to loading values into these types involving creation of an
intermediate throwaway object: bytearray.  bytearray(b'short lived bytes
object')

I hope this helps.
-- 
Jonathan

[Python-ideas] Re: Revisiting a frozenset display literal

Jonathan Fine