On 4/15/21 9:24 PM, Inada Naoki wrote:

> Unlike simple function case, PEP 649 creates function object instead
> of code object for __co_annotation__ of methods.
> It cause this overhead.  Can we avoid creating functions for each annotation?

As the implementation of PEP 649 currently stands, there are two reasons why the compiler might pre-bind the __co_annotations__ code object to a function, instead of simply storing the code object:

  1. If the annotations refer to a closure (the code object's "co_freevars" is non-empty), or
  2. If the annotations possibly refer to a class variable (the annotations code object contains either LOAD_NAME or LOAD_CLASSDEREF).

If the annotations refer to a closure, then the code object also needs to be bound with the "closure" tuple. If the annotations possibly refer to a class variable, then the code object also needs to be bound with the current "f_locals" dict. (Both could be true.)

Unfortunately, when generating annotations on a method, references to builtins (e.g. "int", "str") seem to generate LOAD_NAME instructions instead of LOAD_GLOBAL, which means pre-binding the function happens pretty often for methods. I believe in your benchmark it will happen every time. There's a lot of code, and a lot of runtime data structures, inside compile.c and symtable.c behind the compiler's decision about whether something is NAME vs GLOBAL vs DEREF etc, and I wasn't comfortable trying to fix it.
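
To illustrate the LOAD_NAME vs LOAD_GLOBAL distinction on stock CPython (not the PEP 649 prototype itself), here is a minimal sketch. It assumes that ordinary class-body code is a fair stand-in for the annotation code generated for methods; the helper names nested_code and opnames are made up for this example.

```python
import dis
import types

def nested_code(module_code, name):
    """Return the code object called `name` stored in module_code's constants."""
    for const in module_code.co_consts:
        if isinstance(const, types.CodeType) and const.co_name == name:
            return const
    raise LookupError(name)

def opnames(code):
    """Return the set of opcode names used by a code object."""
    return {ins.opname for ins in dis.get_instructions(code)}

# The same builtin, referenced once from a class body and once from a function.
src = "class C:\n    x = int\n\ndef f():\n    return int\n"
module_code = compile(src, "<example>", "exec")

class_ops = opnames(nested_code(module_code, "C"))
func_ops = opnames(nested_code(module_code, "f"))

print("class body uses LOAD_NAME:", "LOAD_NAME" in class_ops)
print("function uses LOAD_GLOBAL:", "LOAD_GLOBAL" in func_ops)
```

Names looked up in a class body go through LOAD_NAME because the class namespace has to be consulted first, whereas the function body can use LOAD_GLOBAL directly.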

Anyway I assume it wasn't "fixable". The compiler would presumably already prefer to generate LOAD_GLOBAL over LOAD_NAME, because LOAD_GLOBAL would be cheaper every time for a global or builtin. The fact that it doesn't already do so implies that it can't.

At the moment I have only one idea for a possible optimization, as follows. Instead of binding the function object immediately, it might be cheaper to write the needed values into a tuple, then only actually bind the function object on demand (like normal).

I haven't tried this because I assumed the difference at runtime would be negligible. On one hand, you're creating a function object; on the other you're creating a tuple. Either way you're creating an object at runtime, and I assumed that bound functions weren't that much more expensive than tuples. Of course I could be very wrong about that.
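
For anyone who wants a rough feel for that assumption, a micro-benchmark along these lines might be a starting point. It is only a sketch: it times a plain "def" statement against building a two-element tuple in ordinary Python, which is not the same as the code the PEP 649 prototype emits, and the iteration count is arbitrary.

```python
import timeit

N = 200_000  # arbitrary iteration count, chosen only to keep the run short

# Each iteration executes a def statement, so a new function object is created.
t_function = timeit.timeit("def f(): pass", number=N)

# Each iteration builds a two-element tuple from values only known at runtime.
t_tuple = timeit.timeit("t = (a, b)", setup="a = object(); b = {}", number=N)

print(f"function objects: {t_function:.4f}s   tuples: {t_tuple:.4f}s")
```

The numbers will vary by machine and CPython version, so they should be read as a hint rather than a measurement of the real implementation.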

The other thing is, it would be a lot of work to even try the experiment. Also, it's an optimization, and I was more concerned with correctness... and getting it done and getting this discussion underway.

What follows are my specific thoughts about how to implement this optimization.

In this scenario, the logic in the compiler that generates the code object would change to something like this:

has_closure = len(co.co_freevars) != 0
has_load_name = co.co_code contains LOAD_NAME or LOAD_CLASSDEREF bytecodes
if not (has_closure or has_load_name):
    co_ann = co
elif has_closure and (not has_load_name):
    co_ann = (co, freevars)
elif (not has_closure) and has_load_name:
    co_ann = (co, f_locals)
else:
    co_ann = (co, freevars, f_locals)
setattr(o, "__co_annotations__", co_ann)

(The compiler would have to generate instructions creating the tuple and setting its members, then storing the resulting object on the object with the annotations.)
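
The decision logic above can be approximated in plain Python, as a sketch only. The real work would happen in the compiler, in C; the function name pack_co_annotations and its parameters are hypothetical, and the dis module is used here just to look for the relevant opcodes.

```python
import dis

# Opcode names taken from the discussion above; they vary between CPython versions.
NAME_OPS = {"LOAD_NAME", "LOAD_CLASSDEREF"}

def pack_co_annotations(co, closure=None, f_locals=None):
    """Return the bare code object, or a tuple carrying only what binding would need."""
    has_closure = len(co.co_freevars) != 0
    has_load_name = any(ins.opname in NAME_OPS for ins in dis.get_instructions(co))
    if not (has_closure or has_load_name):
        return co
    if has_closure and not has_load_name:
        return (co, closure)
    if has_load_name and not has_closure:
        return (co, f_locals)
    return (co, closure, f_locals)

def outer():
    x = 1
    def inner():  # refers to x, so its code object has a free variable
        return x
    return inner

inner = outer()
plain = compile("1 + 2", "<ann>", "eval")  # no name lookups at all
named = compile("int", "<ann>", "eval")    # module-level lookup uses LOAD_NAME

print(type(pack_co_annotations(plain)).__name__)
print(len(pack_co_annotations(inner.__code__, inner.__closure__)))
print(len(pack_co_annotations(named, f_locals={})))
```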

Sadly, we can't pre-create this "co_ann" tuple as a constant and store it in the .pyc file, because the whole point of the tuple is to contain one or more objects only created at runtime.

The code implementing __co_annotations__ in the three objects (function, class, module) would examine the object it got. If it was a code object, it would bind it; if it was a tuple, it would unpack the tuple and use the values based on their type:

# co_ann = internal storage for __co_annotations__
if isinstance(co_ann, FunctionType) or (co_ann is None):
    return co_ann
co = freevars = locals = None
if isinstance(co_ann, CodeType):
    co = co_ann
else:
    assert isinstance(co_ann, tuple)
    assert 1 <= len(co_ann) <= 3
    for o in co_ann:
        if isinstance(o, CodeType):
            assert co is None
            co = o
        elif isinstance(o, tuple):
            assert freevars is None
            freevars = o
        elif isinstance(o, dict):
            assert locals is None
            locals = o
        else:
            raise ValueError(f"illegal value in co_annotations tuple: {o!r}")
co_ann = make_function(co, freevars=freevars, locals=locals)
return co_ann
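
The make_function step is hypothetical, but for the closure-only case the existing types.FunctionType constructor does a comparable late binding from Python. The sketch below shows just that case; bind_on_demand is a made-up name, and since types.FunctionType takes no locals argument, the class-variable case is not covered here.

```python
import types

def bind_on_demand(co_ann, globals_dict):
    """Bind a stored (code, closure) pair into a function object (hypothetical helper)."""
    co, closure = co_ann
    return types.FunctionType(co, globals_dict, co.co_name, None, closure)

def outer():
    x = 42
    def inner():
        return x
    return inner

template = outer()

# Store only the pieces, as the tuple scheme would, and bind later on first use.
stored = (template.__code__, template.__closure__)
rebound = bind_on_demand(stored, globals())
result = rebound()
print(result)
```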

If you experiment with this approach, I'd be glad to answer questions about it, either here or on GitHub, etc.

Cheers,

/arry