[Python-ideas] Access to function objects

Sun Aug 7 14:10:46 CEST 2011

On Sun, Aug 7, 2011 at 3:46 AM, Eric Snow <ericsnowcurrently at gmail.com> wrote:
> On Sat, Aug 6, 2011 at 11:36 PM, Steven D'Aprano <steve at pearwood.info> wrote:
>> Eric Snow wrote:
>>> Why not bind the called function-object to the frame locals, rather
>>> than the one for which the code object was created, perhaps as
>>> "__function__"?
>>
>> I'm afraid I can't interpret this (which may be my ignorance rather than
>> your fault).
>
> No, the fault is likely mine.  But it seems so clear to me. :)
>
>> The only guess I can make is based on what you say later:
>>
>> "One new implicit name in locals()."
>>
>> so I presume you mean that the function should see a local variable (perhaps
>> called "me", or "this"?) that is bound to itself.
>
> One called __function__ (or the like).  A "dunder" name is used to
> indicate its special nature and limit conflict with existing code.

Without thinking too much about this I like it.

> The function that was called would be bound to that name at function
> execution time.  Keep in mind that I am talking about the frame
> locals, not anything stored on the code object nor on the function
> object.  Not to overdramatize it, but it would happen at the beginning
> of every call of every function.  I don't know what that overhead
> would be.

It could be made into a "cell", which is the same way all locals are
normally represented. This is very fast. Further, the code for it could
be triggered by the appearance of __function__ (if that's the keyword
we choose) in the function body. I don't really care what happens if
people use locals() -- that's inefficient and outmoded anyway. (Note
that!)
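
For comparison, the zero-argument super() machinery in Python 3 already
works along these lines: the compiler adds an implicit __class__ closure
cell only to methods whose body mentions super or __class__. A minimal
sketch of that existing behavior (the class and method names are made up
for illustration; nothing here implements __function__ itself):

```python
class Example:
    def mentions_class(self):
        # Referencing __class__ makes the compiler add a closure cell for it.
        return __class__

    def plain(self):
        # No mention of super or __class__, so no cell is created.
        return None


# The implicit cell shows up as a free variable of the compiled method.
has_cell = "__class__" in Example.mentions_class.__code__.co_freevars
no_cell = "__class__" not in Example.plain.__code__.co_freevars
resolved = Example().mentions_class()
```

A __function__ cell triggered by the appearance of the name could
presumably follow the same pattern, so functions that never mention it
would pay nothing.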

>> Presumably if a function wants to use that same name as a local, nothing bad
>> will happen, since the local assignment will just override the implicit
>> assignment. But what about code that expects to see a nonlocal or global
>> with the same name?

That's why a __dunder__ name is used.

>> What happens when two functions, sharing the same code object, get called
>> from two threads at the same time? Are their locals independent?
>
> I'm afraid I don't know.  I expect that each would get executed in
> separate execution frames, and so have separate frame locals.

The frames are completely independent. They all point to the same code
object and under the proposal they will all point to the same function
object. I see no problems here except self-inflicted, like using
__function__ to hold state that can't be accessed concurrently safely;
note that recursive invocations have the same issue. I see it as a
non-problem.
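
A small sketch of the frame independence described above, using recursion
instead of threads to keep it deterministic. It assumes CPython, since
sys._getframe is an implementation detail; collect_frames is a
hypothetical helper:

```python
import sys


def collect_frames(n, frames):
    # Record the frame of this particular invocation.
    frames.append(sys._getframe())
    if n > 0:
        collect_frames(n - 1, frames)
    return frames


frames = collect_frames(2, [])

# Three invocations produce three distinct frame objects...
distinct_frames = len({id(f) for f in frames})
# ...that all point at the one code object of collect_frames...
same_code = all(f.f_code is collect_frames.__code__ for f in frames)
# ...while each frame keeps its own value for the local n.
local_values = [f.f_locals["n"] for f in frames]
```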

>> For most uses, standard recursion via the name is good enough, it's only a
>> few corner cases where self-reflection (as I call it) is needed.

Right. If it were expected that people would start writing recursive
calls using __function__ routinely, in situations where a name
reference works, I'd be very unhappy with the new feature. (And if
someone wants to make the argument that recursive calls using
__function__ are actually better in some way I am willing to
filibuster.)

>> And I say
>> that as somebody who does want a way for functions to know themselves. I
>> don't think that use-case is so important that it should be implicitly added
>> to every function, on the off-chance it is needed, rather than explicitly on
>> demand.
>
> For me the use case involves determining what function called my
> function.  Currently you can tell in which execution frame a function
> was called, and thereby which code object, but reliably matching that
> to a function is not so simple.  I recognize that my case is likely
> not a general one.

But it is a nice one. It solves some issues that pdb currently solves
by just using a file/line reference.
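
To illustrate why matching a frame back to a function is not so simple
today, here is a hedged sketch (CPython only, because of sys._getframe;
who_called_me, original and clone are hypothetical names). The frame
identifies the caller's code object, but two different function objects
can share that code object:

```python
import sys
import types


def who_called_me():
    # Frame 0 is this function; frame 1 is whoever called it.
    return sys._getframe(1).f_code


def original():
    return who_called_me()


# A second, distinct function object built on the same code object.
clone = types.FunctionType(original.__code__, globals(), "clone")

seen_from_original = original()
seen_from_clone = clone()
```

Both calls report the same code object, so the frame alone cannot say
whether original or clone was the caller.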

>>> To finish things off, bind to every new code object
>>> the function for which it was created, perhaps as "co_func".  That way
>>> you will always know what function object was called and which one the
>>> code object came from originally.
>>
>> What benefit will this give? Have you ever looked at a code object and said,
>> "I need a way of knowing which function this is from?" If so, I'd love to
>> know what problem you were trying to solve at the time!
>
> You caught me! :)  I don't already have a use case for this part.  I
> had only considered that without this you could not determine where a
> code object came from, or if a function had borrowed another's code
> object.  This is certainly only useful in the case that one function
> is using the code object of another, which we have all agreed is not
> that common.  However, with a co_func I felt that all the bases would
> be covered.

Ah, but you can't do that! There are many situations where a single
code object is used to create many different function objects. E.g.
every time you have a nested function. Also the code object is
immutable. This part is all carefully considered and should be left
alone.
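
The nested-function case can be checked directly. In this sketch
(make_adder and add are illustrative names), every call to the outer
function creates a new function object from the same code object, so a
single co_func back-reference could not name them all:

```python
def make_adder(n):
    def add(x):
        return x + n
    return add


add_one = make_adder(1)
add_two = make_adder(2)

# Two separate function objects with different closures...
different_functions = add_one is not add_two
# ...compiled once, so they share a single code object.
shared_code = add_one.__code__ is add_two.__code__
results = (add_one(10), add_two(10))
```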

>> Code objects don't always get created as part of a function. They can be
>> returned by compile. What should co_func be set to then?
>
> None, since there was no function object created along with the code
> object.  Same with generator expressions.

Just forget this part.

>> Finally, if the function has a reference to the code object, and the code
>> object has a reference to the function, you have a reference cycle. That's
>> not the end of the world now as it used to be, in early Python before the
>> garbage collector was added, but still, there better be a really good
>> use-case to justify it.
>>
>> (Perhaps a weak reference might be more appropriate?)
>
> Good point.
>
> Mostly I am trying to look for an angle that works without a lot of
> trouble.  Can't fault me for trying in my own incoherent way. :)

On the rest of that rejected PEP:

- I'm not actually sure how easy it is to implement the setting of
__function__ when the frame is created. IIRC the frame creation is
rather far removed from the function object, as there are various
cases where there is no function object (class bodies, module-level
code) and in other cases the function is called via a bound method.
Someone should write a working patch to figure out if this is a
problem in practice.

- The primary use case for __function__ seems to me to be accessing
function attributes, but I'm not sure what's wrong with referencing
these via the function name. Maybe it's when there's a method involved,
since then you'd have to write <classname>.<methodname>.<attrname>.

- It seems that the "current class" question has already been solved
for super(). If more is needed I'd be okay with extending the
machinery used by super() so that you can access the magic "current
class" variable explicitly too.

- For "current module" I've encountered a number of use cases, mostly
having to do with wanting to define new names dynamically. Somehow I
have found:

  globals()[x] = y  # Note that x is a variable, not a literal

cumbersome; I'd rather write:

  setattr(__this_module__, x, y)

There are IIRC also some use cases where an API expects a module
object (or at least something whose attributes it can set and/or get)
and passing the current module is clumsy:

  foo(sys.modules[__name__])
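
The "current module" spellings above can be tried out with a sketch like
the following. __this_module__ does not exist, so the sketch uses
sys.modules[__name__] in its place; demo_module is a hypothetical
throwaway module, created only so the example behaves the same however it
is run:

```python
import sys
import types

# Hypothetical throwaway module standing in for "the current module".
demo = types.ModuleType("demo_module")
sys.modules[demo.__name__] = demo

module_source = """
import sys

this_module = sys.modules[__name__]  # today's way to get the current module

x = "answer"                  # x is a variable, not a literal
globals()[x] = 1              # the spelling described as cumbersome
setattr(this_module, x, 42)   # rebinds the same name via the module object
"""

# Run the source with the module's namespace as its globals.
exec(module_source, demo.__dict__)

value = demo.answer
module_is_itself = demo.this_module is demo

# Clean up the registry entry added for the demonstration.
del sys.modules[demo.__name__]
```

Both spellings write to the same namespace; the setattr form simply works
on the module object rather than on its dictionary.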

On the whole these use cases are all fairly weak, though, and I would
give it a +0 at best. But rather than a prolonged discussion of the
merits and use cases, I strongly recommend that somebody try to come
up with a working implementation, and we'll strengthen the PEP from
there.

-- 
--Guido van Rossum (python.org/~guido)