[pypy-dev] the new shining Builtinrefactor branch :-)
arigo at tunes.org
Wed Sep 17 22:51:29 CEST 2003
Hello everybody !
The 'builtinrefactor' branch that Holger and Armin(*) have been working on is
finally stable again. We are merging it back with the trunk. (This only
affects the src/pypy subdirectory.) Take yourself a cup of tea and read on :-)
(*) this is a joint e-mail written in a common 'screen' session :-)
The main change is a rewrite of the interactions between interpreter-
and app-level code with the following design goals:
- allow freely "mixing" app-level and interp-level code in one file
(thus no need to have xxx_app.py files anymore)
- implement functions and modules uniformly for all object spaces
- provide mechanisms (interpreter/gateway.py) for transparently
- calling app-level code from interp-level code
- calling interp-level code from app-level code
- accessing interp-level defined attributes from app-level
- argument parsing is now at interpreter level to once and for all get
rid of bootstrapping and debugging nightmares :-)
- reorganized the Code and Frame classes and introduced subclasses of these
to reduce redundant code dealing with them.
Here is a more detailed description of the changes.
Application-Level and Interpreter-Level Interaction
Part one: invoke app-level code from interpreter level
We no longer need 'xxx_app.py' files for helpers for 'xxx.py'. An app-level
helper function can be written in-line in the source. See for example the
function normalize_exception() in interpreter/pyframe.py:
    def app_normalize_exception(etype, evalue):
        ...plain Python app-level code...

    normalize_exception = gateway.app2interp(app_normalize_exception)
This makes 'normalize_exception' callable from interpreter-level as if it had
been defined at interpreter-level with the following signature:
    def normalize_exception(space, w_etype, w_evalue):
App-level helpers can also be used as methods. In pyopcode.py,
app_prepare_exec() is the app-level definition of prepare_exec(), a method of
the new class PyInterpFrame (more about it below).
All these helpers can be called normally from the rest of the interpreter
code. Global functions must be called with an extra first argument, 'space'.
For methods the space is read from 'self.space'. All other arguments must be
wrapped objects, and so is the result.
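The calling-convention change that app2interp provides can be illustrated with a toy sketch. This is not PyPy's real gateway code; 'Space', 'app2interp' and 'app_normalize_pair' below are simplified stand-ins that only mimic the wrap/unwrap dance:

```python
# Toy sketch of the app2interp idea (not PyPy's real gateway code):
# an app-level helper becomes callable at interp-level with an extra
# 'space' argument plus wrapped arguments, and returns a wrapped result.
class Space:
    def wrap(self, obj):
        return ('wrapped', obj)          # tag a value as "wrapped"
    def unwrap(self, w_obj):
        assert w_obj[0] == 'wrapped'
        return w_obj[1]

def app2interp(app_func):
    # Build the interp-level gateway: unwrap args, call, rewrap result.
    def interp_func(space, *args_w):
        args = [space.unwrap(w_arg) for w_arg in args_w]
        return space.wrap(app_func(*args))
    return interp_func

def app_normalize_pair(a, b):
    return (min(a, b), max(a, b))

normalize_pair = app2interp(app_normalize_pair)

space = Space()
w_result = normalize_pair(space, space.wrap(3), space.wrap(1))
print(space.unwrap(w_result))            # -> (1, 3)
```

The caller at interp-level never sees unwrapped values: it passes 'space' plus wrapped objects and gets a wrapped object back, just as described above.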
If you have many 'app_*' functions you can register them "en masse" via a call
to gateway.importall():

    gateway.importall(globals())   # app_xxx() -> xxx()
For other examples, see objspace/std/dicttype.py, which contains the code that
used to be in the separate dictobject_app.py. Note how all the functions are
defined with app_ prefixes, and then gateway.importall() is used to make the
non-app_ interpreter-level-callable gateways, and finally register_all()
registers the latter into the multimethod tables.
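The renaming that importall() performs can be sketched in a few lines. This is a simplified, hypothetical version, not the real gateway.importall(); in particular the wrapping step (the call to app2interp()) is elided here to keep the sketch minimal:

```python
# Toy sketch of gateway.importall() (simplified, hypothetical): scan a
# namespace for functions named app_xxx and install each one under the
# plain name xxx.  The real code wraps each function with app2interp();
# here the installed object is the function itself.
def importall(gdict):
    for name, value in list(gdict.items()):
        if name.startswith('app_') and callable(value):
            gdict[name[4:]] = value

namespace = {}
def app_double(x):
    return 2 * x
namespace['app_double'] = app_double

importall(namespace)                     # app_double() -> double()
print(namespace['double'](21))           # -> 42
```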
Part two: make interpreter-level objects accessible from app-level
Conversely, 'gateway' contains code to allow interpreter-level functions to be
visible from app-level. This can be done manually with gateway.interp2app(),
but most of the time it is done for you by the ExtModule base class; see
module/builtin.py for an example.
Although such extension modules are defined as classes, there should be only
one instance per object space. The class statement is not merely a convenient
way of defining modules that contain code and attributes at both app-level
and interp-level: we actually need a class because several object spaces may
be alive at the same time, each needing its own module instance.
Now the new module/builtin.py contains interp-level code like globals(),
locals(), __import__()..., and app-level code like app_execfile(),
app_range()... The ExtModule parent class will make sure all these appear on
the instance, both at app-level and at interp-level. The app-level will see an
ExtModule-instance as a plain module. It is better to start with the
interpreter-level view to see how this works.
You can call all these methods from *interpreter level*; for example, in
'def __import__(self, ...)' we do:
    self.execfile(space.wrap(filename), w_globals, w_locals)
but execfile() is actually defined like so:
    def app_execfile(self, filename, glob=None, loc=None):
and because this is an app-level definition all the arguments are of course
wrapped (we are executing app_execfile() at app-level and the body of
app_execfile refers to the parameters). Please note that app_* functions must
always be called with wrapped arguments even though you don't see a 'w_'
prefix in the signature.
Of course you can also call 'execfile' from outside the instance methods, e.g.
you can do
    space.builtin.execfile(space.wrap('somefile.py'), w_globals, w_locals)
One more thing is interesting to discuss here. Methods like app_execfile()
have a 'self' argument, but what does it mean in the context of a module? The
bottom line is that it provides an explicit way to access objects in the
same module (instance):

    def app_x(self):
        r = self.y()
Calling 'y()' directly wouldn't usually work because this would try to access
the globals of the CPython-module where 'somemodule' is defined. This is
unrelated to the "globals namespace" of the app-level module, which is (behind
the scene) implemented as the content of the 'self' instance. So app_x()
above is in all respects a method of a class. The important thing to remember
here is that app-level modules are implemented as class-instances at
interpreter-level -- and the usual scoping rules apply.
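The scoping rule above can be sketched with ordinary Python classes. 'ToyModule' below is a hypothetical stand-in for an ExtModule subclass, showing why sibling functions must be reached via 'self':

```python
# Toy sketch: an app-level "module" implemented as a class instance.
# What looks like a module global at app-level is really an attribute
# of self at interp-level, so sibling functions are reached via self.
class ToyModule:
    def y(self):
        return 42
    def x(self):
        return self.y() + 1    # not plain y(): that would look in the
                               # globals of the CPython file instead

mod = ToyModule()              # one instance per object space
print(mod.x())                 # -> 43
```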
From the user's point of view (i.e. app-level outside the definition of the
module), an expression like 'sys.displayhook' actually gets you a *bound
method* of the 'sys' instance. This is true even when accessing 'displayhook'
with other syntaxes like 'from sys import displayhook'. This is also true for
the __builtin__ module, so e.g. when you type 'len' into the pypy interpreter
you get:

  <pypy.interpreter.function.Method object at 0x4032c424>

A bit weird, but it works as expected and we could later change the
representation string :-)
>>>> import sys
>>>> sys.displayhook
<pypy.interpreter.function.Method object at 0xe44f7cc>
>>>> def f():
....     pass
>>>> sys.f = f
>>>> sys.f
<pypy.interpreter.function.Function object at 0xe4ed3cc>
The above should start to make sense if you really think of 'sys' as an
instance :-) And this brings us to the next change.
Functions (and friends) and Modules moved off objectspaces
Function, Method, Generator, Module (and probably more) classes are now part
of the interpreter; an object space is no longer responsible for providing
them. It makes sense because these classes are straightforward structures
anyway, and are almost only created and used by the interpreter (like code
objects, which were already in an interpreter-level class of their own). See
interpreter/function.py and interpreter/module.py.
An object space now needs to callback into interpreter-level objects to carry
out operations on them. This allows the interpreter-level to control its own
internal classes. For example in interpreter/pyframe.py the following method
controls visibility of attributes of a Frame instance:
    def pypy_getattr(self, w_attr):
        attr = self.space.unwrap(w_attr)
        if attr == 'f_locals':   return self.w_locals
        if attr == 'f_globals':  return self.w_globals
        if attr == 'f_builtins': return self.w_builtins
        if attr == 'f_code':     return self.space.wrap(self.code)
        raise OperationError(self.space.w_AttributeError, w_attr)
(For reading attributes we'll probably design some better interface later :-)
Note the preliminary interface to tell the object space how the
interpreter-level objects should react to operations: if an interpreter-level
class defines methods like e.g. pypy_getattr(), pypy_call(), pypy_iter(), etc.,
then every object space is required to call those methods when it encounters a
wrap()ed interpreter-level object for the getattr(), call(), iter() operations
respectively. When the object space must wrap() one of these objects it does
so in a special structure (for stdobjspace it is CPythonObject) which knows
that it should look for these pypy_* method names.
For example, Function objects have a pypy_call() method that is called
whenever the space.call() operation is issued with a wrapped Function as first
argument. At some point all these wrappable classes should have a
pypy_getattr() as well and we might eventually allow only subclasses of
baseobjspace.Wrappable to be wrap()ed.
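The double-dispatch described above can be sketched in plain Python. 'ToyFrame' and 'ToySpace' are hypothetical stand-ins, not the real Frame and object space classes; the point is only the "look for a pypy_* hook first" rule:

```python
# Toy sketch of the pypy_* callback protocol: when the space sees a
# wrapped interpreter-level object, it delegates the operation to the
# matching pypy_* method on that object if one is defined.
class ToyFrame:
    def __init__(self):
        self.w_locals = {'a': 1}
    def pypy_getattr(self, attr):
        if attr == 'f_locals':
            return self.w_locals
        raise AttributeError(attr)

class ToySpace:
    def getattr(self, obj, attr):
        pypy_hook = getattr(obj, 'pypy_getattr', None)
        if pypy_hook is not None:        # honour the pypy_* protocol
            return pypy_hook(attr)
        return getattr(obj, attr)        # fall back to plain lookup

space = ToySpace()
print(space.getattr(ToyFrame(), 'f_locals'))   # -> {'a': 1}
```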
The new interface to wrap()&co that we are discussing in pypy-dev is not
implemented yet, but essentially the one remaining usage for wrap() would be
exactly that: make a "black-box" proxy that just calls back to the pypy_*
methods. It is clearly better to require that an objspace honours the 'pypy_*'
methods instead of requiring that every objectspace implements modules and
functions on its own.
Code and Frame classes reorganized
The classes involved in code execution have been reorganized a bit. There are
now two abstract base classes: Code and Frame, plus a few concrete classes:
Function, Method, and Gateway.
A Function object, like CPython's, is essentially a container for a code
object with references to a globals dict and default arguments, and possibly
other closure stuff. As it is already the case since the Gothenburg sprint, in
PyPy there is only one Function class for builtin and user-defined functions,
the difference being in the Code object. A Method object is just the same as
in CPython. And Gateway objects are what app2interp() and interp2app() return:
essentially a Function that isn't bound to any particular object space yet. It
isn't supposed to show up at app-level: whenever it gets associated with a
space, it becomes a Function.
A Code object is some structure that knows its 'signature' (what arguments it
expects), and also knows how to build a Frame to run itself in. A Frame
represents the execution of a Code. It has an abstract run() method, and a few
methods to get and set its locals as a dictionary or as a plain list ("fast
locals"). To see how this fits together, see the method call() of Function
objects in function.py:
    def call(self, w_args, w_kwds=None):
        scope_w = self.parse_args(w_args, w_kwds)
        frame = self.func_code.create_frame(self.space, self.w_globals,
                                            self.closure)
        frame.setfastscope(scope_w)
        return frame.run()
Argument parsing is done in parse_args() according to
self.func_code.signature(); then the Code is asked to create a Frame; the
Frame is sent the decoded list of arguments to initialize its locals; and then
it is run.
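The collaboration just described can be sketched with simplified stand-ins ('ToyCode', 'ToyFrame', 'ToyFunction' are hypothetical, not the real PyPy classes): the Code knows its signature and builds a Frame, the Frame receives the fast locals and runs:

```python
# Toy sketch of the Code/Frame/Function collaboration described above.
class ToyCode:
    def __init__(self, argnames, body):
        self.argnames = argnames         # what signature() reports
        self.body = body                 # a callable taking a locals dict
    def signature(self):
        return self.argnames
    def create_frame(self):
        return ToyFrame(self)

class ToyFrame:
    def __init__(self, code):
        self.code = code
        self.fastlocals = None
    def setfastscope(self, scope):
        self.fastlocals = list(scope)    # locals as a plain list
    def run(self):
        env = dict(zip(self.code.signature(), self.fastlocals))
        return self.code.body(env)

class ToyFunction:
    def __init__(self, code):
        self.func_code = code
    def call(self, args):
        frame = self.func_code.create_frame()
        frame.setfastscope(args)         # the decoded argument list
        return frame.run()

add = ToyFunction(ToyCode(['a', 'b'], lambda env: env['a'] + env['b']))
print(add.call([2, 3]))                  # -> 5
```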
The other interface to running code is the method exec_code() of Code in
eval.py, which creates a frame and sets its locals as a dictionary before
running it:

    def exec_code(self, space, w_globals, w_locals):
        "Implements the 'exec' statement."
        frame = self.create_frame(space, w_globals)
        frame.setdictscope(w_locals)
        return frame.run()
PyCode is a subclass of Code with all the co_xxx attributes of regular CPython
code objects. Quite expectedly, it runs in a PyFrame, which is like a regular
CPython frame. Unexpectedly, however, there are now several subclasses of
PyFrame, depending on what particular opcodes we need.
* The regular one is PyInterpFrame in pyopcode.py. All opcode
definitions are now methods of this class instead of being global
functions.
* There is also PyNestedScopeFrame in nestedscope.py, which adds a few
more opcodes, namely the ones related to nested scopes.
* And there is GeneratorFrame in generator.py which overrides
RETURN_VALUE and defines YIELD_VALUE to provide generator-like behavior.
The PyCode class knows which frame to create by inspecting its co_xxx
attributes (co_flags tells if we are a generator, and co_cellvars/co_freevars
are empty unless we need nested scopes). The nice thing is that apart from
these checks, everything about nested scopes is in nestedscope.py, and
everything about generators is in generator.py. Contrast this with CPython's
huge ceval.c file which mixes all these features throughout the code.
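The selection logic can be sketched as a small dispatch function. The flag value matches CPython's CO_GENERATOR bit; the function itself is a simplified, hypothetical stand-in for what PyCode does internally:

```python
# Toy sketch of the frame-class selection described above: inspect the
# co_xxx attributes and pick the cheapest frame class that suffices.
CO_GENERATOR = 0x20                      # same bit as CPython's co_flags

def choose_frame_class(co_flags, co_cellvars, co_freevars):
    if co_flags & CO_GENERATOR:
        return 'GeneratorFrame'          # generator.py
    if co_cellvars or co_freevars:
        return 'PyNestedScopeFrame'      # nestedscope.py
    return 'PyInterpFrame'               # pyopcode.py

print(choose_frame_class(0, (), ()))             # -> PyInterpFrame
print(choose_frame_class(CO_GENERATOR, (), ()))  # -> GeneratorFrame
print(choose_frame_class(0, ('x',), ()))         # -> PyNestedScopeFrame
```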
Later on we plan to extend that model even more to allow more flexibility,
like custom opcodes.
There are other, simpler subclasses of Code with their corresponding
subclasses of Frame:
* in gateway.py, BuiltinCode and BuiltinFrame just call back to an
interpreter-level function. This is the code object found in Functions
created out of Gateways built by gateway.interp2app(). Such Functions
are the equivalent of CPython's built-in functions. Note that
BuiltinCode, like all Code subclasses, exposes a signature for Function
to be able to decode arguments. Thus, when a built-in function is
called:
(1) the arguments are parsed in the normal way (e.g. passing
arguments by keyword is always allowed) and turned into a "fast locals"
(2) a BuiltinFrame is created and assigned the "fast locals";
(3) the BuiltinFrame.run() method is called, which just calls the
interpreter-level function using the "fast locals" list as arguments.
* in objspace/std/typeobject.py, MultimethodCode and a few XxxMmFrame
classes implement multimethod-calls. This is the type of the Code
objects that you would see by typing '.append.im_func.func_code'
(except that 'im_func' isn't implemented right now).
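The three-step built-in call sequence above can be sketched with hypothetical stand-ins ('ToyBuiltinCode' and 'ToyBuiltinFrame' are not the real gateway classes): the frame's run() simply calls back into an interpreter-level function with the fast locals as arguments:

```python
# Toy sketch of the BuiltinCode/BuiltinFrame call sequence.
class ToyBuiltinCode:
    def __init__(self, func, argnames):
        self.func = func                 # the interp-level function
        self.argnames = argnames
    def signature(self):
        return self.argnames
    def create_frame(self):
        return ToyBuiltinFrame(self)

class ToyBuiltinFrame:
    def __init__(self, code):
        self.code = code
        self.fastlocals = []
    def setfastscope(self, scope):
        self.fastlocals = list(scope)    # step (2): assign fast locals
    def run(self):
        # step (3): call back into the interp-level function
        return self.code.func(*self.fastlocals)

code = ToyBuiltinCode(len, ['obj'])
frame = code.create_frame()
frame.setfastscope([[1, 2, 3]])          # step (1) is done by Function
print(frame.run())                       # -> 3
```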
Armin & Holger
Holger & Armin