[pypy-svn] r41362 - in pypy/dist/pypy: doc jit/tl

arigo at codespeak.net
Mon Mar 26 16:14:51 CEST 2007


Author: arigo
Date: Mon Mar 26 16:14:50 2007
New Revision: 41362

Modified:
   pypy/dist/pypy/doc/jit.txt
   pypy/dist/pypy/jit/tl/targettiny1.py
   pypy/dist/pypy/jit/tl/targettiny2.py
   pypy/dist/pypy/jit/tl/tiny2.py
Log:
Mostly finished jit.txt.  Added long comments in tiny2.py.


Modified: pypy/dist/pypy/doc/jit.txt
==============================================================================
--- pypy/dist/pypy/doc/jit.txt	(original)
+++ pypy/dist/pypy/doc/jit.txt	Mon Mar 26 16:14:50 2007
@@ -323,10 +323,92 @@
 examples.
 
 
-A (slightly less) tiny interpreter
-==================================
+A slightly less tiny interpreter
+================================
 
-`pypy/jit/tl/tiny2.py`_ XXX
+The interpreter in `pypy/jit/tl/tiny2.py`_ is a reasonably good example
+of the difficulties that we meet when scaling up this approach, and how
+we solve them - or work around them.  For more details, see the comments
+in the source code.  With more work on the JIT generator, we hope
+eventually to remove the need for the workarounds.
+
+Promotion
+---------
+
+The most powerful hint introduced in this example is ``promote=True``.
+It is applied to a value that is usually not a compile-time constant,
+but which we would like to become a compile-time constant "just in
+time".  Its meaning is to instruct the JIT compiler to stop compiling at
+this point, wait until the runtime actually reaches that point, grab the
+value that arrived here at runtime, and go on compiling with the value
+now considered as a compile-time constant.  If the same point is reached
+at runtime several times with several different values, the compiler
+will produce one code path for each, with a switch in the generated
+code.  This is a process that is never "finished": in general, new
+values can always show up later during runtime, causing more code paths
+to be compiled and the switch in the generated code to be extended.
+
+Promotion is the essential new feature introduced in PyPy when compared
+to existing partial evaluation techniques (it was actually first
+introduced in Psyco [JITSPEC]_, which is strictly speaking not a partial
+evaluator).
+
+Another way to understand the effect of promotion is to consider it as a
+complement to the ``concrete=True`` hint.  The latter tells the
+hint-annotator that the value that arrives here is required to be a
+compile-time constant (i.e. green).  In general, this is a very strong
+constraint, because it forces "backwards" a potentially large number of
+values to be green as well - all the values that this one depends on.
+Often it does not work at all, because the value ultimately depends
+on an operation that the JIT compiler cannot constant-fold, for
+example because it depends on external input or reads from
+non-immutable memory.
+
+The ``promote=True`` hint takes an arbitrary red value and returns it
+as a green variable, so it can be used to bound the set of values that
+need to be forced to green.  A common idiom is to put a
+``concrete=True`` hint at the precise point where a compile-time
+constant would be useful (e.g. on the value on which a complex switch
+dispatches), and then put a few ``promote=True`` hints to copy specific
+values into green variables *before* the ``concrete=True``.
+
+The ``promote=True`` hints should be applied where we expect not too
+many different values to arrive at runtime; here are typical examples:
+
+* Where we expect a small integer, the integer can be promoted if each
+  specialized version can be optimized (e.g. lists of known length can
+  be optimized by the JIT compiler).
+
+* The interpreter-level class of an object can be promoted before an
+  indirect method call, if it is useful for the JIT compiler to look
+  inside the called method.  If the method call is indirect, the JIT
+  compiler merely produces a similar indirect method call in the
+  generated code.  But if the class is a compile-time constant, then it
+  knows which method is called, and compiles its operations (effectively
+  inlining it from the point of view of the generated code).
+
+* Whole objects can occasionally be promoted, with care.  For example,
+  in an interpreter for a language which has function calls, it might be
+  useful to know exactly which Function object is called (as opposed to
+  just the fact that we call an object of class Function).
+
+Other hints
+-----------
+
+The other hints mentioned in `pypy/jit/tl/tiny2.py`_ are "global merge
+points" and "deepfreeze".  For more information, please refer to the
+explanations there.
+
+We should also mention a technique not used in ``tiny2.py``, which is
+the notion of *virtualizable* objects.  In PyPy, the Python frame
+objects are virtualizable.  Such objects assume that they will be mostly
+read and mutated by the JIT'ed code - this is typical of frame objects
+in most interpreters: they are either not visible at all to the
+interpreted programs, or (as in Python) must be accessed through
+some reflection API.  The ``_virtualizable_`` hint allows the object to
+escape (e.g. in PyPy, the Python frame object is pushed on the
+globally-accessible frame stack) while still remaining efficient to
+access from JIT'ed code.
 
 
 ------------------------------------------------------------------------
@@ -499,6 +581,8 @@
 
 .. _`expanded version of the present document`: discussion/jit-draft.html
 
+---------------
+
 
 .. _VMC: http://codespeak.net/svn/pypy/extradoc/talk/dls2006/pypy-vm-construction.pdf
 .. _`RPython`: coding-guide.html#rpython
@@ -510,5 +594,9 @@
 .. _Psyco: http://psyco.sourceforge.net
 .. _`PyPy Standard Interpreter`: architecture.html#standard-interpreter
 .. _`exception transformer`: translation.html#making-exception-handling-explicit
+.. [JITSPEC] Representation-Based Just-In-Time Specialization and the
+           Psyco Prototype for Python, ACM SIGPLAN PEPM'04, August 24-26, 2004,
+           Verona, Italy.
+           http://psyco.sourceforge.net/psyco-pepm-a.ps.gz
 
 .. include:: _ref.txt

Modified: pypy/dist/pypy/jit/tl/targettiny1.py
==============================================================================
--- pypy/dist/pypy/jit/tl/targettiny1.py	(original)
+++ pypy/dist/pypy/jit/tl/targettiny1.py	Mon Mar 26 16:14:50 2007
@@ -3,6 +3,11 @@
 
 
 def entry_point(args):
+    """Main entry point of the stand-alone executable:
+    takes a list of strings and returns the exit code.
+    """
+    # store args[0] in a place where the JIT log can find it (used by
+    # viewcode.py to know the executable whose symbols it should display)
     highleveljitinfo.sys_executable = args[0]
     if len(args) < 4:
         print "Usage: %s bytecode x y" % (args[0],)
@@ -26,4 +31,8 @@
     oopspec = True
 
 def portal(driver):
+    """Return the 'portal' function, and the hint-annotator policy.
+    The portal is the function that gets patched with a call to the JIT
+    compiler.
+    """
     return tiny1.ll_plus_minus, MyHintAnnotatorPolicy()

Modified: pypy/dist/pypy/jit/tl/targettiny2.py
==============================================================================
--- pypy/dist/pypy/jit/tl/targettiny2.py	(original)
+++ pypy/dist/pypy/jit/tl/targettiny2.py	Mon Mar 26 16:14:50 2007
@@ -3,15 +3,20 @@
 
 
 def entry_point(args):
+    """Main entry point of the stand-alone executable:
+    takes a list of strings and returns the exit code.
+    """
+    # store args[0] in a place where the JIT log can find it (used by
+    # viewcode.py to know the executable whose symbols it should display)
     highleveljitinfo.sys_executable = args[0]
-    if len(args) < 3:
+    if len(args) < 2:
         print "Invalid command line arguments."
         print args[0] + " 'tiny2 program string' arg0 [arg1 [arg2 [...]]]"
         return 1
     bytecode = [s for s in args[1].split(' ') if s != '']
     args = [tiny2.StrBox(arg) for arg in args[2:]]
     res = tiny2.interpret(bytecode, args)
-    print res.as_str()
+    print tiny2.repr(res)
     return 0
 
 def target(driver, args):
@@ -26,7 +31,12 @@
     oopspec = True
 
     def look_inside_graph(self, graph):
+        # temporary workaround
         return getattr(graph, 'func', None) is not tiny2.myint_internal
 
 def portal(driver):
+    """Return the 'portal' function, and the hint-annotator policy.
+    The portal is the function that gets patched with a call to the JIT
+    compiler.
+    """
     return tiny2.interpret, MyHintAnnotatorPolicy()

Modified: pypy/dist/pypy/jit/tl/tiny2.py
==============================================================================
--- pypy/dist/pypy/jit/tl/tiny2.py	(original)
+++ pypy/dist/pypy/jit/tl/tiny2.py	Mon Mar 26 16:14:50 2007
@@ -1,7 +1,45 @@
+"""
+An interpreter for a strange word-based language: the program is a list
+of space-separated words.  Most words push themselves on a stack; some
+words have another action.  The result is the space-separated words
+from the stack.
+
+    Hello World          => 'Hello World'
+    6 7 ADD              => '13'              'ADD' is a special word
+    7 * 5 = 7 5 MUL      => '7 * 5 = 35'      '*' and '=' are not special words
+
+Arithmetic on non-integers gives a 'symbolic' result:
+
+    X 2 MUL              => 'X*2'
+
+Input arguments can be passed on the command-line, and used as #1, #2, etc.:
+
+    #1 1 ADD             => one more than the argument on the command-line,
+                            or if it was not an integer, concatenates '+1'
+
+You can store back into an (existing) argument index with ->#N:
+
+    #1 5 ADD ->#1
+
+Braces { } delimit a loop.  Don't forget spaces around each one.
+The '}' pops an integer value off the stack and loops if it is not zero:
+
+    { #1 #1 1 SUB ->#1 #1 }    => when called with 5, gives '5 4 3 2 1'
+
+"""
 from pypy.rlib.objectmodel import hint, _is_early_constant
 
+#
+# See pypy/doc/jit.txt for a higher-level overview of the JIT techniques
+# detailed in the following comments.
+#
+
 
 class Box:
+    # Although all words are in theory strings, we use two subclasses
+    # to represent the strings differently from the words known to be integers.
+    # This is an optimization that is essential for the JIT and merely
+    # useful for the basic interpreter.
     pass
 
 class IntBox(Box):
@@ -25,11 +63,17 @@
 def func_sub_int(ix, iy): return ix - iy
 def func_mul_int(ix, iy): return ix * iy
 
-def func_add_str(sx, sy): return sx + ' ' + sy
+def func_add_str(sx, sy): return sx + '+' + sy
 def func_sub_str(sx, sy): return sx + '-' + sy
 def func_mul_str(sx, sy): return sx + '*' + sy
 
 def op2(stack, func_int, func_str):
+    # Operate on the top two stack items.  The promotion hints force the
+    # class of each argument (IntBox or StrBox) to turn into a compile-time
+    # constant if it wasn't already.  The effect we seek is to make the
+    # calls to as_int() direct calls at compile-time, instead of indirect
+    # ones.  The JIT compiler cannot look into indirect calls, but it
+    # can analyze and inline the code in directly-called functions.
     y = stack.pop()
     hint(y.__class__, promote=True)
     x = stack.pop()
@@ -42,9 +86,27 @@
 
 
 def interpret(bytecode, args):
+    """The interpreter's entry point and portal function.
+    """
+    # ------------------------------
+    # First a lot of JIT hints...
+    #
+    # A portal needs a "global merge point" at the beginning, for
+    # technical reasons, if it uses promotion hints:
     hint(None, global_merge_point=True)
+
+    # An important hint: 'bytecode' is a list, which is in theory
+    # mutable.  Let's tell the JIT compiler that it can assume that the
+    # list is entirely frozen, i.e. immutable and only containing immutable
+    # objects.  Otherwise, it cannot do anything - it would have to assume
+    # that the list can unpredictably change at runtime.
     bytecode = hint(bytecode, deepfreeze=True)
-    # ------------------------------
+
+    # Now some strange code that makes a copy of the 'args' list in
+    # a complicated way...  this is a workaround forcing the whole 'args'
+    # list to be virtual.  It is a way to tell the JIT compiler that it
+    # doesn't have to worry about the 'args' list being unpredictably
+    # modified.
     oldargs = args
     argcount = hint(len(oldargs), promote=True)
     args = []
@@ -54,13 +116,21 @@
         args.append(oldargs[n])
         n += 1
     # ------------------------------
+    # the real code starts here
     loops = []
     stack = []
     pos = 0
     while pos < len(bytecode):
+        # It is a good idea to put another 'global merge point' at the
+        # start of each iteration in the interpreter's main loop.  The
+        # JIT compiler keeps a table of all the times it passed through
+        # the global merge point.  It allows it to detect when it can
+        # stop compiling and generate a jump back to some machine code
+        # that was already generated earlier.
         hint(None, global_merge_point=True)
+
         opcode = bytecode[pos]
-        hint(opcode, concrete=True)
+        hint(opcode, concrete=True)    # same as in tiny1.py
         pos += 1
         if   opcode == 'ADD': op2(stack, func_add_int, func_add_str)
         elif opcode == 'SUB': op2(stack, func_sub_int, func_sub_str)
@@ -70,6 +140,8 @@
             stack.append(args[n-1])
         elif opcode.startswith('->#'):
             n = myint(opcode, start=3)
+            if n > len(args):
+                raise IndexError
             args[n-1] = stack.pop()
         elif opcode == '{':
             loops.append(pos)
@@ -78,14 +150,32 @@
                 loops.pop()
             else:
                 pos = loops[-1]
+                # A common problem when interpreting loops or jumps: the 'pos'
+                # above is read out of a list, so the hint-annotator thinks
+                # it must be red (not a compile-time constant).  But the
+                # hint(opcode, concrete=True) in the next iteration of the
+                # loop requires all variables the 'opcode' depends on to be
+                # green, including this 'pos'.  We promote 'pos' to a green
+                # here, as early as possible.  Note that in practice the 'pos'
+                # read out of the 'loops' list will be a compile-time constant
+                # because it was pushed as a compile-time constant by the '{'
+                # case above into 'loops', which is a virtual list, so the
+                # promotion below is just a way to make the colors match.
                 pos = hint(pos, promote=True)
         else:
             stack.append(StrBox(opcode))
-    while len(stack) > 1:
-        op2(stack, func_add_int, func_add_str)
-    return stack.pop()
+    return stack
+
+def repr(stack):
+    # this bit moved out of the portal function because JIT'ing it is not
+    # very useful, and the JIT generator is confused by the 'for' right now...
+    return ' '.join([x.as_str() for x in stack])
 
 
+# ------------------------------
+# Pure workaround code!  It will eventually be unnecessary.
+# For now, myint(s, n) is a JIT-friendly way to spell int(s[n:]).
+# We don't support negative numbers, though.
 def myint_internal(s, start=0):
     if start >= len(s):
         return -1
@@ -98,7 +188,6 @@
         res = res * 10 + n
         start += 1
     return res
-
 def myint(s, start=0):
     if _is_early_constant(s):
         s = hint(s, promote=True)
@@ -111,18 +200,26 @@
         if n < 0:
             raise ValueError
     return n
+# ------------------------------
 
 
 def test_main():
     main = """#1 5 ADD""".split()
     res = interpret(main, [IntBox(20)])
-    assert res.as_int() == 25
+    assert repr(res) == '25'
     res = interpret(main, [StrBox('foo')])
-    assert res.as_str() == 'foo 5'
+    assert repr(res) == 'foo+5'
 
 FACTORIAL = """The factorial of #1 is
                   1 { #1 MUL #1 1 SUB ->#1 #1 }""".split()
 
 def test_factorial():
     res = interpret(FACTORIAL, [IntBox(5)])
-    assert res.as_str() == 'The factorial of 5 is 120'
+    assert repr(res) == 'The factorial of 5 is 120'
+
+FIBONACCI = """Fibonacci numbers:
+                  { #1 #2 #1 #2 ADD ->#2 ->#1 #3 1 SUB ->#3 #3 }""".split()
+
+def test_fibonacci():
+    res = interpret(FIBONACCI, [IntBox(1), IntBox(1), IntBox(10)])
+    assert repr(res) == "Fibonacci numbers: 1 1 2 3 5 8 13 21 34 55"

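Note on promotion: the jit.txt hunk above describes promotion as pausing compilation, waiting for the runtime value, and then growing a switch with one code path per observed value. The sketch below is only a plain-Python analogy of that behaviour, not PyPy's mechanism or API. The names `PromotionPoint`, `specialize_power` and `compilations` are hypothetical and appear nowhere in the commit.

```python
class PromotionPoint(object):
    """Toy model of a promotion point (hypothetical, not PyPy's API)."""

    def __init__(self, specialize):
        # specialize(value) stands in for the JIT compiler: it receives the
        # runtime value as a constant and returns code specialized for it.
        self.specialize = specialize
        self.switch = {}        # the growing "switch": value -> code path
        self.compilations = 0   # how many code paths were produced so far

    def run(self, value, *args):
        path = self.switch.get(value)
        if path is None:
            # First time this value arrives at runtime: resume "compiling"
            # with the value now treated as a compile-time constant.
            path = self.specialize(value)
            self.switch[value] = path
            self.compilations += 1
        return path(*args)


def specialize_power(n):
    # With n known, the number of multiplications is fixed in advance;
    # we model that by precomputing the list of steps once per value of n.
    steps = list(range(n))

    def power(x):
        result = 1
        for _ in steps:
            result *= x
        return result
    return power


point = PromotionPoint(specialize_power)
results = [point.run(n, 2) for n in (3, 3, 5, 3, 5)]
```

Only two code paths are produced for the five calls, one for each distinct exponent. A later call with a new exponent would add a third, which mirrors the remark in jit.txt that the process is never "finished".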

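Note on the word language: the examples in the new tiny2.py docstring can be checked against a hint-free sketch of the same semantics. This is not the code from the commit. It drops the Box classes and all JIT hints, keeps stack items as plain Python ints or strings, and the names `run`, `as_int`, `binary_op` and `INT_OPS` are hypothetical. The behaviour of '}' follows the docstring ("pops an integer value off the stack and loops if it is not zero").

```python
def as_int(word):
    # Integers stay integers; strings must consist of digits only (like the
    # myint() workaround in the commit, negative literals are not parsed).
    if isinstance(word, int):
        return word
    if not word.isdigit():
        raise ValueError(word)
    return int(word)


def binary_op(stack, int_func, symbol):
    # Operate on the top two stack items; fall back to a 'symbolic'
    # string result when either operand is not an integer.
    y = stack.pop()
    x = stack.pop()
    try:
        z = int_func(as_int(x), as_int(y))
    except ValueError:
        z = str(x) + symbol + str(y)
    stack.append(z)


INT_OPS = {
    'ADD': (lambda x, y: x + y, '+'),
    'SUB': (lambda x, y: x - y, '-'),
    'MUL': (lambda x, y: x * y, '*'),
}


def run(program, args=()):
    words = program.split()
    args = list(args)
    loops, stack, pos = [], [], 0
    while pos < len(words):
        word = words[pos]
        pos += 1
        if word in INT_OPS:
            int_func, symbol = INT_OPS[word]
            binary_op(stack, int_func, symbol)
        elif word.startswith('->#'):
            n = as_int(word[3:])
            if n > len(args):
                raise IndexError(word)
            args[n - 1] = stack.pop()      # store back into argument N
        elif word.startswith('#'):
            stack.append(args[as_int(word[1:]) - 1])
        elif word == '{':
            loops.append(pos)              # remember where the loop body starts
        elif word == '}':
            if as_int(stack.pop()) == 0:
                loops.pop()                # loop finished
            else:
                pos = loops[-1]            # jump back to the loop body
        else:
            stack.append(word)             # ordinary words push themselves
    return ' '.join(str(item) for item in stack)
```

For instance, `run('#1 5 ADD', ['foo'])` yields the symbolic result `'foo+5'`, matching the updated assertion in `test_main`.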
