[pypy-dev] Static bytecode instruction optimization and pypy (JIT?) optimization

William ML Leslie william.leslie.ttg at gmail.com
Tue Jul 11 02:51:34 EDT 2017


On 10 July 2017 at 11:10, Rocky Bernstein <rb at dustyfeet.com> wrote:
> In looking at Python bytecode over the years, I've noticed that it does very
> little to no traditional static-compiler optimization. Specifically:
>
>
> Yes, over the years more compiler optimization has been done but it's still
> pretty sparse.
>
> From the little that I've looked at pypy, it is pretty much the same
> thing at the Python bytecode level. That's why a python decompiler
> for, say, 2.7 will work largely unmodified for the corresponding pypy
> 2.7 version. The same is true for 3.2 versus pypy 3.2.
>
> I understand, though, that pypy can do and probably does some of the
> optimization when it JITs. But my question is: if traditional compiler
> optimization were done, would this hinder, be neutral to, or help
> pypy's optimization?
>

It'd likely be neutral with respect to the JIT optimisation, which
operates on traces (what actually happens to the rpython objects)
rather than bytecode.  Within a trace, which is a fairly closed
universe, all of those optimisations are very easy to make.
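
(A rough illustration of the difference; the trace operations below
are schematic, written in the style of the jit log, and the names and
numbering are made up:)

    def total(n):
        t = 0
        for i in range(n):
            t += i + 0      # the "+ 0" survives in the bytecode
        return t

    # The interpreter executes two BINARY_ADDs per iteration, but the
    # JIT records what actually happened as integer operations, roughly
    #     i5 = int_add(i4, 0)
    #     i6 = int_add(i3, i5)
    # and the trace optimiser drops the add-of-zero no matter what the
    # bytecode looked like.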

As Ryan has mentioned, the bigger concern is going to be around the
expectations people have about how python code will behave, and a
secondary concern is going to be people relying on optimisations that
only work on pypy.  It's one thing for people to tune their programs
for pypy, and another for people to rely on certain optimisations
being performed for correct execution.

If you attempt this, here are a few things to keep in mind:

> * Dead code elimination (most of the time)

Mostly a code size optimisation and not so key for performance, but it
is very strange that python generates different code for `return` at
the end of a function and for `else: return` also at the end of a
function.  On the other hand, `if 0:` statements are removed, which is
handy.
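
To make that concrete, a minimal pair (checked against CPython
2.7-era compilers; `dis.dis(g)` shows an unreachable LOAD_CONST
None / RETURN_VALUE tail that `dis.dis(f)` lacks):

    import dis

    def f(x):
        if x:
            return 1
        return 2

    def g(x):
        if x:
            return 1
        else:
            return 2    # same behaviour, but the compiler still
                        # appends an implicit `return None` after this

    dis.dis(f)
    dis.dis(g)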

> * Jumps to Jumps or Jumps to the next instruction

SGTM
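
(Schematically, with made-up offsets:

    #   10 JUMP_FORWARD    20        # a jump to a jump...
    #   ...
    #   20 JUMP_ABSOLUTE   50
    #
    # can be retargeted to jump straight to 50, and a jump whose
    # target is simply the next instruction can be deleted outright.)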

> * Constant propagation (most of the time)

This is an interesting one.  Consider beta reducing xs in this code
(assume xs is otherwise unused):

    xs = [1, 7, 2]

    if x in xs:
        ...

Doing so would allow the list to be converted to a tuple and stored as
a constant against the code object (mostly because the value is never
used in an identity-dependent way).  However, people can use the
debugger to set a breakpoint on the assignment to xs, which they can't
do if no evaluation happens on this line.  They can even set xs to be
a different value in the debugger before continuing.
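
(For what it's worth, a runnable before/after sketch of that
transformation; note that CPython's peephole optimiser already does
the list-to-tuple trick when the literal appears directly in the `in`
test:)

    x = 7

    # Before: the list is built afresh every time this code runs.
    xs = [1, 7, 2]
    if x in xs:
        print("found")

    # After beta-reducing xs: the value can become a tuple constant
    # stored on the code object, because its identity never matters.
    if x in (1, 7, 2):
        print("found")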

I'm a big fan of prebuilding objects at compile time; it's something
the pypy team wished the JVM could do back when the Java backend was
written, because loading time was awful.  Even now the JVM is making
changes to the constant pool layout that improve that possibility.

> * Common subexpression elimination

This is not widely applicable in python; in fact, the only
subexpressions that can be eliminated are a small group of expressions
that apply to objects constructed from literals.  A notable example is
that you can't remove a duplicated `x = 1 + y`, because the object
that `y` refers to may have overridden __radd__, and that method may
even have side effects.  There can be some success with whole-program
optimisation, with encoding preconditions into optimistically compiled
module code, or with intraprocedural effect analysis, but there isn't
much precedent for the first two in pypy, and the third is the subject
of ongoing research.
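
A contrived but runnable illustration of why even a repeated
`1 + y` is not a common subexpression:

    class Chatty(object):
        def __radd__(self, other):
            print("side effect!")    # arbitrary user code runs here
            return 42

    y = Chatty()
    a = 1 + y    # prints "side effect!"
    b = 1 + y    # must print again; eliminating this second add as a
                 # "common subexpression" would change behaviour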

> * Tail recursion

Perhaps for functions that the compiler has shown to be total, with a
reasonable static bound on memory allocation and time?  Any situation
where asyncs need to be suspended, or where someone may hit C-c to get
a traceback or to kill an operation because it is taking too long,
needs to be very tightly time-bounded.
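
(To spell out the transformation in question; `count` and `count_loop`
are my own toy names:)

    def count(n, acc=0):
        if n == 0:
            return acc
        return count(n - 1, acc + n)    # tail position, but each call
                                        # still consumes a stack frame

    # What a tail-call optimiser would effectively produce:
    def count_loop(n, acc=0):
        while n != 0:
            n, acc = n - 1, acc + n
        return acc

    print(count(100) == count_loop(100))    # True
    # count(10**6) blows the recursion limit; count_loop(10**6) runs
    # fine, but note that a C-c during the loop no longer yields a deep
    # traceback showing the "calls" in flight.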

> * Code hoisting/sinking

Again, without a really solid type or effect system, it's really hard
to know that you're not changing the behaviour of the program when
performing such code movement, because almost all operations can
delegate to user-defined methods.
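
For example, a read that looks loop-invariant may not be (a
hypothetical class, written to make the point):

    class Gauge(object):
        reads = 0
        @property
        def size(self):
            Gauge.reads += 1    # attribute access runs user code
            return 10

    obj = Gauge()
    total = 0
    for _ in range(3):
        total += obj.size   # hoisting this out of the loop would
                            # leave Gauge.reads at 1 instead of 3
    assert total == 30 and Gauge.reads == 3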

> Of course, if there were static compiler optimization of the type described
> above, that might be a win when JITing doesn't kick in. (And perhaps then
> who cares)
>
> But I'm interested in the other situation where both are in play.
>

A direction that might be interesting is seeing how far you can get by
making reasonable assumptions (module bindings for certain functions
have their original values, this value has this exact code object as
its special method `__radd__` or method `append`, this module `foo`
imported has this exact hash), checking them and branching to
optimised code, which may or may not be pypy bytecode.  The
preconditions themselves can't be written in pypy bytecode - you don't
want to be executing side effects when pre-emptively *looking up*
methods, which can happen due to descriptors, so you'd need to go
behind the scenes.  Nevertheless, they can be platform-independent and
could live in the .pyc for a first cut.
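
A toy, app-level rendering of the guard-and-branch idea (the names are
mine, and real preconditions would have to live below the bytecode
level for the reason just given):

    class Point(object):
        def __init__(self, x):
            self.x = x
        def double(self):
            return Point(self.x * 2)

    # Snapshot taken when the optimised code was generated:
    _expected_double = Point.double.__code__

    def double_all(points):
        # cheap guard: has Point.double been replaced since then?
        if Point.double.__code__ is _expected_double:
            # optimised path: safe to inline what double() does
            return [Point(p.x * 2) for p in points]
        # deoptimise: fall back to the generic call
        return [p.double() for p in points]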

At the moment, it's necessary to assume *nothing* on return from *any*
user code, which can violate assumptions in remarkable ways, including
"what if the cmp or key functions change the list being sorted" or
"what if an __eq__ method removes the key being examined from the dict
or re-assigns it with a different hash", two cases that the pypy and
python runtimes handle without crashing.  Another good source of
problems is that python's concurrent memory model is far stricter than
Java's, so an optimisation that might be fine in a single threaded
situation may break sequential consistency.
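
For instance, the dict case can be provoked like this, and both
runtimes survive it:

    class Evil(object):
        def __init__(self, d):
            self.d = d
        def __hash__(self):
            return 0                # force every key into one bucket
        def __eq__(self, other):
            self.d.clear()          # mutate the dict mid-lookup
            return False

    d = {}
    d[Evil(d)] = 1
    print(d.get(Evil(d)))   # probing calls __eq__ on the stored key,
                            # which empties the dict; prints None
                            # rather than crashing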

I had been thinking that this sort of thing is a huge engineering
effort, but if you're just generating optimised bytecode and jumping
to that with some custom precondition-checking operation, rather than
generating machine code, it might be achievable.  Generating machine
code is difficult because neither of rpython's compilers is well
suited to generating general-purpose, position-independent machine
code at app level: the JIT generates straight-line code only, and that
assumption is made throughout the codebase; it also assumes constants
are in memory during compilation and doesn't expect them to be
reloaded at a different location, whereas the static rpython compiler
expects to work on a whole program.

-- 
William Leslie


