PyPy improving generated machine code

Hi, I'm a student at the Technical University of Vienna, currently looking for a topic for my master's thesis. I stumbled over PEP 484, which is currently being discussed on the mailing list, and it seems to me that it is going to become reality pretty soon. My idea was that these additional type annotations could be beneficial for JIT compilation. Two weeks ago someone mentioned pretty much the same idea (https://mail.python.org/pipermail/pypy-dev/2015-January/013037.html). In that thread it was mentioned that more detailed information (such as an integer staying in the range [0-50], ...) would be necessary to remove guards from a trace.

I read the paper "Tracing the Meta-Level: PyPy's Tracing JIT Compiler", published in 2009, to understand the basics of how PyPy currently works; I assume that PyPy is still a tracing JIT compiler. I think the PEP 484 proposal opens up new possibilities. Using trace compilation, as PyPy or SpiderMonkey do, makes a lot of sense because most of the time type information is not present prior to the first execution. PEP 484 changes the game: after type inference has completed, e.g. on a function, it should rarely happen that a variable's type is unknown. "Tracing the Meta-Level" already mentions that when RPython is provided as input, PyPy infers the types. Is that true for non-RPython programs as well?

I see two possibilities to improve the machine code PyPy generates:

* Find a sensible subset of optimizations that rely on the available type information and further improve trace compilation.
* Evaluate other inter-procedural methods of producing good machine code, or move completely to method-based JIT compilation.

I could imagine evaluating and implementing this for my master's thesis. What do you think? Would it benefit PyPy? Has anybody else started to implement something similar?

Best, Richard
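A minimal sketch of the kind of annotated function the idea targets (the function and names here are illustrative, not from the thread); the hope is that a JIT could trust the annotations instead of inserting per-value type guards:

    from typing import List

    def scale(values: List[int], factor: int) -> List[int]:
        # If the annotations could be trusted, a JIT would not need a
        # guard per element checking "is this really an int?" before
        # using an unboxed integer multiply.
        return [v * factor for v in values]

    print(scale([1, 2, 3], 10))   # [10, 20, 30]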

Hi Richard, On 31 January 2015 at 10:51, Richard Plangger <rich@pasra.at> wrote:
I think the PEP 484 proposal opens up new possibilities.
The short answer is: no, it doesn't make sense. User-supplied type annotations wouldn't help at all if they must still be checked, as PEP 484 says. Or, assuming you're fine with obscure crashes when the type annotations are wrong, you would get at most extremely minor speed benefits.

There are several reasons why. One of them is that the annotations are at the wrong level: e.g. a PEP 484 "int" corresponds to Python 3's int type, which does not necessarily fit inside one machine word; even worse, an "int" annotation allows arbitrary int subclasses. Another is that a lot more information is needed to produce good code, e.g. "this `f()` called here really means this function there, and will never be monkey-patched" -- same with `len()` or `list()`, btw. The third reason is that some "guards" in PyPy's JIT traces don't have an obvious corresponding type, e.g. "this dict is so far using keys which don't override `__hash__`, so a more efficient implementation is used". Many guards don't even have any correspondence with types at all: "this class attribute was not modified"; "the loop counter did not reach zero, so we don't need to release the GIL"; and so on.

In summary, as PyPy works right now, it is able to derive far more useful information than can ever be given by PEP 484, and it does so automatically. As far as we know, this would remain true even if we added other techniques to PyPy, like a fast first-pass method JIT. This should be obvious from the fact that many high-performance JavaScript VMs are method JITs too, and they work very well on source code with no explicit types either. In my opinion, the introductory sentence in that PEP is a lie: "This PEP aims to provide (...) opening up Python code to (...) performance optimizations utilizing type information."

This doesn't mean the performance of PyPy is perfectly optimal today. There are certainly things to do and try. One of the major ones (in terms of work involved) would be to add a method-JIT-like approach with a quick-and-dirty initial JIT, able to give not-too-bad performance but without the large warm-up times of our current meta-tracing JIT. More about this and other ideas in a later e-mail, if you're interested.

A bientôt, Armin.
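A sketch of the first reason (the code and names are illustrative, not from the thread): a PEP 484 "int" annotation rules out very little, so a JIT could not drop its checks just because the annotation is there:

    class Tricky(int):
        def __add__(self, other):
            return "surprise"    # still a perfectly valid "int" under PEP 484

    def inc(x: int) -> int:
        return x + 1

    print(inc(5))           # 6: the fast case a JIT would like to assume
    print(inc(2 ** 100))    # an "int" that does not fit in one machine word
    print(inc(Tricky(5)))   # "surprise": the annotation held, the assumption didn't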

On Sat, 2015-01-31 at 15:40 +0100, Armin Rigo wrote:
I might be wrong, but my impression was that it's mainly driven by the desire to have a standardized way to add type hints for the benefit of static analysis, and "performance optimizations" just means the stuff Cython could do if the code were explicitly typed.
More about this and other ideas in a later e-mail, if you're interested.
I am!!! -- Sincerely yours, Yury V. Zaytsev

Hi, Even if my idea (PEP 484) does not work out, I might still be interested in contributing to PyPy. To decide and get my thesis going I need some more resources to read (maybe some papers that do something similar to what you have in mind) plus some hints to narrow down the topic. The method-JIT-like approach sounds interesting. It would be nice if you could provide more detail on it and on other things that could be done right now to make PyPy faster. After that I will discuss with my adviser whether this is a suitable topic. Best, Richard On 01/31/2015 03:40 PM, Armin Rigo wrote:

"This PEP aims to provide (...) opening up Python code to (...) performance optimizations utilizing type information."
Hi, Sorry to bother you again; I have not gotten any response yet. The problem is that I need a better picture of a topic I could work on for my thesis, and I really would like to contribute to PyPy. This week I would like to decide what I'm aiming for (otherwise things might get postponed). It would be nice to have the information you mentioned earlier in your email about the method-JIT-like approach and the other ideas! Best, Richard

On 02/02/15 18:56, Richard Plangger wrote:
Just to throw my uneducated opinions into the ring: it would be nice to have someone study autovectorization and hardware acceleration in a JIT. There are many possible directions: identifying vectorizable operations via traces or user-supplied hints, reuse of llvm's or gcc's strategies, creating the proper guards, somehow modelling the cost of memory caching into the tradeoff of what to parallelize, ... Matti
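An illustrative sketch (a made-up example, not from the thread) of the kind of loop such a study would target:

    def saxpy(a, xs, ys):
        # Elementwise, with no dependency between iterations: a classic
        # SIMD candidate.  A vectorizing trace JIT would still need
        # guards that both lists keep holding unboxed floats.
        for i in range(len(xs)):
            ys[i] = a * xs[i] + ys[i]

    ys = [0.5, 0.5, 0.5]
    saxpy(2.0, [1.0, 2.0, 3.0], ys)
    print(ys)   # [2.5, 4.5, 6.5]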

Hi Richard, On 2 February 2015 at 17:56, Richard Plangger <rich@pasra.at> wrote:
Our general topic list is here: http://pypy.readthedocs.org/en/latest/project-ideas.html Others besides me should chime in too, because I'm not the best person to recommend what should be worked on in general.

Myself, I am particularly interested in the STM work, where there is certainly something to do, although it's a bit hard to plan in advance. It's mostly of the kind: run stuff with pypy-stm, find conflicts, figure out which ones are not inherent to the application but are simply limitations of the interpreter, and then think about ways to avoid them.

About the method JIT, we have only vague ideas. I can't give any estimate of how much work it would be, as it depends a lot on details of the approach, like how much of the existing infrastructure can be reused and how much must be redone from scratch. To be clear, we don't even have any clear indication that it would be a good idea. It seems to require more interpreter-specific hints (say, for PyPy's Python interpreter) to drive the process, in order to control where it should stop inlining and start emitting calls to the existing functions. The prior work gives mixed results: if you consider for example Jython running on a method-JIT-based JVM, it would be similar (minus possible hints we can add), but Jython is not faster than CPython. On the other hand, untyped Cython is usually faster than CPython, but it benefits from gcc's slow optimizations. I would classify it as very much a research project.

A bientôt, Armin.
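A sketch of the kind of conflict such STM work hunts for, assuming pypy-stm's model in which transactions that touch the same object concurrently cannot run in parallel (the counter example is illustrative, not from the thread):

    import threading

    counter = {"hits": 0}

    def worker():
        # Every iteration writes the same object: under pypy-stm this is
        # a systematic transaction conflict, so the threads effectively
        # get serialized and the expected parallel speed-up never appears.
        for _ in range(100000):
            counter["hits"] += 1

    threads = [threading.Thread(target=worker) for _ in range(4)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    # A fix of the kind described above: per-thread counters, merged once
    # at the end, so the hot loop touches only thread-local objects.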

Hi, One topic that captured my interest is a light-weight type-specializing JIT that does not emit any assembler, but instead rewrites the bytecode into a more type-specific form. This has been done before in luajit and gives good results there. I wonder if the same can be applied to PyPy. PS. Feel free to pop into IRC #pypy on freenode; I can explain in more detail what I mean. Cheers, fijal
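A toy sketch of the bytecode-rewriting ("quickening") idea, not fijal's actual design: a generic opcode rewrites itself in place into a type-specialized one once the interpreter has observed the operand types:

    GENERIC_ADD, INT_ADD = 0, 1

    def execute(code, stack):
        pc = 0
        while pc < len(code):
            op = code[pc]
            if op == GENERIC_ADD:
                b, a = stack.pop(), stack.pop()
                if type(a) is int and type(b) is int:
                    code[pc] = INT_ADD   # specialize: the next execution of
                                         # this position takes the fast path
                stack.append(a + b)
            elif op == INT_ADD:
                b, a = stack.pop(), stack.pop()
                stack.append(a + b)      # fast path: no generic dispatch on
                                         # the operand types
            pc += 1
        return stack

    code = [GENERIC_ADD]
    print(execute(code, [1, 2]), code)   # [3] [1]: the opcode specialized itself

In a real interpreter the specialized opcode would still need a cheap guard, falling back to the generic form if a non-int ever shows up.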

participants (5): Armin Rigo, Maciej Fijalkowski, Matti Picus, Richard Plangger, Yury V. Zaytsev