Contributing Polyhedral Optimisations in PyPy
I'm doing a computer science master's and am looking for an appropriate dissertation project related to Polyhedral optimisations. Talking to my professor, we both think that implementing the model and its loop transformations in PyPy's JIT optimiser could be a good project to pursue. Before committing to anything, though, I wanted to run the idea by the devs here, who might be able to point out hurdles I'd be likely to run into quickly that could prove difficult to solve at master's level, or tell me whether these optimisations are already implemented in the first place (I have tried to google whether that's the case and haven't found anything, but can't be sure).

I think this could have some good real-world impact too: a lot of scientific code is written in Python and run on PyPy, and the Polyhedral model can offer substantial performance improvements, in the form of auto-parallelization, for these kinds of codes. That's why I'm interested in working on this for PyPy rather than CPython, although if anyone has a good reason I should look at CPython over PyPy for this, please let me know.

Appreciate any and all advice, thanks.
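For context: the kind of code the Polyhedral model targets is a static control part (SCoP), i.e. a loop nest whose bounds and array subscripts are affine functions of the loop indices and parameters. A minimal, purely illustrative Python sketch (not taken from any code discussed in this thread):

    # Illustrative only: a classic affine loop nest (a SCoP) of the sort
    # polyhedral transformations such as tiling, interchange and
    # auto-parallelization apply to.
    def matmul(A, B, C, n):
        for i in range(n):
            for j in range(n):
                for k in range(n):
                    # every array access is an affine function of i, j, k
                    C[i][j] += A[i][k] * B[k][j]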
On Fri, 18 Dec 2020 at 05:14, muke101 via pypy-dev <pypy-dev@python.org> wrote:
I'm doing a computer science master's and am looking for an appropriate dissertation project related to Polyhedral optimisations. Talking to my professor, we both think that implementing the model and its loop transformations in PyPy's JIT optimiser could be a good project to pursue. Before committing to anything, though, I wanted to run the idea by the devs here, who might be able to point out hurdles I'd be likely to run into quickly that could prove difficult to solve at master's level, or tell me whether these optimisations are already implemented in the first place (I have tried to google whether that's the case and haven't found anything, but can't be sure).

I think this could have some good real-world impact too: a lot of scientific code is written in Python and run on PyPy, and the Polyhedral model can offer substantial performance improvements, in the form of auto-parallelization, for these kinds of codes. That's why I'm interested in working on this for PyPy rather than CPython, although if anyone has a good reason I should look at CPython over PyPy for this, please let me know.
Appreciate any and all advice, thanks.
Hi! That's a great topic.

The challenge with implementing this in the PyPy JIT at this point is that the JIT only sees one control flow path. That is, one loop, and the branches taken within that loop. It does not usually find out about the outer loop until later, and may never find out about the content of other control flow paths if they aren't taken. This narrows the amount of information available about effects and possible aliases quite a bit, making semantics-preserving cross-loop transformations difficult in many cases.

On the other hand, since you can deal with precise types in the JIT, it's possible to narrow down the domain of discourse, which might make it possible to rule out problematic side effects. Nevertheless, please dig and experiment. You might find that a combination of custom annotations and JIT work gets you what you need.

--
William Leslie
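To make this concrete, here is a hypothetical example (not from PyPy's test suite): with a tracing JIT, the hot inner loop below is recorded and compiled as its own trace well before the outer loop is ever seen as a whole, yet the outer loop is exactly where a polyhedral transformation such as interchange or tiling would need to operate.

    # Hypothetical example: the tracer records the inner loop in isolation.
    def smooth(grid, out, n):
        for i in range(1, n - 1):            # outer loop: discovered later,
            for j in range(1, n - 1):        # one iteration at a time
                # inner loop: becomes hot first and is traced/compiled on its
                # own, without a view of the whole loop nest
                out[i][j] = (grid[i - 1][j] + grid[i + 1][j] +
                             grid[i][j - 1] + grid[i][j + 1]) / 4.0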
Hi,

On Thu, 17 Dec 2020 at 23:48, William ML Leslie <william.leslie.ttg@gmail.com> wrote:
The challenge with implementing this in the pypy JIT at this point is that the JIT only sees one control flow path. That is, one loop, and the branches taken within that loop. It does not find out about the outer loop usually until later, and may not ever find out about the content of other control flow paths if they aren't taken.
Note that strictly speaking, the problem is not that you haven't yet seen the other code paths. It's Python, so you never know what may happen in the future---maybe another code path will be taken, or maybe someone will do crazy things with `sys._getframe()` or with the debugger `pdb`. So merely seeing all paths in a function doesn't really buy you a lot. No, the problem is that emitting machine code is incremental at the granularity of code paths. At the point where we see a new code path, all previously-seen code paths have already been completely optimized and turned into machine code, and we don't keep much information about them.

To go beyond this simple model, what we have so far is that we can "invalidate" previous code paths at any point, when we figure out that they were compiled using assumptions that no longer hold. Using that, it would be possible in theory to do any amount of global optimization: save enough additional information as you see each code path; use it later in the optimization of additional code paths; invalidate some of the old code paths if you figure out that their optimizations are no longer valid (but only invalidate, not write a new version yet); and when you later see the old code path being generated again, optimize it differently. It's all doable, but theoretical so far: I don't know of any JIT compiler that seriously does things like that. It's certainly worth a research paper IMHO. It also looks like quite some work. It's certainly not just "take some ideas from [ahead-of-time or full-method] compiler X and apply them to PyPy".

A bientôt,

Armin.
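A purely conceptual sketch of the bookkeeping Armin describes (this is not PyPy's actual machinery; all class and field names here are made up for illustration): each compiled path keeps the assumptions it was optimised under, and compiling a new path can invalidate earlier paths whose assumptions no longer hold, leaving them to be re-optimised the next time they are reached.

    # Toy model only -- not the RPython JIT's real data structures.
    class CompiledPath:
        def __init__(self, operations, assumptions):
            self.operations = operations      # what was traced on this path
            self.assumptions = assumptions    # facts the optimizer relied on
            self.valid = True                 # False => recompile on next entry

    class PathRegistry:
        def __init__(self):
            self.paths = []

        def compile_path(self, operations, observed_facts):
            # Invalidate earlier paths whose assumptions contradict what the
            # new path has just shown to be possible.
            for path in self.paths:
                if not path.valid:
                    continue
                for key, value in observed_facts.items():
                    if key in path.assumptions and path.assumptions[key] != value:
                        path.valid = False    # stale: re-optimize it later
                        break
            new_path = CompiledPath(operations, dict(observed_facts))
            self.paths.append(new_path)
            return new_path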
Thanks both of you for getting back to me, these definitely seem like problems worth thinking about first. Looking into it, there has actually been some research already on implementing Polyhedral optimisations in a JIT optimiser, specifically in JavaScript. The paper (http://impact.gforge.inria.fr/impact2018/papers/polyhedral-javascript.pdf) seems to point out the same problems you both bring up, like SCoP detection and aliasing, and how it worked around them.

For now then I'll try and consider how ambitious replicating these solutions would be and if they would map into PyPy from JS cleanly - please let me know if any other hurdles come to mind in the meantime though.

Thanks again for the advice.

‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
On Friday, 18 December 2020 18:03, Armin Rigo <armin.rigo@gmail.com> wrote:
Hi,
On Thu, 17 Dec 2020 at 23:48, William ML Leslie <william.leslie.ttg@gmail.com> wrote:
The challenge with implementing this in the pypy JIT at this point is that the JIT only sees one control flow path. That is, one loop, and the branches taken within that loop. It does not find out about the outer loop usually until later, and may not ever find out about the content of other control flow paths if they aren't taken.
Note that strictly speaking, the problem is not that you haven't yet seen the other code paths. It's Python, so you never know what may happen in the future---maybe another code path will be taken, or maybe someone will do crazy things with `sys._getframe()` or with the debugger `pdb`. So merely seeing all paths in a function doesn't really buy you a lot. No, the problem is that emitting machine code is incremental at the granularity of code paths. At the point where we see a new code path, all previously-seen code paths have already been completely optimized and turned into machine code, and we don't keep much information about them.
To go beyond this simple model, what we have so far is that we can "invalidate" previous code paths at any point, when we figure out that they were compiled using assumptions that no longer hold. Using that, it would be possible in theory to do any amount of global optimization: save enough additional information as you see each code path; use it later in the optimization of additional code paths; invalidate some of the old code paths if you figure out that their optimizations are no longer valid (but only invalidate, not write a new version yet); and when you later see the old code path being generated again, optimize it differently. It's all doable, but theoretical so far: I don't know of any JIT compiler that seriously does things like that. It's certainly worth a research paper IMHO. It also looks like quite some work. It's certainly not just "take some ideas from [ahead-of-time or full-method] compiler X and apply them to PyPy".
A bientôt,
Armin.
Hi,

On Fri, 18 Dec 2020 at 19:15, muke101 <muke101@protonmail.com> wrote:
Thanks both of you for getting back to me, these definitely seem like problems worth thinking about first. Looking into it, there has actually been some research already on implementing Polyhedral optimisations in a JIT optimiser, specifically in JavaScript. The paper (http://impact.gforge.inria.fr/impact2018/papers/polyhedral-javascript.pdf) seems to point out the same problems you both bring up, like SCoP detection and aliasing, and how it worked around them.
For now then I'll try and consider how ambitious replicating these solutions would be and if they would map into PyPy from JS cleanly - please let me know if any other hurdles come to mind in the meantime though.
I assume that by "JavaScript" you mean JavaScript with a method-based JIT compiler. At this level, that's the main difference with PyPy, which contains RPython's tracing JIT compiler instead. The fact that they are about the JavaScript or Python language is not that important.

Here's another idea about how to do more advanced optimizations in a tracing JIT à la PyPy. The idea would be to keep enough metadata for the various pieces of machine code that the current backend produces, and add logic to detect when this machine code runs for long enough. At that point, we would involve a (completely new) second-level backend, which would consolidate the pieces into a better-optimized whole. This is an idea that exists in method JITs but that should also work in tracing JITs: the second-level backend can see all the common paths at once, instead of one after the other. The second level can be slower (within reason), and it can even know how common each path is, which might give it an edge over ahead-of-time compilers.

A bientôt,

Armin.
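Again a purely conceptual sketch of the two-level scheme Armin outlines (none of these names exist in the RPython JIT; they are invented for illustration): keep metadata next to each compiled trace, count how long the pieces of a region have been running, and hand the whole region to a slower second-level backend once it is hot enough.

    # Toy model only -- illustrative names, not the RPython JIT's API.
    SECOND_LEVEL_THRESHOLD = 100000

    class TraceInfo:
        def __init__(self, ops):
            self.ops = ops            # metadata kept alongside the machine code
            self.run_count = 0        # how often this particular path ran

    def second_level_compile(traces):
        # Placeholder for the consolidating backend: it would see every trace
        # of the region at once, together with how often each one ran, which
        # is the point at which whole-region optimisations could be attempted.
        pass

    class RegionController:
        """Groups the traces that belong to one loop/region of user code."""
        def __init__(self):
            self.traces = []
            self.promoted = False

        def note_execution(self, trace):
            trace.run_count += 1
            total = sum(t.run_count for t in self.traces)
            if not self.promoted and total > SECOND_LEVEL_THRESHOLD:
                self.promoted = True
                second_level_compile(self.traces)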
Hi, so to update you both: I have decided to pursue this project after all; I'm very excited to work on PyPy.

To reiterate my objective, I'll be trying to formulate a way to augment the JIT optimiser to expose enough information that more advanced optimisations can be implemented, with Polyhedral compatibility in mind. Of the ideas suggested, I'm currently leaning towards trying to create a second optimisation layer for sufficiently hot code, which can take into account more of the program at once, as this seems similar to what other JIT compilers already employ. This is open to the problem of assumptions being invalidated that Armin brought up (if I understood correctly), but similar implementations, like the one in the paper I referred to below, have formulated methods to accommodate this. I think the key for PyPy would be figuring out how to track the correct metadata from previously seen traces such that the bytecode from the relevant control paths can be brought together to work on, and mainly reconstructing entire loops once individual sufficiently hot traces are found. Once this is done, any number of optimisations could actually be performed.

The JIT compiler in the JavaScriptCore engine compiles the hottest bytecode down to LLVM IR and sends it through the LLVM back end. I had looked into similar possibilities for Python, and it seems only a subset of the language can be compiled to LLVM IR, through Numba, which is a shame (see the small Numba sketch after the quoted message below). If focusing on just Polyhedral optimisations, a possibility could be to write a SCoP detector for Python bytecode specifically, raise it to a Polyhedral representation and then import it into LLVM's Polly tool, but this is getting ahead a bit.

I'll be getting to grips with PyPy's codebase soon, after I'm comfortable with the fundamentals of tracing JITs. Do you have any suggestions on where to begin specifically for what I'm looking to do? I imagine generally all this will be within the JIT optimiser, but if there's anything specific you can think of, please let me know.

Thanks.

Sent with ProtonMail Secure Email.

‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
On Friday, 18 December 2020 18:15, muke101 <muke101@protonmail.com> wrote:
Thanks both of you for getting back to me, these definitely seem like problems worth thinking about first. Looking into it, there has actually been some research already on implementing Polyhedral optimisations in a JIT optimiser, specifically in JavaScript. The paper (http://impact.gforge.inria.fr/impact2018/papers/polyhedral-javascript.pdf) seems to point out the same problems you both bring up, like SCoP detection and aliasing, and how it worked around them.
For now then I'll try and consider how ambitious replicating these solutions would be and if they would map into PyPy from JS cleanly - please let me know if any other hurdles come to mind in the meantime though.
Thanks again for the advice.
‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
On Friday, 18 December 2020 18:03, Armin Rigo <armin.rigo@gmail.com> wrote:
Hi,

On Thu, 17 Dec 2020 at 23:48, William ML Leslie <william.leslie.ttg@gmail.com> wrote:
The challenge with implementing this in the pypy JIT at this point is that the JIT only sees one control flow path. That is, one loop, and the branches taken within that loop. It does not find out about the outer loop usually until later, and may not ever find out about the content of other control flow paths if they aren't taken.
Note that strictly speaking, the problem is not that you haven't yet seen the other code paths. It's Python, so you never know what may happen in the future---maybe another code path will be taken, or maybe someone will do crazy things with `sys._getframe()` or with the debugger `pdb`. So merely seeing all paths in a function doesn't really buy you a lot. No, the problem is that emitting machine code is incremental at the granularity of code paths. At the point where we see a new code path, all previously-seen code paths have already been completely optimized and turned into machine code, and we don't keep much information about them.

To go beyond this simple model, what we have so far is that we can "invalidate" previous code paths at any point, when we figure out that they were compiled using assumptions that no longer hold. Using that, it would be possible in theory to do any amount of global optimization: save enough additional information as you see each code path; use it later in the optimization of additional code paths; invalidate some of the old code paths if you figure out that their optimizations are no longer valid (but only invalidate, not write a new version yet); and when you later see the old code path being generated again, optimize it differently. It's all doable, but theoretical so far: I don't know of any JIT compiler that seriously does things like that. It's certainly worth a research paper IMHO. It also looks like quite some work. It's certainly not just "take some ideas from [ahead-of-time or full-method] compiler X and apply them to PyPy".

A bientôt,

Armin.
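For reference, a minimal sketch of the Numba route mentioned above (illustrative only; exact behaviour depends on the Numba and NumPy versions installed): only this statically typed, NumPy-flavoured subset of Python is compiled to LLVM IR, and `prange` requests auto-parallelisation of the marked loop.

    # Minimal Numba sketch -- not from the thread, just an illustration of
    # the "subset of Python compiled to LLVM IR" point.
    import numpy as np
    from numba import njit, prange

    @njit(parallel=True)
    def saxpy(a, x, y):
        out = np.empty_like(x)
        for i in prange(x.shape[0]):   # prange marks the loop as parallelizable
            out[i] = a * x[i] + y[i]
        return out

    x = np.ones(1_000_000)
    y = np.ones(1_000_000)
    print(saxpy(2.0, x, y)[:3])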
Hi! It's a bit hard to know what to suggest to start with. Would you be interested in setting up a Zoom call (e.g. next week, some evening CET) to discuss your concrete plans and timeline a bit? (I, for one, would be somewhat worried about whether all that you are describing is doable in the time frame of a master's thesis project, but it may be best to really discuss that in person.)

Cheers,

Carl Friedrich

On 19.01.21 01:45, muke101 via pypy-dev wrote:
Hi, so to update you both: I have decided to pursue this project after all; I'm very excited to work on PyPy.
To reiterate my objective, I'll be trying to formulate a way to augment the JIT optimiser to expose enough information that more advanced optimisations can be implemented, with Polyhedral compatibility in mind. Of the ideas suggested, I'm currently leaning towards trying to create a second optimisation layer for sufficiently hot code, which can take into account more of the program at once, as this seems similar to what other JIT compilers already employ. This is open to the problem of assumptions being invalidated that Armin brought up (if I understood correctly), but similar implementations, like the one in the paper I referred to below, have formulated methods to accommodate this. I think the key for PyPy would be figuring out how to track the correct metadata from previously seen traces such that the bytecode from the relevant control paths can be brought together to work on, and mainly reconstructing entire loops once individual sufficiently hot traces are found. Once this is done, any number of optimisations could actually be performed.

The JIT compiler in the JavaScriptCore engine compiles the hottest bytecode down to LLVM IR and sends it through the LLVM back end. I had looked into similar possibilities for Python, and it seems only a subset of the language can be compiled to LLVM IR, through Numba, which is a shame. If focusing on just Polyhedral optimisations, a possibility could be to write a SCoP detector for Python bytecode specifically, raise it to a Polyhedral representation and then import it into LLVM's Polly tool, but this is getting ahead a bit.
I'll be getting to grips with PyPy's codebase soon, after I'm comfortable with the fundamentals of tracing JITs. Do you have any suggestions on where to begin specifically for what I'm looking to do? I imagine generally all this will be within the JIT optimiser, but if there's anything specific you can think of, please let me know.
Thanks.
Sent with ProtonMail Secure Email.
participants (4)

- Armin Rigo
- Carl Friedrich Bolz-Tereick
- muke101
- William ML Leslie