Hi, some time ago there was some discussion about loop invariants, but no conclusion. What do you think about the following approach:
- Let optimize_loop mark the arguments in loop.inputargs as invariant if they appear at the same position in the jump instruction at the end, before calling propagate_forward.
- Let the optimize_... methods emit operations that only use invariant arguments to some preamble instead of emitting them to self.newoperations, whenever that is safe. Also, the result of these operations should probably be marked as invariant.
- Insert the created preamble at every point where the loop is called, right before the jump.
- When compiling a bridge from a failing guard, run the preamble through propagate_forward and discard the emitted operations, to inherit that part of the state of the Optimizer.
This should place the invariant instructions at the end of the entry bridge, which is a suitable place, right? At the end of a bridge from a failing guard that maintains the invariants, the optimizer should remove the inserted preamble again, right? And at the end of a bridge that invalidates them, enough of the preamble will be kept to maintain correct behavior, right? -- Håkan Ardö
The other part of the work is the algorithm that finds the loop variants. It is similar to the algorithm for variable colour inference, so you do have a starting point. _______________________________________________ pypy-dev@codespeak.net http://codespeak.net/mailman/listinfo/pypy-dev
Hi Håkan, thanks for taking up the topic. On 08/28/2010 03:05 PM, Hakan Ardo wrote:
- Let optimize_loop mark the arguments in loop.inputargs as invariant if they appear at the same position in the jump instruction at the end, before calling propagate_forward.
sounds good.
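For illustration, the invariant-marking step could look something like this minimal sketch. The JumpOp class and the function name are made up for the example; they are not the actual classes in the JIT:

```python
class JumpOp:
    """Minimal stand-in for the jump instruction at the end of a loop."""
    def __init__(self, args):
        self.args = args

    def getarg(self, i):
        return self.args[i]


def mark_invariant_args(inputargs, jump_op):
    """An argument is loop-invariant if the jump passes the very same
    box back into the same position of loop.inputargs."""
    invariant = set()
    for i, arg in enumerate(inputargs):
        if jump_op.getarg(i) is arg:
            invariant.add(arg)
    return invariant
```

So for a loop `loop(a, b, c)` ending in `jump(a, x, c)`, the sketch would mark `a` and `c` invariant, while `b` (which is rebound to `x`) stays variant.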
- Let the optimize_... methods emit operations that only use invariant arguments to some preamble instead of emitting them to self.newoperations, whenever that is safe. Also, the result of these operations should probably be marked as invariant.
Need to be a bit careful about operations with side-effects, but basically yes.
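As a sketch of that rule (the Op shape and the PURE_OPS whitelist are assumptions for the example, not the real optimizer interface): an operation moves to the preamble only if it is side-effect-free and all of its arguments are invariant, and its result then becomes invariant too:

```python
class Op:
    """Toy trace operation: a name, argument boxes, and a result box."""
    def __init__(self, opname, args, result):
        self.opname = opname
        self.args = args
        self.result = result


# Assumed whitelist of side-effect-free operations (illustrative only).
PURE_OPS = {"int_add", "int_mul"}


def hoist_to_preamble(ops, invariant):
    """Split a trace into (preamble, loop body).  A pure operation whose
    arguments are all invariant is hoisted, and its result is then
    marked invariant as well, so later invariant ops can build on it."""
    preamble, body = [], []
    for op in ops:
        if op.opname in PURE_OPS and all(a in invariant for a in op.args):
            preamble.append(op)
            invariant.add(op.result)
        else:
            body.append(op)
    return preamble, body
```

Note how the side-effect check comes first: an impure operation stays in the loop body even if every argument is invariant.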
- Insert the created preamble at every point where the loop is called, right before the jump.
This part makes sense to me. The code would have to be careful to match the variables in the trace and in the preamble.
- When compiling a bridge from a failing guard, run the preamble through propagate_forward and discard the emitted operations, to inherit that part of the state of the Optimizer.
... but I don't see why this is needed. Wouldn't you rather need the whole trace of the loop including the preamble up to the failing guard? This would be bad, because you need to store the full trace then.
This should place the invariant instructions at the end of the entry bridge, which is a suitable place, right? At the end of a bridge from a failing guard that maintains the invariants the optimizer should remove the inserted preamble again, right? And at the end of a bridge that invalidates them, enough of the preamble will be kept to maintain correct behavior, right?
Yes to all the questions, at least as far as I can see. I guess in practice there might be complications. Cheers, Carl Friedrich P.S.: A bit unrelated, but a comment on the jit-bounds branch: I think it would be good if the bounds-related optimizations could move out of optimizeopt.py to their own file, because otherwise optimizeopt.py is getting really unwieldy. Does that make sense?
On Sun, Aug 29, 2010 at 12:32 PM, Carl Friedrich Bolz <cfbolz@gmx.de> wrote:
... but I don't see why this is needed. Wouldn't you rather need the
My thinking was that this would be needed for the preamble to be removed from the end of a bridge that maintains the invariants. But I might be mistaken?
whole trace of the loop including the preamble up to the failing guard? This would be bad, because you need to store the full trace then.
OK, so that might be a problem. Maybe it would be possible to extract the part of the state that would be safe to inherit even if only the preamble has been processed, i.e. self.pure_operations might be OK?
P.S.: A bit unrelated, but a comment on the jit-bounds branch: I think it would be good if the bounds-related optimizations could move out of optimizeopt.py to their own file, because otherwise optimizeopt.py is getting really unwieldy. Does that make sense?
Well, class IntBound and the propagate_bounds_ methods could probably be moved elsewhere, but a lot of the work is done in the optimize_... methods, which I'm not so sure would make sense to split up. -- Håkan Ardö
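To illustrate the kind of self-contained logic that could move out: an interval-arithmetic class in the spirit of IntBound. This is only a sketch; the real class in the jit-bounds branch tracks more state than a pair of bounds:

```python
class Interval:
    """Closed integer interval [lower, upper]."""
    def __init__(self, lower, upper):
        self.lower = lower
        self.upper = upper

    def add(self, other):
        # [a, b] + [c, d] = [a + c, b + d]
        return Interval(self.lower + other.lower, self.upper + other.upper)

    def known_lt(self, other):
        # Every value of self is strictly below every value of other,
        # so a comparison guard can be proven true and removed.
        return self.upper < other.lower
```

Logic like this has no dependency on the rest of the optimizer, which is what makes it the easy part to move to its own file.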
On 08/29/2010 01:49 PM, Hakan Ardo wrote:
P.S.: A bit unrelated, but a comment on the jit-bounds branch: I think it would be good if the bounds-related optimizations could move out of optimizeopt.py to their own file, because otherwise optimizeopt.py is getting really unwieldy. Does that make sense?
Well, class IntBound and the propagate_bounds_ methods could probably be moved elsewhere, but a lot of the work is done in the optimize_... methods, which I'm not so sure would make sense to split up.
I guess then the things that can be sanely moved should move. The file is nearly 2000 lines, which is way too big. I guess also the heap optimizations could go to their own file. Carl Friedrich
On Sun, Aug 29, 2010 at 2:03 PM, Carl Friedrich Bolz <cfbolz@gmx.de> wrote:
I guess then the things that can be sanely moved should move. The file is nearly 2000 lines, which is way too big. I guess also the heap optimizations could go to their own file.
Carl Friedrich
How about a couple of (preferably small) files, each containing a self-contained optimization if possible? (Maybe a package?)
Ok, so we split it up into a set of Optimization classes in separate files, each containing a subset of the optimize_... methods. Then we have the propagate_forward method iterate over the instructions, passing them to one Optimization after the other? That way we keep the single iteration over the instructions. Would it be preferable to separate them even more and have each Optimization contain its own loop over the instructions? On Sun, Aug 29, 2010 at 10:05 PM, Maciej Fijalkowski <fijall@gmail.com> wrote:
How about a couple of (preferably small) files, each containing a self-contained optimization if possible? (Maybe a package?)
-- Håkan Ardö
Hi, I've checked in a version of optimizeopt that is a package with support for building a chain of optimizations and passing instructions down this chain. Does this design make sense? If so, I'll start moving the different optimizations to their own files. It will require some refactoring, but not too much, I hope... On Tue, Aug 31, 2010 at 9:25 AM, Hakan Ardo <hakan@debian.org> wrote:
Ok, so we split it up into a set of Optimization classes in separate files, each containing a subset of the optimize_... methods. Then we have the propagate_forward method iterate over the instructions, passing them to one Optimization after the other? That way we keep the single iteration over the instructions. Would it be preferable to separate them even more and have each Optimization contain its own loop over the instructions?
-- Håkan Ardö
On Tue, Aug 31, 2010 at 09:25, Hakan Ardo <hakan@debian.org> wrote:
Ok, so we split it up into a set of Optimization classes in separate files, each containing a subset of the optimize_... methods. Then we have the propagate_forward method iterate over the instructions, passing them to one Optimization after the other? That way we keep the single iteration over the instructions. Would it be preferable to separate them even more and have each Optimization contain its own loop over the instructions?
But won't this affect performance, which is very important in a JIT compiler? When compiling traces bigger than a cache line, it might even affect locality, i.e. be an important performance problem, unless your RPython compiler can join the loops. If they are just loops, it could. If they are tree visits, it likely can't; this is done by the Haskell compiler (google Haskell, stream fusion, shortcut deforestation), but the techniques are unlikely to generalize to languages with side effects; it's also done/doable in some domain-specific languages for tree visitors.
-- Paolo Giarrusso - Ph.D. Student http://www.informatik.uni-marburg.de/~pgiarrusso/
Hi, there is now a package version of optimizeopt in the jit-bounds branch. In optimizeopt/__init__.py a chain of optimizers is created: optimizations = [OptIntBounds(), OptRewrite(), OptVirtualize(), OptHeap()] The operations are passed from one optimizer to the next, which means we keep the single loop over the instructions. Each optimization is located in its own file, and it should be straightforward to add more optimizations and even make them optional using runtime arguments if that is of interest. I believe this branch is ready to be merged now. On Thu, Sep 2, 2010 at 10:22 AM, Paolo Giarrusso <p.giarrusso@gmail.com> wrote:
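A rough sketch of the chain idea described above (class and method names here are illustrative, not the actual code in optimizeopt/__init__.py): each stage handles an operation and either drops it, rewrites it, or forwards it to the next stage, so the trace is still walked only once:

```python
class Optimization:
    """One stage in the chain; subclasses override propagate_forward."""
    def __init__(self):
        self.next_opt = None

    def propagate_forward(self, op):
        # Default behaviour: pass the operation through unchanged.
        self.emit_operation(op)

    def emit_operation(self, op):
        self.next_opt.propagate_forward(op)


class Collector(Optimization):
    """Terminal stage that gathers the operations that survive."""
    def __init__(self):
        Optimization.__init__(self)
        self.newoperations = []

    def propagate_forward(self, op):
        self.newoperations.append(op)


class DropNops(Optimization):
    """Toy optimization: remove 'nop' operations, forward everything else."""
    def propagate_forward(self, op):
        if op != "nop":
            self.emit_operation(op)


def build_chain(optimizations):
    """Link the stages together and append a Collector at the end."""
    tail = Collector()
    for opt in reversed(optimizations):
        opt.next_opt = tail
        tail = opt
    return tail
```

Feeding each operation to the head of `build_chain([DropNops()])` keeps a single pass over the trace while still letting each optimization live in its own file.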
On Tue, Aug 31, 2010 at 09:25, Hakan Ardo <hakan@debian.org> wrote:
Ok, so we split it up into a set of Optimization classes in separate files, each containing a subset of the optimize_... methods. Then we have the propagate_forward method iterate over the instructions, passing them to one Optimization after the other? That way we keep the single iteration over the instructions. Would it be preferable to separate them even more and have each Optimization contain its own loop over the instructions?
But won't this affect performance, which is very important in a JIT compiler? When compiling traces bigger than a cache line, it might even affect locality, i.e. be an important performance problem, unless your RPython compiler can join the loops. If they are just loops, it could. If they are tree visits, it likely can't; this is done by the Haskell compiler (google Haskell, stream fusion, shortcut deforestation), but the techniques are unlikely to generalize to languages with side effects; it's also done/doable in some domain-specific languages for tree visitors.
-- Håkan Ardö
Hi, On Sat, Aug 28, 2010 at 03:05:11PM +0200, Hakan Ardo wrote:
some time ago there was some discussion about loop invariants, but no conclusion.
A general answer to that question: there are two kinds of goals we can have when optimizing. One is to get the fastest possible code for small Python loops, e.g. doing numerical computations. The other is to get reasonably good code for large and complicated loops, e.g. the dispatch loop of some network application. Although loop-invariant code motion would definitely be great for the first kind of loop, it's unclear that it helps on the second kind.

As a similar consideration, I am thinking about trying to remove the optimization that passes "virtuals" from one iteration of the loop to the next one. Although it has good effects on small loops, it actually has a negative effect on large loops, because a loop taking virtual arguments cannot be directly jumped to from the interpreter.

I'm not saying that loop-invariant code motion could also have a negative effect on large loops; I think it's a pure win, so it's probably worth a try. I'm just giving a warning: it may not help much in the case of a "general Python program doing lots of stuff", but only in the case of small numerical computation loops. A bientot, Armin.
On Sun, Aug 29, 2010 at 1:04 PM, Armin Rigo <arigo@tunes.org> wrote:
I'm not saying that loop-invariant code motion could also have a negative effect on large loops; I think it's a pure win, so it's probably worth a try. I'm just giving a warning: it may not help much in the case of a "general Python program doing lots of stuff", but only in the case of small numerical computation loops.
Right. I write a lot of numerical computation loops these days, both small and somewhat bigger, and I am typically forced to write them in C to get decent performance. So the motivation here would rather be to broaden the usability of Python than to improve the performance of existing Python programs. Another motivation might be to help pypy developers focus on the important instructions while staring at traces, i.e. by hiding the instructions that will be inserted only once :) -- Håkan Ardö
On Tue, Aug 31, 2010 at 9:20 AM, Hakan Ardo <hakan@debian.org> wrote:
I second Hakan here - small loops are not uninteresting, since supporting them broadens the areas where you can use Python, instead of limiting yourself to existing Python programs.
participants (6)
- Armin Rigo
- Carl Friedrich Bolz
- Hakan Ardo
- Maciej Fijalkowski
- Paolo Giarrusso
- William Leslie