Virtualizable Frames getting half removed in trace
This may be a bit of a long post, but I'm trying to provide as much information as possible. I'm attempting to work on a minimalistic Clojure-friendly VM. The bytecode is quite a bit like Python's, and the program I'm testing looks something like this:

    add-fn (make-code :bytecode [ADD RETURN]
                      :vars []
                      :consts []
                      :locals 0
                      :stacksize 2)
    inner-code (make-code :bytecode [STORE_LOCAL 2
                                     PUSH_CONST, 0,
                                     STORE_LOCAL, 0,
                                     NO_OP, ; 6
                                     PUSH_LOCAL, 0,
                                     PUSH_LOCAL, 2,
                                     EQ,
                                     COND_JMP, 26,
                                     PUSH_LOCAL 0
                                     PUSH_CONST 1
                                     PUSH_CONST 2
                                     INVOKE 2
                                     STORE_LOCAL, 0,
                                     JMP, 6,
                                     NO_OP, ;21
                                     PUSH_LOCAL, 0,
                                     RETURN]
                          :vars []
                          :consts [0 1 add-fn]
                          :stacksize 5
                          :locals 3)
    outer-code (make-code :bytecode [PUSH_CONST, 0,
                                     PUSH_CONST, 1,
                                     INVOKE, 1,
                                     RETURN]
                          :vars []
                          :consts [100000, inner-code]
                          :stacksize 2
                          :locals 0)

This program simply increments a local from 0 to 100000. When I tested this using ADD in the inner-code, I ended up with a very tight trace. However, when I added add-fn, the frame for the inner trace ends up getting half created at some points.

The code for the main interpreter is here: https://bitbucket.org/halgari/clojure-vm/src/a95d278c7540cd16efb025f878c3773...

I'm attaching a copy of my latest trace. The part I'm not happy with is at the end of the trace:

    debug_merge_point(0, 0, 'INVOKE 2')
    p64 = new_array(1, descr=<ArrayP 8>)
    +1165: p65 = call(ConstClass(ll_mul__GcArray_Ptr_GcStruct_objectLlT_arrayPtr_Signed), p64, 2, descr=<Callr 8 ri EF=4>)
    +1251: guard_no_exception(descr=<Guard0x1006f66b0>) [p0, p65, p60, p6, p8, p16, p18]
    +1299: setarrayitem_gc(p65, 0, p60, descr=<ArrayP 8>)
    +1338: setarrayitem_gc(p65, 1, ConstPtr(ptr37), descr=<ArrayP 8>)
    p67 = new_array(2, descr=<ArrayP 8>)
    +1446: setarrayitem_gc(p67, i30, p60, descr=<ArrayP 8>)
    +1451: p68 = getarrayitem_gc(p65, 1, descr=<ArrayP 8>)
    +1462: setarrayitem_gc(p67, i40, p68, descr=<ArrayP 8>)
    debug_merge_point(1, 1, 'ADD')
    +1474: p69 = getarrayitem_gc(p67, i48, descr=<ArrayP 8>)
    +1479: setarrayitem_gc(p67, i48, ConstPtr(null), descr=<ArrayP 8>)
    +1488: p70 = getarrayitem_gc(p67, i52, descr=<ArrayP 8>)
    +1500: setarrayitem_gc(p67, i52, ConstPtr(null), descr=<ArrayP 8>)
    +1509: i71 = getfield_gc(p69, descr=<FieldS clojure.vm.primitives.Integer.inst_int_value 8>)
    +1513: i72 = getfield_gc(p70, descr=<FieldS clojure.vm.primitives.Integer.inst_int_value 8>)
    +1517: i73 = int_add(i71, i72)
    p74 = new_with_vtable(4297160080)
    +1531: setfield_gc(p74, i73, descr=<FieldS clojure.vm.primitives.Integer.inst_int_value 8>)
    +1535: setarrayitem_gc(p67, i52, p74, descr=<ArrayP 8>)
    debug_merge_point(1, 1, 'RETURN')
    +1540: p75 = getarrayitem_gc(p67, i52, descr=<ArrayP 8>)
    +1545: setarrayitem_gc(p67, i52, ConstPtr(null), descr=<ArrayP 8>)
    debug_merge_point(0, 0, 'STORE_LOCAL 0')
    debug_merge_point(0, 0, 'JMP 6')
    debug_merge_point(0, 0, 'NO_OP')
    +1554: jump(p0, p75, p6, p8, p16, p18, i21, i30, i40, i48, i52, descr=TargetToken(4302274768))

I'm not sure why these allocations aren't getting removed.

Any thoughts?

Thanks,

Timothy
ugh, that looks really odd; why p67 is not removed escapes me
_______________________________________________ pypy-dev mailing list pypy-dev@python.org https://mail.python.org/mailman/listinfo/pypy-dev
Hi Maciej,

On 25 February 2014 09:09, Maciej Fijalkowski <fijall@gmail.com> wrote:
> ugh that looks really odd, why is p67 not removed escapes my attention

Because we do setarrayitem and getarrayitem on non-constant indexes.

On Tue, Feb 25, 2014 at 6:36 AM, Timothy Baldridge <tbaldridge@gmail.com> wrote:
> I'm attaching a copy of my latest trace. The part I'm not happy with is at the end of the trace:

We need tricks to avoid allocating the frame when we *leave* the function. In PyPy it can only be done if we know for sure that nobody can potentially grab a reference to the frame for later (e.g. via exceptions). I'm not sure I remember the latest version of this logic, but there were several...

A bientôt,

Armin.
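To make the point about non-constant indexes concrete, here is a deliberately simplified toy model in plain Python. It is not PyPy's optimizer; the function name `forced_arrays`, the tuple encoding of operations, and the array `p_tmp` are assumptions made only for illustration. The rule it encodes is the one stated above: an array that is read or written through a variable index cannot be kept as a removable allocation.

```python
def forced_arrays(trace):
    """Return the names of arrays this toy model cannot keep virtual.

    Each op is (opname, array_name, index). A Python int stands for a
    constant index; a string such as 'i30' stands for a trace variable.
    """
    forced = set()
    for opname, array, index in trace:
        if opname in ("getarrayitem_gc", "setarrayitem_gc"):
            # A variable index means the optimizer cannot tell which slot
            # is touched, so in this model the array must really exist.
            if not isinstance(index, int):
                forced.add(array)
    return forced


trace = [
    ("setarrayitem_gc", "p67", "i30"),   # indexes taken from the trace above
    ("setarrayitem_gc", "p67", "i40"),
    ("getarrayitem_gc", "p67", "i48"),
    ("setarrayitem_gc", "p_tmp", 0),     # hypothetical array, constant indexes
    ("getarrayitem_gc", "p_tmp", 0),
]
print(sorted(forced_arrays(trace)))  # ['p67']
```

In this model only `p67` is forced, because every access to it goes through a variable such as `i30`, while the hypothetical `p_tmp` is only touched at constant positions.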
So I spent two more hours on this this morning and finally got some good results.

a) I turned on _immutable_ = True on the Code object. Should have done this before.

Then I noticed that the trace contained the creation of the argument list, but that that list was never made. The trace was also making a call out to some C function so that it could do the array = [None] * argc. I couldn't get that to go away even with promoting argc. So I changed pop_values to this instead:

    def pop_values(frame, argc):
        if argc == 0:
            return Arguments([], argc)
        elif argc == 1:
            return Arguments([frame.pop()], argc)
        elif argc == 2:
            b = frame.pop()
            a = frame.pop()
            return Arguments([a, b], argc)
        assert False

Since Clojure only supports up to 20 positional arguments, that'll work just fine. Now the last part of my trace consists of this:

    +266: label(p0, i26, p5, p7, p15, p17, i21, i25, descr=TargetToken(4302275472))
    debug_merge_point(0, 0, 'NO_OP')
    debug_merge_point(0, 0, 'PUSH_LOCAL 0')
    debug_merge_point(0, 0, 'PUSH_LOCAL 2')
    debug_merge_point(0, 0, 'EQ')
    +280: i27 = int_eq(i21, i26)
    guard_false(i27, descr=<Guard0x1006f6480>) [p0, p5, p7, p15, p17, i26]
    debug_merge_point(0, 0, 'COND_JMP 26')
    debug_merge_point(0, 0, 'PUSH_LOCAL 0')
    debug_merge_point(0, 0, 'PUSH_CONST 1')
    debug_merge_point(0, 0, 'PUSH_CONST 2')
    debug_merge_point(0, 0, 'INVOKE 2')
    debug_merge_point(1, 1, 'ADD')
    +289: i28 = int_add(i25, i26)
    debug_merge_point(1, 1, 'RETURN')
    debug_merge_point(0, 0, 'STORE_LOCAL 0')
    debug_merge_point(0, 0, 'JMP 6')
    debug_merge_point(0, 0, 'NO_OP')
    +295: jump(p0, i28, p5, p7, p15, p17, i21, i25, descr=TargetToken(4302275472))

Which is exactly what I was looking for, an add and an eq.

Thanks for the help everyone!

Timothy
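The pop_values function above stops at two arguments. A minimal sketch of how it might be extended follows, runnable under plain Python. The `Frame` and `Arguments` classes here are hypothetical stand-ins rather than the VM's real classes, and the generic fallback for larger argument counts is an assumption that does not appear in the post.

```python
class Arguments(object):
    """Hypothetical stand-in for the VM's argument container."""
    def __init__(self, values, argc):
        self.values = values
        self.argc = argc


class Frame(object):
    """Hypothetical minimal operand stack, only for this demonstration."""
    def __init__(self, items):
        self.stack = list(items)

    def pop(self):
        return self.stack.pop()


def pop_values(frame, argc):
    # Fast paths from the post: each list literal has a fixed, visible length.
    if argc == 0:
        return Arguments([], argc)
    elif argc == 1:
        return Arguments([frame.pop()], argc)
    elif argc == 2:
        b = frame.pop()
        a = frame.pop()
        return Arguments([a, b], argc)
    # Assumed fallback (not in the original post): fill a preallocated list
    # from the back so the arguments keep their call order.
    values = [None] * argc
    i = argc - 1
    while i >= 0:
        values[i] = frame.pop()
        i -= 1
    return Arguments(values, argc)


frame = Frame(["x", 1, 2])
args = pop_values(frame, 2)
print(args.values)   # [1, 2]
print(frame.stack)   # ['x']
```

The values are popped in reverse so that the last argument pushed ends up last in the list. Whether the fallback path traces as tightly as the fast paths would have to be checked against a real trace.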
-- "One of the main causes of the fall of the Roman Empire was that-lacking zero-they had no way to indicate successful termination of their C programs." (Robert Firth)
Correction to my last email: "but that list was never used".
On Tue, Feb 25, 2014 at 4:06 PM, Timothy Baldridge <tbaldridge@gmail.com> wrote:
> correction on my last email "but that list was never used"

We use the same hack in PyPy for fast argument passing; it helps in the non-JIT case too. (We just use it up to 5 or so.)
Hi Timothy,

On 25 February 2014 15:06, Timothy Baldridge <tbaldridge@gmail.com> wrote:
> Then I noticed that the trace contained the creation of the argument list, but that that list was never made. The trace was also making a call out to some C function so that it could do the array = [None] * argc. I couldn't get that to go away even with promoting argc.

Ah, digging into it more, it seems that "[None] * argc" is not correctly optimised if argc is an unsigned number rather than a regular signed integer, like in your example. Fixed!

A bientôt,

Armin.
Hey,

The arrays escape because the indexes into the arrays are not constants.

    p67 = new_array(2, descr=<ArrayP 8>)
    +1446: setarrayitem_gc(p67, i30, p60, descr=<ArrayP 8>)

Here, should i30 be always the same value? If yes, you should promote it before the array access. I couldn't figure out what p67 is, whether it's the stack or the arguments array, but if it's the stack, this might mean changing pop as follows:

    def pop(self):
        depth = jit.promote(self.tos) - 1
        val = self.data[depth]
        self.data[depth] = None
        self.tos = depth
        return val

Cheers,

Carl Friedrich
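For completeness, here is a small sketch of a stack with the same treatment applied to push as well as pop. It is written so that it runs under plain Python: the `promote` function below is a stand-in that simply returns its argument, and under RPython one would import the real hint from `rpython.rlib.jit` instead. The `Frame` layout (fields `data` and `tos`) follows the pop method above, but the constructor and push are assumptions, not the VM's actual code.

```python
def promote(value):
    # Stand-in so the sketch runs under plain Python. Under RPython one would
    # use the real hint instead: from rpython.rlib.jit import promote
    return value


class Frame(object):
    def __init__(self, stacksize):
        self.data = [None] * stacksize
        self.tos = 0  # index of the next free slot

    def push(self, value):
        # Promote the depth so the array write uses a constant index.
        depth = promote(self.tos)
        self.data[depth] = value
        self.tos = depth + 1

    def pop(self):
        depth = promote(self.tos) - 1
        assert depth >= 0, "pop from empty stack"
        value = self.data[depth]
        self.data[depth] = None  # clear the slot so the value is not kept alive
        self.tos = depth
        return value


frame = Frame(2)
frame.push("a")
frame.push("b")
print(frame.pop(), frame.pop())  # b a
print(frame.data, frame.tos)     # [None, None] 0
```

Promotion only pays off if the stack depth at a given bytecode position is the same on every iteration, which is the question asked above about i30. If the depth varied, each new value would presumably fail the guard that promotion introduces.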
participants (4)
- Armin Rigo
- Carl Friedrich Bolz
- Maciej Fijalkowski
- Timothy Baldridge