Virtualizable Frames getting half removed in trace
This may be a bit of a long post, but I'm trying to provide as much
information as possible. I'm working on a minimalistic Clojure-friendly
VM. The bytecode is quite a bit like Python's, and the program I'm
testing looks something like this:
add-fn (make-code :bytecode [ADD
                             RETURN]
                  :vars []
                  :consts []
                  :locals 0
                  :stacksize 2)
inner-code (make-code :bytecode [STORE_LOCAL 2
                                 PUSH_CONST, 0,
                                 STORE_LOCAL, 0,
                                 NO_OP, ; 6
                                 PUSH_LOCAL, 0,
                                 PUSH_LOCAL, 2,
                                 EQ,
                                 COND_JMP, 26,
                                 PUSH_LOCAL 0
                                 PUSH_CONST 1
                                 PUSH_CONST 2
                                 INVOKE 2
                                 STORE_LOCAL, 0,
                                 JMP, 6,
                                 NO_OP, ; 26
                                 PUSH_LOCAL, 0,
                                 RETURN]
                      :vars []
                      :consts [0 1 add-fn]
                      :stacksize 5
                      :locals 3)
outer-code (make-code :bytecode [PUSH_CONST, 0,
                                 PUSH_CONST, 1,
                                 INVOKE, 1,
                                 RETURN]
                      :vars []
                      :consts [100000, inner-code]
                      :stacksize 2
                      :locals 0)
This program simply increments a local from 0 to 100000. When I tested this
using ADD directly in inner-code, I ended up with a very tight trace. However,
when I added add-fn, the frame for the inner trace ends up getting half
created at some points.
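To make the listing concrete, here is a minimal sketch of a stack interpreter for it in plain Python. It is not the clojure-vm code: the names Code, run and make_program are hypothetical, and the opcode semantics (arguments start on the callee's stack, INVOKE pops the function first and then argc arguments, COND_JMP jumps when the popped value is true) are assumptions inferred from the bytecode above.

```python
# Toy model only: opcode semantics are assumptions inferred from the listing.

class Code(object):
    def __init__(self, bytecode, consts, num_locals):
        self.bytecode = bytecode
        self.consts = consts
        self.num_locals = num_locals

# Opcodes that are followed by a one-slot operand in the bytecode list.
HAS_OPERAND = frozenset(
    ["STORE_LOCAL", "PUSH_CONST", "PUSH_LOCAL", "COND_JMP", "JMP", "INVOKE"])

def run(code, args=()):
    stack = list(args)  # assumption: arguments start on the callee's stack
    local_vars = [None] * code.num_locals
    pc = 0
    while True:
        op = code.bytecode[pc]
        pc += 1
        operand = None
        if op in HAS_OPERAND:
            operand = code.bytecode[pc]
            pc += 1
        if op == "NO_OP":
            pass
        elif op == "PUSH_CONST":
            stack.append(code.consts[operand])
        elif op == "PUSH_LOCAL":
            stack.append(local_vars[operand])
        elif op == "STORE_LOCAL":
            local_vars[operand] = stack.pop()
        elif op == "EQ":
            b = stack.pop()
            a = stack.pop()
            stack.append(a == b)
        elif op == "ADD":
            b = stack.pop()
            a = stack.pop()
            stack.append(a + b)
        elif op == "COND_JMP":
            if stack.pop():
                pc = operand
        elif op == "JMP":
            pc = operand
        elif op == "INVOKE":
            fn = stack.pop()  # the callee is on top, its arguments below it
            call_args = [stack.pop() for _ in range(operand)]
            call_args.reverse()
            stack.append(run(fn, call_args))
        elif op == "RETURN":
            return stack.pop()
        else:
            raise ValueError("unknown opcode: %r" % (op,))

def make_program(limit):
    add_fn = Code(["ADD", "RETURN"], [], 0)
    inner = Code(["STORE_LOCAL", 2, "PUSH_CONST", 0, "STORE_LOCAL", 0,
                  "NO_OP",                       # offset 6, loop head
                  "PUSH_LOCAL", 0, "PUSH_LOCAL", 2, "EQ", "COND_JMP", 26,
                  "PUSH_LOCAL", 0, "PUSH_CONST", 1, "PUSH_CONST", 2,
                  "INVOKE", 2, "STORE_LOCAL", 0, "JMP", 6,
                  "NO_OP",                       # offset 26, loop exit
                  "PUSH_LOCAL", 0, "RETURN"],
                 [0, 1, add_fn], 3)
    return Code(["PUSH_CONST", 0, "PUSH_CONST", 1, "INVOKE", 1, "RETURN"],
                [limit, inner], 0)
```

Calling run(make_program(100000)) walks the same loop as the program above; every iteration enters a fresh call for add-fn, which is the frame the JIT has to remove.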
The code for the main interpreter is here:
https://bitbucket.org/halgari/clojure-vm/src/a95d278c7540cd16efb025f878c3773...
I'm attaching a copy of my latest trace. The part I'm not happy with is at
the end of the trace:
debug_merge_point(0, 0, 'INVOKE 2')
p64 = new_array(1, descr=
ugh that looks really odd, why is p67 not removed escapes my attention
On Tue, Feb 25, 2014 at 6:36 AM, Timothy Baldridge wrote:
> debug_merge_point(0, 0, 'INVOKE 2')
> p64 = new_array(1, descr= )
> +1165: p65 = call(ConstClass(ll_mul__GcArray_Ptr_GcStruct_objectLlT_arrayPtr_Signed), p64, 2, descr= )
> +1251: guard_no_exception(descr=<Guard0x1006f66b0>) [p0, p65, p60, p6, p8, p16, p18]
> +1299: setarrayitem_gc(p65, 0, p60, descr= )
> +1338: setarrayitem_gc(p65, 1, ConstPtr(ptr37), descr= )
> p67 = new_array(2, descr= )
> +1446: setarrayitem_gc(p67, i30, p60, descr= )
> +1451: p68 = getarrayitem_gc(p65, 1, descr= )
> +1462: setarrayitem_gc(p67, i40, p68, descr= )
> debug_merge_point(1, 1, 'ADD')
> +1474: p69 = getarrayitem_gc(p67, i48, descr= )
> +1479: setarrayitem_gc(p67, i48, ConstPtr(null), descr= )
> +1488: p70 = getarrayitem_gc(p67, i52, descr= )
> +1500: setarrayitem_gc(p67, i52, ConstPtr(null), descr= )
> +1509: i71 = getfield_gc(p69, descr= )
> +1513: i72 = getfield_gc(p70, descr= )
> +1517: i73 = int_add(i71, i72)
> p74 = new_with_vtable(4297160080)
> +1531: setfield_gc(p74, i73, descr= )
> +1535: setarrayitem_gc(p67, i52, p74, descr= )
> debug_merge_point(1, 1, 'RETURN')
> +1540: p75 = getarrayitem_gc(p67, i52, descr= )
> +1545: setarrayitem_gc(p67, i52, ConstPtr(null), descr= )
> debug_merge_point(0, 0, 'STORE_LOCAL 0')
> debug_merge_point(0, 0, 'JMP 6')
> debug_merge_point(0, 0, 'NO_OP')
> +1554: jump(p0, p75, p6, p8, p16, p18, i21, i30, i40, i48, i52, descr=TargetToken(4302274768))
> I'm not sure why these allocations aren't getting removed.
> Any thoughts?
> Thanks,
> Timothy
_______________________________________________ pypy-dev mailing list pypy-dev@python.org https://mail.python.org/mailman/listinfo/pypy-dev
Hi Maciej,
On 25 February 2014 09:09, Maciej Fijalkowski wrote:
> ugh that looks really odd, why is p67 not removed escapes my attention
Because we do setarrayitem and getarrayitem on non-constant indexes.
On Tue, Feb 25, 2014 at 6:36 AM, Timothy Baldridge wrote:
> I'm attaching a copy of my latest trace. The part I'm not happy with is at the end of the trace:
We need tricks to avoid allocating the frame when we *leave* the function. In PyPy it can only be done if we know for sure that nobody can potentially grab a reference to the frame for later (e.g. via exceptions). I'm not sure I remember the latest version of this logic, but there have been several...
A bientôt,
Armin.
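The point about non-constant indexes can be illustrated with a toy model in plain Python. This is only a sketch under loud assumptions: it is not how the RPython optimizer is implemented, the tuple encoding of trace operations and the function remove_virtual_arrays are hypothetical, and an index counts as "constant" simply when it is a Python int rather than a variable name such as "i30".

```python
# Toy allocation removal: an array can stay virtual only if every access to
# it uses a constant index and the array itself is never used as a value.

def is_const(index):
    return isinstance(index, int)

def remove_virtual_arrays(trace):
    # Pass 1: collect every name that must exist as a real object.
    escaped = set()
    for op in trace:
        name = op[0]
        if name == "new_array":
            continue
        if name == "setarrayitem":
            _, arr, index, value = op
            if not is_const(index):
                escaped.add(arr)   # unknown slot: contents cannot be tracked
            escaped.add(value)     # conservative: stored values escape
        elif name == "getarrayitem":
            _, _result, arr, index = op
            if not is_const(index):
                escaped.add(arr)
        else:
            escaped.update(op[1:])  # any other use forces the operands

    # Pass 2: drop operations on virtual arrays, forwarding stored values.
    virtual = {}  # array name -> {constant index: value}
    alias = {}    # removed getarrayitem result -> value it stands for
    out = []
    for op in trace:
        name = op[0]
        if name == "new_array" and op[1] not in escaped:
            virtual[op[1]] = {}
        elif name == "setarrayitem" and op[1] in virtual:
            virtual[op[1]][op[2]] = alias.get(op[3], op[3])
        elif name == "getarrayitem" and op[2] in virtual:
            alias[op[1]] = virtual[op[2]][op[3]]
        else:
            out.append((name,) + tuple(alias.get(a, a) for a in op[1:]))
    return out

def frame_trace(i_a, i_b):
    # Same shape as the callee-stack operations in the trace, simplified.
    return [
        ("new_array", "p67", 2),
        ("setarrayitem", "p67", i_a, "p60"),
        ("setarrayitem", "p67", i_b, "p68"),
        ("getarrayitem", "p69", "p67", i_a),
        ("getarrayitem", "p70", "p67", i_b),
        ("int_add", "i73", "p69", "p70"),
        ("jump", "i73"),
    ]
```

With frame_trace(0, 1) the array and all accesses to it disappear and only the int_add and the jump remain; with frame_trace("i30", "i40") nothing can be removed, which matches the shape of the p67 operations in the trace.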
So I spent two more hours on this this morning and finally got some good
results.
a) I turned on _immutable_ = True on the Code object. Should have done this
before.
Then I noticed that the trace contained the creation of the argument list,
but that that list was never made. The trace was also making a call out to
some C function so that it could do the array = [None] * argc. I couldn't
get that to go away even with promoting argc. So I changed pop_values to
this instead:
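def pop_values(frame, argc):
    # Special-case small argument counts so each list has a fixed length
    # and no generic "[None] * argc" allocation is needed.
    if argc == 0:
        return Arguments([], argc)
    elif argc == 1:
        return Arguments([frame.pop()], argc)
    elif argc == 2:
        # The last argument is on top of the stack, so pop in reverse order.
        b = frame.pop()
        a = frame.pop()
        return Arguments([a, b], argc)
    assert False  # larger argument counts are not handled yet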
Since Clojure only supports up to 20 positional arguments, that'll work
just fine. Now the last part of my trace consists of this:
+266: label(p0, i26, p5, p7, p15, p17, i21, i25,
descr=TargetToken(4302275472))
debug_merge_point(0, 0, 'NO_OP')
debug_merge_point(0, 0, 'PUSH_LOCAL 0')
debug_merge_point(0, 0, 'PUSH_LOCAL 2')
debug_merge_point(0, 0, 'EQ')
+280: i27 = int_eq(i21, i26)
guard_false(i27, descr=<Guard0x1006f6480>) [p0, p5, p7, p15, p17, i26]
debug_merge_point(0, 0, 'COND_JMP 26')
debug_merge_point(0, 0, 'PUSH_LOCAL 0')
debug_merge_point(0, 0, 'PUSH_CONST 1')
debug_merge_point(0, 0, 'PUSH_CONST 2')
debug_merge_point(0, 0, 'INVOKE 2')
debug_merge_point(1, 1, 'ADD')
+289: i28 = int_add(i25, i26)
debug_merge_point(1, 1, 'RETURN')
debug_merge_point(0, 0, 'STORE_LOCAL 0')
debug_merge_point(0, 0, 'JMP 6')
debug_merge_point(0, 0, 'NO_OP')
+295: jump(p0, i28, p5, p7, p15, p17, i21, i25,
descr=TargetToken(4302275472))
Which is exactly what I was looking for, an add and an eq.
Thanks for the help everyone!
Timothy
-- "One of the main causes of the fall of the Roman Empire was that-lacking zero-they had no way to indicate successful termination of their C programs." (Robert Firth)
Correction to my last email: it should read "but that list was never used".
On Tue, Feb 25, 2014 at 4:06 PM, Timothy Baldridge wrote:
> Correction to my last email: it should read "but that list was never used".
We use the same hack in PyPy for fast argument passing; it helps in the non-JIT case too. (We just use it up to 5 or so.)
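A minimal sketch of this argument-passing hack in plain Python, runnable outside RPython. The Frame and Arguments classes here are hypothetical stand-ins, not the classes from clojure-vm or PyPy, and the generic fallback loop is an assumption added for illustration (the version posted in this thread simply asserts for larger argument counts).

```python
class Arguments(object):
    # Hypothetical stand-in: holds the argument values in call order.
    def __init__(self, values, argc):
        self.values = values
        self.argc = argc

class Frame(object):
    # Hypothetical stand-in: a frame with a simple operand stack.
    def __init__(self):
        self.stack = []

    def push(self, value):
        self.stack.append(value)

    def pop(self):
        return self.stack.pop()

def pop_values(frame, argc):
    # Fast paths: fixed-length list literals for the common small arities.
    if argc == 0:
        return Arguments([], 0)
    if argc == 1:
        return Arguments([frame.pop()], 1)
    if argc == 2:
        b = frame.pop()
        a = frame.pop()
        return Arguments([a, b], 2)
    if argc == 3:
        c = frame.pop()
        b = frame.pop()
        a = frame.pop()
        return Arguments([a, b, c], 3)
    # Generic fallback (assumption): fill the list back to front so the
    # values end up in call order even though they are popped in reverse.
    values = [None] * argc
    for i in range(argc - 1, -1, -1):
        values[i] = frame.pop()
    return Arguments(values, argc)
```

The fast paths keep the list length visible as a constant at each return site, which is the property the thread relies on; the fallback keeps larger calls working at the cost of the generic allocation.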
Hi Timothy,
On 25 February 2014 15:06, Timothy Baldridge wrote:
> Then I noticed that the trace contained the creation of the argument list, but that that list was never made. The trace was also making a call out to some C function so that it could do the array = [None] * argc. I couldn't get that to go away even with promoting argc.
Ah, digging into it more, it seems that "[None] * argc" is not correctly optimised if argc is an unsigned number rather than a regular signed integer, like in your example. Fixed!
A bientôt,
Armin.
Hey,
The arrays escape because the indexes into the arrays are not constants.
p67 = new_array(2, descr=
participants (4)
- Armin Rigo
- Carl Friedrich Bolz
- Maciej Fijalkowski
- Timothy Baldridge