Virtualizables in RPython

Dear developers, I am currently working on my Master's thesis about optimizations for SPy (a Squeak VM written using the RPython toolchain). I am planning to devote one or two chapters to the issue of virtualizable objects in a VM like SPy (and in SPy in particular). Since there is not much documentation on virtualizables available, I would appreciate your input. I want to collect details about the underlying concept, and also about the specific implementation in the RPython JIT. For example, was this concept first introduced in PyPy, or is it an older idea? How exactly does the optimizer decide which objects can be virtualized, and which cannot? I would also appreciate pointers to relevant parts of the PyPy source code, which is probably the best documentation as of now. Thank you and best regards, Anton

Hi Anton, On 24 April 2014 12:38, Anton Gulenko <anton.gulenko@student.hpi.uni-potsdam.de> wrote:
We first need to clarify some details. Are you talking about "virtualizables", or "virtuals"? These are two very different concepts. The latter is a very strong form of escape analysis, and probably the single most efficient concept we have in the RPython JIT. The former, on the other hand, is more of a hack that was added at some point in time, and that we're trying to remove --- unsuccessfully so far, because it gives slightly better results than virtuals alone, for the specific use case where it works. I'm unsure which one you're talking about: only the frame object is a "virtualizable", and it's not the job of the optimizer to decide that; it's explicitly marked in the jitdriver. See https://pypy.readthedocs.org/en/latest/jit/virtualizable.html . On the other hand, if you're talking about "virtuals", then indeed we have code in the optimizer to handle them. It's a kind of escape analysis leading to allocation removal. A bientôt, Armin.
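[Editor's note: for reference, marking a frame class as virtualizable looks roughly like this in RPython. This is a minimal sketch along the lines of the documentation linked above; the class name, field names and green/red variables are invented for illustration.]

    from rpython.rlib import jit

    class Frame(object):
        # Only the fields listed here receive the special "virtualizable"
        # treatment; '[*]' marks an array whose items are handled
        # individually.
        _virtualizable_ = ['locals[*]', 'stack[*]', 'stack_depth']

        def __init__(self, num_locals, max_stack_depth):
            # Documented pattern: tell the JIT that a freshly created
            # frame may be accessed directly.
            self = jit.hint(self, access_directly=True,
                            fresh_virtualizable=True)
            self.locals = [None] * num_locals
            self.stack = [None] * max_stack_depth
            self.stack_depth = 0

    # The jitdriver explicitly names which red variable is the
    # virtualizable; the optimizer does not decide this on its own.
    jitdriver = jit.JitDriver(greens=['pc', 'method'],
                              reds=['frame'],
                              virtualizables=['frame'])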

Dear Armin, thank you for your reply! I was not aware of the distinction between virtuals and virtualizables. It's a good start to have that covered ;) I thought marking frame objects as virtualizables was a necessary input for the optimizer, because it cannot decide everything on its own. Since you are calling virtualizables a hack, it seems the optimizer SHOULD be able to handle frame objects without additional hints. Could you explain why it is not able to do so? My original question was referring to virtuals, but I thought that frame objects were part of that. Now that I know of the distinction, I would like to understand both concepts in detail, and how they interact. I'm not sure how much of this I can ask you to describe to me - probably I should just read and debug the corresponding code? Any suggestions? The motivation for this exercise is that the SPy VM is producing JIT traces that I (we) did not understand. For example, changes in the VM that seemed totally unrelated were breaking the "virtualizability" of frame objects (or parts of them), e.g. by setting the program counter on each loop iteration (inside the trace!). I'm not sure whether this was unwanted behavior on the part of the optimizer, but it seemed pretty non-deterministic, and I would like to understand the mechanism well enough to troubleshoot and optimize these traces. Best, Anton 2014-04-26 9:10 GMT+02:00 Armin Rigo <arigo@tunes.org>:

Maybe it's useful to add that even without any changes to the SPy VM, frame objects in loops are nicely virtualized in the tests (where they execute at a stack depth of 1). But already if we execute the loop at a stack depth of 4 or 5, some frame attributes (only some, like the PC) suddenly appear in the trace and are written all over the place. Frames failed to virtualize entirely at a stack depth of ~100. Part of the reason for these questions is that we're trying to understand why this occurs. cheers, -Tim On 28 April 2014 14:30, Anton Gulenko <anton.gulenko@student.hpi.uni-potsdam.de> wrote:

Hi Tim, On 29 April 2014 13:22, Tim Felgentreff <timfelgentreff@gmail.com> wrote:
We might again talk past each other. What I believe I understand from what you're saying: you're looking at the outermost frame (the one in which the looping occurs). This frame is not a virtualizable (because you don't have any), and it is also not a virtual (because it already exists before the loop). However, in simple examples, the optimizer is able to remove all writes of the PC to it. But in larger examples, these writes show up again. Is that correct? This is expected so far. The writes to a normal, pre-existing object (the frame) can be delayed, but only partly. I think that Topaz uses a hack for now to force the write of the PC to be delayed more: instead of writing the PC as an integer, it writes it as a small object containing an integer --- every time a new object. This is really a hack, but it should solve the problem for now. You should discuss with Alex Gaynor whether this hack turned out to work in the long run, or whether he eventually had to give it up and replace it with something else (like virtualizables). If your original question is precisely that you did all this already and want to learn more about virtualizables, then I'm sorry :-) Reading your original questions again seems to show some confusion (which might be on my side too). In two words, a virtualizable object is not virtual at all; it's just an object in which some specific fields are specially marked as "delayed", like the PC --- which can then be a regular integer, without the need for the small-container-object hack described above. A bientôt, Armin.
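[Editor's note: a rough sketch of the boxing hack Armin describes, as I understand it; the names are invented and this is not the actual Topaz code. Allocating a fresh box on every PC update means the box itself can stay a virtual, so the intermediate PC values never need to be materialized inside the loop.]

    class PCBox(object):
        # Tiny, freshly allocated container for the program counter.
        def __init__(self, value):
            self.value = value

    class Frame(object):
        def __init__(self):
            self.pcbox = PCBox(0)

        def set_pc(self, new_pc):
            # A brand-new box on every write: the JIT can keep it virtual
            # and only force it if something outside the trace actually
            # reads the PC.
            self.pcbox = PCBox(new_pc)

        def get_pc(self):
            return self.pcbox.value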

Hi Armin,

> We might again talk past each other. What I believe I understand

I'll try to make the example that Tim mentioned clearer. Building up the deep stack was done INSIDE the loop. It was also the only thing that happened inside the loop. That's why we expected the traces for deep and shallow stacks to be very similar - shouldn't the optimizer simply eliminate the additional frame objects? Also, the relevant fields of the frame objects are indeed marked as virtualizable in the SPy VM. The Smalltalk code was basically the following, with a varying argument to buildStack:

    buildStack: depth
        depth <= 0 ifTrue: [ ^ nil ].
        self buildStack: depth - 1

    100000 timesRepeat: [ self buildStack: 100 ]

Do you have any thoughts regarding this example?

> In two words, a virtualizable object is not virtual at all; it's just an object
> in which some specific fields are specially marked as "delayed", like the PC

Ok, so virtuals and virtualizables are two unrelated mechanisms? How does the optimizer decide which writes to virtualizable fields it is able to eliminate? Is that based on a similar kind of analysis as the escape analysis for virtual objects? Thanks, best regards, Anton 2014-04-30 11:13 GMT+02:00 Armin Rigo <arigo@tunes.org>:

On Wed, Apr 30, 2014 at 1:11 PM, Anton Gulenko <anton.gulenko@student.hpi.uni-potsdam.de> wrote:
Just a wild guess: it might be possible that in this example the trace becomes too long, so tracing aborts and the function is marked as "trace from start". In that case, the function call cannot be inlined and needs to be turned into a real recursive call in assembler, which means that the frame cannot be a virtual, because it needs to be passed as an argument.
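[Editor's note: one way to test this hypothesis, my own suggestion rather than something from the thread, is to raise the trace length limit for the jitdriver and see whether the frames become virtual again; running with PYPYLOG=jit-abort:- should also show whether traces are being aborted for being too long. The parameter name assumes the standard RPython JIT parameters.]

    from rpython.rlib import jit

    # Assumption: 'jitdriver' is the JitDriver instance of the SPy
    # interpreter loop.  Raising trace_limit lets longer traces complete
    # instead of being aborted and marked "trace from start".
    jit.set_param(jitdriver, "trace_limit", 100000)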

Hi Anton, On 30 April 2014 13:11, Anton Gulenko <anton.gulenko@student.hpi.uni-potsdam.de> wrote:
It's not decided by the optimizer. A virtualizable structure has a static list of fields to handle (in _virtualizable_ = [...]). When we enter the JITted machine code, we have exactly one virtualizable (the outermost frame), and we read the current value of these fields into local variables (i.e. registers or machine stack locations). When we leave the machine code again, we write the local variables back into the fields. In the meantime, these fields hold an outdated value. This only concerns the single outermost frame. If more frame objects are created below it, their status as "virtualizable" is completely ignored, and instead we simply rely on the malloc-removal optimization called "virtuals". What exactly occurs when one of these other frames escapes (following Antonio's theory) depends on yet another detail. If each frame has a normal "back" field, then escaping any frame will force all the frames below it to escape (and thus be allocated) as well. In PyPy we solve this by using "virtualrefs", which is yet another concept, unrelated to the previous two. Each frame has an "f_back" field which is not a normal reference to the previous frame, but "jit.virtual_ref(previous_frame)". This works a bit like a weakref, except that it's not weak --- instead, it allows whatever is referenced to remain virtual even if the small virtual_ref object itself escapes, based on the assumption that most of the time we'll not access the real object. A bientôt, Armin.
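[Editor's note: for illustration, the virtualref pattern Armin mentions looks roughly like this in an interpreter. The sketch is modeled on how PyPy's ExecutionContext handles f_back; the surrounding function and field names are invented.]

    from rpython.rlib import jit

    class Frame(object):
        def __init__(self, back_ref):
            # back_ref is a virtual_ref, not a direct reference: even if
            # this frame escapes, the previous frame can stay virtual as
            # long as nobody actually dereferences back_ref.
            self.f_back_ref = back_ref

    def enter_frame(caller_frame):
        # Wrap the caller frame in a virtual_ref before creating the callee.
        return Frame(jit.virtual_ref(caller_frame))

    def get_back(frame):
        # Calling the virtual_ref dereferences it, forcing the caller
        # frame into existence if it was still virtual.
        return frame.f_back_ref()

    def leave_frame(frame, caller_frame):
        # Tell the JIT that this virtual_ref is done; from here on the
        # caller frame is forced only if the reference escaped.
        jit.virtual_ref_finish(frame.f_back_ref, caller_frame)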

participants (4)
- Anton Gulenko
- Antonio Cuni
- Armin Rigo
- Tim Felgentreff