vtable: non-GC allocation unit used only within RPython programs
Hi all,

I am working on porting RPython to the Mu micro virtual machine (http://microvm.org/). I have a question regarding the translation of the vtables of RPython objects.

Background:

Mu is a micro virtual machine. Its type system is at roughly the same level as the LL type system. Mu has a garbage collector and a stack unwinder built in, so LL-typed RPython programs can be straightforwardly translated to the Mu intermediate representation (IR) without injecting exception handling and GC.

Mu has a garbage-collected heap and a permanent static space, both of which are traced by the GC (i.e. the GC can identify all reference fields in them). Unlike RPython, where only GcStruct, GcArray and their derivatives are GC-ed, any type can be allocated in the GC heap (yes, you can allocate a single int64 in the heap if it makes sense to your language). The GC maintains its own metadata (invisible to the Mu client) to perform GC for any type.

But Mu is also minimal: it allocates exactly the fields the client tells it to allocate, and nothing more. In particular, Mu does not provide any RTTI to its client (think of the C language with reference types and GC). Therefore, OOP language implementers have to implement their own vtables and add vtable reference fields to object headers.

Unlike RPython, where the traced-ness of Ptr<T> is determined by whether T is GC-ed or not, Mu has three distinct kinds of pointer: untraced pointers (uptr<T>), which are just raw addresses; object references (ref<T>), which refer to heap objects; and internal references (iref<T>), a more powerful (but harder to implement) reference type that can refer to a field of either a heap object or a static variable.

Mu can build 'boot images'. A boot image is an executable image which contains a preserved heap in addition to executable code. It is similar to the executable program the RPython C backend generates, but preserved heap objects can still be garbage-collected as usual: they are not C-level static variables.
Problem:

When translating the LL type Ptr<T> to the Mu counterpart, our current approach is:

1. If T is a GcStruct or a GcArray, translate it to ref<T>,
2. otherwise, translate it to uptr<T>.

Ideally, object references are used inside Mu (all RPython programs are translated to Mu IR), and uptr<T> is only used to interact with external C programs, since uptr<T> values are just addresses. Most RPython programs seem to follow this pattern: GcStructs are used within the RPython program, and non-GC Structs can be passed to external C programs by Ptr.

But vtables appear to be a special case (see rpython/rtyper/rclass.py:160):

1. It is a Struct, but not a GcStruct. So it is not allocated in the GC heap.
2. But it is only used internally for RPython. It is never exposed to native C programs.

It is a valid approach to implement vtables as non-GC objects, because the number of RPython types is determined at compile time, and there are only as many vtables as GC-ed types.

But I think our current translation strategy, translating RPython Ptr<vtable> to Mu uptr<vtable>, is problematic: this link only exists within Mu programs, from GC objects to their respective vtables. It shouldn't be translated to uptr<T>, which should only be used for the native interface. Furthermore, Mu restricts the T of uptr<T> to untraced types. vtables contain many function references, which are a kind of traced reference in Mu. So referring to a vtable by uptr<vtable> will not work unless we relax the restriction to allow accessing traced reference fields through uptr, which is undesirable because uptr is specifically designed to access native C data, not Mu objects.

Currently, we allocate all vtables as static variables (still traced by the GC). So an alternative to uptr<T> is iref<T>, which is a traced reference type and can still be used to refer to static variables.

Now, we see there are three kinds of RPython-level Ptr<T> values:

1. Those pointing to GC objects.
2. Those pointing to non-GC objects which can be accessed by native programs (e.g. the byte buffers to be read by the `write` system call when printing)
3. Those pointing to non-GC internal objects within RPython (e.g. vtable)

The problem is: there seems to be not enough static type information to distinguish between case 2 and case 3. If we could distinguish between 2 and 3, we could translate case 2 to uptr<T> and case 3 to iref<T>. But RPython seems to be using Ptr for these two purposes interchangeably. Therefore, the Mu backend will have to tell whether an RPython-level Ptr<T> field is ever exposed to native programs or not, which can only be decided at run time.

Understandably, with the C backend, both vtables and C-accessible objects are represented as C-level global variables of struct types. Therefore it was not necessary to make such a distinction, because both are accessed by C-level pointers after translation.

So I would like to know: Is vtable the only non-GC Struct type that is used internally within an RPython program (instead of being exposed to native C programs)? If it is, we can make a special case, and translate Ptr<OBJECT_VTABLE> to iref<OBJECT_VTABLE>.

But the ideal way to implement vtables is to translate them to GC objects. In this way, vtables can be managed just like any other GC objects. Since Mu does not automatically add RTTI to heap objects, Mu can still allocate vtables on the GC heap even though vtable is not an RttiStruct.

Regards,
Kunshan Wang
Australian National University
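For illustration, the three-way mapping described in this message can be sketched in plain Python. This is only a toy model: `LLType`, `INTERNAL_ONLY` and `translate_ptr` are hypothetical names invented for the sketch, not part of RPython or of the Mu backend, and the special-case set stands in for whatever mechanism would identify internal-only Structs.

```python
from dataclasses import dataclass

# Toy stand-in for an LL type. NOT the real rpython lltype classes.
@dataclass(frozen=True)
class LLType:
    name: str
    gc: bool = False  # True for GcStruct / GcArray and their derivatives

# Hypothetical special-case list: non-GC Structs that never leave RPython.
INTERNAL_ONLY = frozenset({"OBJECT_VTABLE"})

def translate_ptr(target, internal_only=INTERNAL_ONLY):
    """Map an RPython-level Ptr<target> to a Mu pointer kind (as a string)."""
    if target.gc:
        # Case 1: GC objects become traced object references.
        return "ref<%s>" % target.name
    if target.name in internal_only:
        # Case 3: internal non-GC data lives in static space, reached by iref.
        return "iref<%s>" % target.name
    # Case 2: everything else is assumed to be shared with native C code.
    return "uptr<%s>" % target.name

print(translate_ptr(LLType("OBJECT", gc=True)))
print(translate_ptr(LLType("OBJECT_VTABLE")))
print(translate_ptr(LLType("CHAR_BUFFER")))
```

The sketch only works if the set of internal-only Structs is known up front, which is exactly the open question asked above.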
Hi,

On 8 February 2017 at 11:07, Kunshan Wang <kunshan.wang@anu.edu.au> wrote:
> 1. Those pointing to GC objects.
> 2. Those pointing to non-GC objects which can be accessed by native programs (e.g. the byte buffers to be read by the `write` system call when printing)
> 3. Those pointing to non-GC internal objects within RPython (e.g. vtable)
>
> The problem is: there seems to be not enough static type information to distinguish between case 2 and case 3.
To summarize, the native model of PyPy breaks because you can't reference GC objects from non-GC data, whereas this occurs commonly inside PyPy. But I'm more concerned about function pointers: you say this is GC data too in Mu? How can you then pass a function pointer, pointing to a callback, to external C code?

To distinguish between case 2 and case 3, you could simply look inside the Struct at whether it contains references to GC objects or not. This is not perfect but it should work well enough (and can be tweaked manually if necessary in one corner case or two).

A bientôt,

Armin.
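A minimal sketch of the heuristic suggested in this reply, again as a toy model in plain Python rather than real RPython code. All class and function names here (`Primitive`, `FuncRef`, `Struct`, `Ptr`, `holds_traced_ref`, `classify`) and the example field names are hypothetical. Counting function references as traced is an assumption taken from the description of Mu in the original post; it can be switched off with `count_funcrefs=False`.

```python
class Primitive:
    def __init__(self, name):
        self.name = name

class FuncRef:
    """Function reference; described as a traced reference in Mu."""

class Struct:
    def __init__(self, name, fields, gc=False):
        self.name = name
        self.fields = fields  # dict: field name -> field type
        self.gc = gc          # True models a GcStruct

class Ptr:
    def __init__(self, target):
        self.target = target

def holds_traced_ref(struct, count_funcrefs=True):
    """Return True if the Struct contains a field the GC would have to trace."""
    for ftype in struct.fields.values():
        if isinstance(ftype, Ptr) and getattr(ftype.target, "gc", False):
            return True  # pointer to a GC object
        if count_funcrefs and isinstance(ftype, FuncRef):
            return True  # assumption: function references are traced
        if isinstance(ftype, Struct) and holds_traced_ref(ftype, count_funcrefs):
            return True  # inlined sub-struct with a traced field
    return False

def classify(struct):
    """Pick the Mu pointer kind for Ptr<struct> under this heuristic."""
    if struct.gc:
        return "ref"
    return "iref" if holds_traced_ref(struct) else "uptr"

# Hypothetical example types.
OBJECT = Struct("OBJECT", {"typeptr": Primitive("word")}, gc=True)
VTABLE = Struct("OBJECT_VTABLE", {"some_method": FuncRef()})
BUFFER = Struct("CHAR_BUFFER", {"length": Primitive("int64")})

print(classify(OBJECT), classify(VTABLE), classify(BUFFER))
```

As the reply notes, such a check is not perfect: a Struct holding only primitives but used purely inside RPython would be classified as `uptr`, so a manual override for corner cases would still be needed.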