vtable: non-GC allocation unit used only within RPython programs - pypy-dev

8 Feb 2017

Hi all,

I am working on porting RPython to the Mu micro virtual machine
(http://microvm.org/). I have a question regarding the translation of
the vtables of RPython objects.

Background:

Mu is a micro virtual machine.  Its type system is similar to the level
of the LL type system.  Mu has a garbage collector and a stack unwinder
built in, so LL-typed RPython programs can be straightforwardly
translated to the Mu intermediate representation (IR) without injecting
exception handling and GC.

Mu has a garbage-collected heap and a permanent static space, both of
which are traced by the GC (i.e. the GC can identify all reference
fields in them).  Unlike RPython, where only GcStruct, GcArray and their
derivatives are GC-ed, Mu lets any type be allocated in the GC heap (yes,
you can allocate a single int64 in the heap if that makes sense for your
language).  The GC maintains its own metadata (invisible to the Mu
client) so that it can collect objects of any type.

But Mu is also minimal: it allocates exactly the fields the client asks
for, and no more.  In particular, Mu does not provide any RTTI to its
client (think of the C language with reference types and GC).
Therefore, OOP language implementers have to implement their own vtables
and add vtable reference fields to object headers.

Unlike RPython, where the traced-ness of Ptr<T> is determined by whether
T is GC-ed or not, Mu has three distinct pointer types: untraced
pointers (uptr<T>), which are just raw addresses; object references
(ref<T>), which refer to heap objects; and internal references
(iref<T>), a more powerful (but harder to implement) reference type that
can refer to a field of either a heap object or a static variable.

Mu can build 'boot images'.  A boot image is an executable image which
contains a preserved heap in addition to executable code.  It is similar
to the executable program the RPython C backend generates, but preserved
heap objects can still be garbage-collected as usual --- they are not
C-level static variables.

Problem:

When translating the LL type Ptr<T> to the Mu counterpart, our current
approach is:

  1. If T is a GcStruct or a GcArray, translate it to ref<T>,
  2. otherwise, translate it to uptr<T>.
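
The two-step rule above can be sketched roughly as follows.  This is
only an illustration: the classes below are stand-ins that mimic the
shape of the LL container types, not the real RPython lltype API, and
translate_ptr and the struct names are hypothetical.

```python
# Stand-in classes: they only mimic the GC / non-GC split of the LL
# container types and are NOT the real RPython lltype classes.
class Struct:
    gc = False

    def __init__(self, name):
        self.name = name


class GcStruct(Struct):
    gc = True


class Array:
    gc = False

    def __init__(self, name):
        self.name = name


class GcArray(Array):
    gc = True


class Ptr:
    def __init__(self, target):
        self.target = target


def translate_ptr(ptr):
    """Rule 1: GC containers become ref<T>; rule 2: the rest become uptr<T>."""
    kind = "ref" if ptr.target.gc else "uptr"
    return "%s<%s>" % (kind, ptr.target.name)


# Hypothetical type names, chosen only for the example.
print(translate_ptr(Ptr(GcStruct("W_Object"))))
print(translate_ptr(Ptr(Struct("object_vtable"))))
```

Under this rule a vtable, being a plain Struct, ends up as uptr<T> just
like a buffer handed to C.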

Ideally, object references are used inside Mu (all RPython programs are
translated to Mu IR), and uptr<T> is only used to interact with
external C programs --- a uptr<T> is just an address.  Most RPython
programs seem to follow this pattern: GcStructs are used within the
RPython program, and non-GC Structs can be passed to external C programs
through a Ptr.

But vtables appear to be a special case (see rpython/rtyper/rclass.py:160):

  1. It is a Struct, but not a GcStruct.  So it is not allocated in the
     GC heap.
  2. But it is only used internally by RPython.  It is never exposed to
     native C programs.

It is a valid approach to implement vtables as non-GC objects, because
the number of RPython types is determined at compile time, and there are
only as many vtables as GC-ed types.

But I think our current translation strategy --- translating RPython
Ptr<vtable> to Mu uptr<vtable> --- is problematic.  The link from a GC
object to its vtable exists only inside the Mu program, so it shouldn't
be translated to uptr<T>, which should only be used for the native
interface.  Furthermore, Mu restricts the T of uptr<T> to untraced
types.  Vtables contain many function references, which are a kind of
traced reference in Mu.  So referring to a vtable by uptr<vtable> will
not work unless we relax the restriction to allow accessing traced
reference fields through uptr, which is undesirable because uptr is
specifically designed to access native C data, not Mu objects.
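
To make the restriction concrete, here is a minimal sketch of the check
it implies.  It is a toy model, not Mu's actual API: MuStruct,
is_traced, make_uptr, the kind names and the two example structs are all
hypothetical, and the set of traced kinds is an assumption based on the
description above.

```python
# Toy model of Mu types: a type is either a kind name (a string) or a
# MuStruct listing its field types.  All names here are illustrative.
TRACED_KINDS = {"ref", "iref", "funcref"}  # assumed set of traced kinds


class MuStruct:
    def __init__(self, name, fields):
        self.name = name
        self.fields = fields  # list of kind names or nested MuStruct


def is_traced(ty):
    """True if the type is, or contains, a traced reference."""
    if isinstance(ty, MuStruct):
        return any(is_traced(field) for field in ty.fields)
    return ty in TRACED_KINDS


def make_uptr(ty):
    """Build uptr<T>, rejecting any T that contains traced references."""
    name = getattr(ty, "name", ty)
    if is_traced(ty):
        raise TypeError("uptr<%s>: target contains traced references" % name)
    return "uptr<%s>" % name


# A buffer handed to C holds only untraced data; a vtable holds funcrefs.
c_buffer = MuStruct("c_buffer", ["int64", "uptr"])
vtable = MuStruct("object_vtable", ["int64", "funcref", "funcref"])

print(make_uptr(c_buffer))
try:
    make_uptr(vtable)
except TypeError as exc:
    print("rejected:", exc)
```

In this model the C buffer passes the check, while the vtable is
rejected because of its function reference fields.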

Currently, we allocate all vtables as static variables (still traced by
the GC).  So an alternative to uptr<T> is iref<T>, which is a traced
reference type and can still be used to refer to static variables.

Now, we see there are three kinds of RPython-level Ptr<T> values:

  1. Those pointing to GC objects.
  2. Those pointing to non-GC objects which can be accessed by native
     programs (e.g. the byte buffers to be read by the `write` system
     call when printing).
  3. Those pointing to non-GC internal objects within RPython (e.g.
     vtable).

The problem is that there does not seem to be enough static type
information to distinguish between case 2 and case 3.

If we could distinguish between 2 and 3, we could translate case 2 to
uptr<T> and case 3 to iref<T>.  But RPython seems to be using Ptr for
these two purposes interchangeably.  Therefore, the Mu backend will have
to tell whether an RPython-level Ptr<T> field is ever exposed to native
programs or not, which can only be decided at run time.

Understandably, with the C backend, both vtables and C-accessible
objects are represented as C-level global variables of struct types.
Therefore it was not necessary to make such a distinction, because both
are accessed through C-level pointers after translation.

So I would like to know: Is vtable the only non-GC Struct type that is
used internally within an RPython program (instead of exposed to native
C programs)?

If it is, we can make a special case and translate Ptr<vtable> to
iref<vtable>.  But the ideal way to implement vtables is to translate
them to GC objects, so that vtables can be managed just like any other
GC objects.  Since Mu does not automatically add RTTI to heap objects,
Mu can still allocate vtables on the GC heap even though vtable is not
an RttiStruct.
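
For illustration, the special-case variant could look roughly like the
sketch below.  It assumes that the vtable really is the only
internal-only non-GC struct; INTERNAL_ONLY, translate_ptr and the struct
names are hypothetical placeholders, not existing backend code.

```python
# Assumption: the vtable struct is the only non-GC struct that never
# reaches native code.  "object_vtable" is a placeholder name.
INTERNAL_ONLY = {"object_vtable"}


def translate_ptr(target_name, is_gc):
    """Map an LL-level Ptr<T> to one of the three Mu pointer kinds."""
    if is_gc:
        return "ref<%s>" % target_name   # case 1: GC heap object
    if target_name in INTERNAL_ONLY:
        return "iref<%s>" % target_name  # case 3: internal, still traced
    return "uptr<%s>" % target_name      # case 2: visible to native code


print(translate_ptr("W_Object", is_gc=True))
print(translate_ptr("object_vtable", is_gc=False))
print(translate_ptr("c_buffer", is_gc=False))
```

If other internal-only Structs exist, a fixed set like this would
silently map them to uptr<T>, which is why I am asking the question
above.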

Regards,
Kunshan Wang
Australian National University
