[pypy-dev] Object pinning

William ML Leslie william.leslie.ttg at gmail.com
Thu Dec 22 21:57:06 EST 2016


On 22 December 2016 at 15:56, Kunshan Wang <kunshan.wang at anu.edu.au> wrote:
> Hi folks,

Hi Kunshan!

>
> I have a question regarding object pinning in RPython.
>
> Consider the following snippet from rpython/rtyper/lltypesystem/rstr.py
>
>     @jit.oopspec('stroruni.copy_contents(src, dst, srcstart, dststart, length)')
>     @signature(types.any(), types.any(), types.int(), types.int(), types.int(),
>                returns=types.none())
>     def copy_string_contents(src, dst, srcstart, dststart, length):
>         """Copies 'length' characters from the 'src' string to the 'dst'
>         string, starting at position 'srcstart' and 'dststart'."""
>         # xxx Warning: don't try to do this at home.  It relies on a lot
>         # of details to be sure that it works correctly in all cases.
>         # Notably: no GC operation at all from the first cast_ptr_to_adr()
>         # because it might move the strings.  The keepalive_until_here()
>         # are obscurely essential to make sure that the strings stay alive
>         # longer than the raw_memcopy().
>         assert length >= 0
>         ll_assert(srcstart >= 0, "copystrc: negative srcstart")
>         ll_assert(srcstart + length <= len(src.chars), "copystrc: src ovf")
>         ll_assert(dststart >= 0, "copystrc: negative dststart")
>         ll_assert(dststart + length <= len(dst.chars), "copystrc: dst ovf")
>         # from here, no GC operations can happen
>         asrc = _get_raw_buf(SRC_TP, src, srcstart)
>         adst = _get_raw_buf(DST_TP, dst, dststart)
>         llmemory.raw_memcopy(asrc, adst, llmemory.sizeof(CHAR_TP) * length)
>         # end of "no GC" section
>         keepalive_until_here(src)
>         keepalive_until_here(dst)
>     copy_string_contents._always_inline_ = True
>     copy_string_contents = func_with_new_name(copy_string_contents,
>                                               'copy_%s_contents' % name)
>
> There is a region where heap objects in the RPython heap are accessed
> externally by native programs.  I understand that GC must neither
> recycle the object nor move it in the memory.  But I have two questions
> about how object pinning is done in RPython:
>
> (1) From the perspective of the RPython user (e.g. high-level language
> implementer, interpreter writer, library writer, ...), what is the
> "protocol" to follow when interacting with native programs (such as
> passing a buffer to the `read` syscall)?  I have seen idiomatic use of
> `cast_ptr_to_adr` followed by `keepalive_until_here`.  But there are also
> `pin` and `unpin` functions in the rpython/rlib/rgc.py module.  What is
> the expected way for *the user* to pin objects for native access?
>

The general practice so far has been not to expose
interpreter-allocated objects to native code at all.  Instead, we
either give native code a raw malloc'd handle (à la
https://bitbucket.org/pypy/pypy/src/29df4aac463be1c3508b47754014844edb765bfa/pypy/module/cpyext/pyobject.py?at=default&fileviewer=file-view-default#pyobject.py-52
) or require the native side to do the allocation (e.g.
https://bitbucket.org/pypy/pypy/src/29df4aac463be1c3508b47754014844edb765bfa/pypy/module/_cffi_backend/cbuffer.py?at=default&fileviewer=file-view-default
).  The aim was to significantly reduce the amount of memory that
needed pinning.
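
As a rough illustration of the second approach (this is not code from
cffi or cpyext), here is a minimal sketch in plain Python using ctypes.
The buffer lives outside the managed heap, the "native" side fills it,
and only then are the bytes copied into an ordinary object.  The helper
name read_via_raw_buffer is made up for this example, and memmove
merely stands in for a real native call such as read().

```python
import ctypes


def read_via_raw_buffer(payload):
    """Hypothetical helper: let "native" code write into a raw buffer,
    then copy the result into an ordinary bytes object."""
    n = len(payload)
    # Raw, non-moving storage owned by ctypes, not by a moving GC.
    raw = ctypes.create_string_buffer(n)
    # Stand-in for a native call: it only ever sees 'raw'.
    ctypes.memmove(raw, payload, n)
    # Copy out; the managed object was never exposed to native code.
    return raw.raw[:n]


result = read_via_raw_buffer(b"hello from native code")
```

The cost is an extra copy, but nothing on the managed heap ever has to
be pinned.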

To the extent that there is an interface, it's the rpython.rlib.rgc
module.  A fair bit of lltypesystem code relies on two things: the
assumption that no other code is touching interpreter objects, and the
implementation detail that if no heap memory is allocated in some
dynamic extent, the GC cannot run and move objects.  I find that a bit
disquieting.

Tangentially: you might have better primitives for dealing with strings
in your target, in which case avoiding much of the lltypesystem might
be a better deal for you.

> (2) From the perspective of the RPython developer (those who develop the
> translation from RTyped CFGs to C source code, assembly code and machine
> code), how does the C backend actually enforce the no-GC policy between
> "from here, no GC operations can happen" and "end of 'no GC' section"?
> As I observed, keepalive_until_here essentially generates a no-op inline
> assembly that "uses" the variable so that the C compiler keeps that
> variable alive.  But what is preventing GC from happening?
>

Nothing enforces it.  There are simply no instructions that may
allocate within that dynamic extent, so the GC never gets a chance to
run there.  This is not machine-verified at all, though it could be.

>
> Some background: We are building a new backend for RPython on the Mu
> micro virtual machine (https://gitlab.anu.edu.au/mu/mu-client-pypy).
> This VM has built-in GC and exception handling, so they don't need to be
> injected after RTyper.  But the micro VM also keeps the representation
> of object references opaque.  The only way to "cast" references to
> addresses is using the "object pinning" operation which returns its
> address.  The idiom in Mu when passing a buffer to native programs is
> that you "pin" the object, give the pointer to the native functions
> (such as `memcpy`, `read` and `write`), and then "unpin" it (nested
> pinning is allowed, and it needs to be unpinned as many times as it was
> pinned).  The GC will neither reclaim the object nor move it when it is
> pinned, but object pinning does not prevent GC from happening: GC can
> still move other objects, but not the pinned ones.
>

Sounds sensible.
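
For anyone following along, the counted pin/unpin behaviour described
above can be modelled roughly as below.  This is a toy in plain Python
with invented names, not Mu's API and not rgc's.

```python
class ToyPinTable(object):
    """Toy model of counted pinning: an object stays pinned until it
    has been unpinned as many times as it was pinned."""

    def __init__(self):
        self.counts = {}

    def pin(self, obj):
        key = id(obj)
        self.counts[key] = self.counts.get(key, 0) + 1

    def unpin(self, obj):
        key = id(obj)
        if self.counts.get(key, 0) == 0:
            raise ValueError("unpin without matching pin")
        self.counts[key] -= 1
        if self.counts[key] == 0:
            del self.counts[key]

    def is_pinned(self, obj):
        return id(obj) in self.counts


table = ToyPinTable()
buf = bytearray(8)
table.pin(buf)
table.pin(buf)                          # nested pin
table.unpin(buf)
still_pinned = table.is_pinned(buf)     # one pin is still outstanding
table.unpin(buf)
finally_pinned = table.is_pinned(buf)   # all pins released
```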

We have enough people playing with pypy in Au now that we should
probably all meet at some point (:

> So the crux is how to translate the RPython primitives into the Mu
> counterparts.  If `cast_ptr_to_adr` and `keepalive_until_here` form a
> well-obeyed idiom, we can simply translate them to `pin` and `unpin`,
> respectively.
>
> The problem is, it only works for non-GC types.  Mu cannot copy
> references in this way.  There are several problems: (1) Mu does not
> want to expose the byte-by-byte representation of references.  So
> references may not really be addresses, and may not be copied naively as
> a word (it could contain bit flags, too).  (2) Mu does not prevent the
> movement of other objects.  So if an array of references is pinned, the
> objects its elements point *to* may still be moved.  I guess the reason
> why the snippet above prevents *all* kinds of GC activity is because it
> may be used to copy GC-managed object references, too.  If this is the
> case, we have to find alternative solutions.
>

Right, we have the same concern in pypy.  See cffi and cpyext for how
we hand such references to external libraries.
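
The general idea, stripped right down, is to give native code an opaque
handle instead of the reference itself.  The sketch below is a generic
handle table in plain Python with made-up names; it is not the actual
cffi or cpyext implementation.

```python
class HandleTable(object):
    """Toy handle table: native code holds integers, never references,
    so the GC remains free to move the objects themselves."""

    def __init__(self):
        self._objects = {}
        self._next = 1

    def new_handle(self, obj):
        handle = self._next
        self._next += 1
        self._objects[handle] = obj   # the table also keeps obj alive
        return handle

    def deref(self, handle):
        return self._objects[handle]

    def release(self, handle):
        del self._objects[handle]


table = HandleTable()
payload = ["some", "interpreter", "object"]
handle = table.new_handle(payload)      # this integer goes to native code
same = table.deref(handle) is payload   # looked up again on the way back
table.release(handle)
```

Because the native side only ever stores the integer, the byte-level
representation of the reference never leaves the runtime.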

-- 
William Leslie

Notice:
Likely much of this email is, by the nature of copyright, covered
under copyright law.  You absolutely MAY reproduce any part of it in
accordance with the copyright law of the nation you are reading this
in.  Any attempt to DENY YOU THOSE RIGHTS would be illegal without
prior contractual agreement.

