[pypy-dev] Question about extension support

Benjamin Peterson benjamin at python.org
Thu Mar 27 06:05:28 CET 2014



On Wed, Mar 26, 2014, at 21:51, Kevin Modzelewski wrote:
> On Wed, Mar 26, 2014 at 9:32 PM, Benjamin Peterson
> <benjamin at python.org> wrote:
> 
> >
> >
> > On Wed, Mar 26, 2014, at 21:17, Kevin Modzelewski wrote:
> > > On Wed, Mar 26, 2014 at 1:52 PM, Benjamin Peterson
> > > <benjamin at python.org> wrote:
> > >
> > > >
> > > > There are several reasons. Two of the most important are
> > > > 1) PyPy's internal representation of objects is different from
> > > > CPython's, so a conversion cost must be paid every time objects pass
> > > > between pure Python and C. Unlike CPython, extensions with PyPy can't
> > > > poke around directly in data structures. Macros like PyList_SET_ITEM
> > > > have to become function calls.
> > > >
> > >
> > > Hmm interesting... I'm not sure I follow, though, why calling
> > > PyList_SET_ITEM on a PyPy list can't know about the PyPy object
> > > representation.  Again, I understand how it's not necessarily going to
> > > be as fast as pure-python code, but I don't understand why
> > > PyList_SET_ITEM on PyPy needs to be slower than on CPython.  Is it
> > > because PyPy uses more complicated internal representations, expecting
> > > the overhead to be elided by the JIT?
> >
> > Let's continue with the list example. pypy lists use an array as the
> > underlying data structure like CPython, but the similarity stops there.
> > You can't just have random C code putting things in pypy lists. The
> > internal representation of the list might be unwrapped integers, not
> > pointers to int objects like CPython lists. There also need to be GC
> > barriers.
> >
> > The larger picture is that building a robust CPython compatibility layer
> > is difficult and error-prone compared to the solution of rewriting C
> > extensions in Python (possibly with cffi).
> >
> 
> Using that logic, I would counter that building a JIT for a dynamic
> language is difficult and error-prone compared to rewriting your dynamic
> language programs in a faster language :)  The benefit to supporting it
> in your runtime is 1) you only do the work once, and 2) you get to
> support existing code out there.

I don't want to argue that an amazingly fast CPython API compatibility
layer isn't possible, but current experience suggests that creating it will be
painful. It's hard to get excited about building compatibility layers
when there are shiny JITs to be made.
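
To make the list example quoted above concrete, here is a minimal Python
sketch of a list that stores unwrapped integers and has to box them whenever
C-style code asks for an item. The names SketchList, Boxed, getitem_for_c and
setitem_from_c are hypothetical and exist only for illustration; this is not
how PyPy or cpyext is actually implemented, and GC barriers are left out
entirely.

```python
from array import array


class Boxed:
    """Stand-in for a heap-allocated int object that C code holds by pointer."""

    def __init__(self, value):
        self.value = value


class SketchList:
    """Toy list with two storage strategies (hypothetical, for illustration)."""

    def __init__(self, ints):
        self.strategy = "int"
        # Raw machine integers: there is no per-item object to point at.
        self.storage = array("q", ints)

    def getitem_for_c(self, i):
        item = self.storage[i]
        if self.strategy == "int":
            # C code expects an object pointer, so a fresh box is
            # allocated on every access. This is the conversion cost.
            return Boxed(item)
        return item

    def setitem_from_c(self, i, obj):
        if self.strategy == "int":
            if isinstance(obj, Boxed):
                # Unwrap and keep the compact representation.
                self.storage[i] = obj.value
                return
            # Any other object forces a switch to generic object storage.
            self.storage = [Boxed(v) for v in self.storage]
            self.strategy = "object"
        self.storage[i] = obj


lst = SketchList([1, 2, 3])
first = lst.getitem_for_c(0)
second = lst.getitem_for_c(0)
print(first is second)      # two accesses yield two distinct boxes
lst.setitem_from_c(1, "x")
print(lst.strategy)
```

A macro that writes straight into the backing array would bypass both the
unwrapping and the strategy switch, which is why the sketch routes every
store through a function call.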

> 
> I'm writing not from the standpoint of saying "I have an extension module
> and I want it to run quickly", but rather "what do you guys think about
> the (presumed) situation of extension modules being a key blocker of PyPy
> adoption".  While I'd love the world to migrate to a better solution
> overnight, I don't think that's realistic -- just look at the state of
> Python 3, which has a much larger constituency pushing much harder for
> it, and presumably has lower switching costs than rewriting C extensions
> in Python.

Yes, but you get to use PyPy and get super fast Python code, whereas your
code gets no faster by porting to Python 3. Plus you get rid of C! The
incentives are a bit better.

> 
> 
> > >
> > > Also, I'm assuming that CPyExt gets to do a recompilation of the
> > > extension module;
> >
> > Yes
> >
> > > > 2) Bridging the gap between PyPy's GC and CPython's ref counting
> > > > requires a lot of bookkeeping.
> > > >
> > >
> > > From a personal standpoint I'm also curious about how much of this
> > > overhead is fundamental, and how much could be alleviated with
> > > (potentially significant) implementation effort.  I know PyPy has a
> > > precise GC, but I wonder if using a conservative GC could change the
> > > situation dramatically if you were able to hook the extension module's
> > > allocator and switch it to using the conservative GC.  That's my plan,
> > > at least, which is one of the reasons I've been curious about the
> > > issues that PyPy has been running into, since I want to know how much
> > > will be applicable.
> >
> > Conservative GCs are evil and slow. :)
> >
> > I don't know what you mean by the "extension module's allocator". That's
> > a fairly global thing.
> >
> 
> I'm assuming that you can hook out malloc and mmap to be calls to the GC
> allocator; I've seen other projects do this, though I don't know how
> robust it is.

That's the easy part. The hard part is keeping your precise GC informed
of native C doing arbitrary things.
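
As a rough illustration of the bookkeeping mentioned in point 2) above, the
sketch below keeps a table of objects that have been handed to C code,
together with a reference count for each. RefBridge and its methods are
hypothetical names, and using id() as the handle is an assumption made for
brevity; a real bridge has to cope with a moving GC and much more.

```python
class RefBridge:
    """Toy table of objects handed out to C code (illustrative only)."""

    def __init__(self):
        # handle -> [object, refcount]; the strong reference in the table
        # keeps the object alive for as long as C code may still use it.
        self._table = {}

    def to_c(self, obj):
        handle = id(obj)  # assumption: a stable integer handle per object
        entry = self._table.get(handle)
        if entry is None:
            self._table[handle] = [obj, 1]
        else:
            entry[1] += 1
        return handle

    def incref(self, handle):
        self._table[handle][1] += 1

    def decref(self, handle):
        entry = self._table[handle]
        entry[1] -= 1
        if entry[1] == 0:
            # C no longer holds a reference, so the GC is free to
            # reclaim the object once nothing else refers to it.
            del self._table[handle]

    def from_c(self, handle):
        return self._table[handle][0]

    def is_tracked(self, handle):
        return handle in self._table


bridge = RefBridge()
obj = object()
handle = bridge.to_c(obj)
bridge.incref(handle)
bridge.decref(handle)
print(bridge.is_tracked(handle))  # still tracked: one reference remains
bridge.decref(handle)
print(bridge.is_tracked(handle))  # released back to the GC
```

Every object crossing the boundary costs a table lookup, and the table only
stays correct if the C code pairs each incref with a decref, which the
runtime cannot verify.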

