[pypy-dev] Separate compilation and friends

William ML Leslie william.leslie.ttg at gmail.com
Wed Feb 16 13:24:01 CET 2011


On 15 February 2011 12:13, Maciej Fijalkowski <fijall at gmail.com> wrote:
> Hi.
>
> There is growing interest in PyPy and especially in extension
> modules. Apparently there are some people (like Alex) that are willing
> to write modules in RPython that should not go to the main tree. Since
> separate compilation is considered hard, how hard would it be to
> provide separate loading? This would mean you still compile the whole
> interpreter, but you can load the module from a compiled PyPy that
> didn't have this option on.

0. What do you do about linking? Are RPython class and function names
mangled consistently enough that a module compiled against one
patch-level runtime with one set of options will link against another
patch version with some different options?

1. Is it reasonable to ensure that *all* symbols that may be visible
to an extension module are exported by the runtime?  f may be inlined
into g, but if f may still be called by an extension module, an
out-of-line copy of f must be exported as well.

2. I always assumed this ("separate loading", i.e. common translation)
was the intended way to do separate compilation in RPython *anyway*;
but there is one nagging thing.  In order to define the boundary of
the compilation, you already need to declare the interface of the
module.  Maybe not the annotations of the arguments, but at least
which functions, when the annotator sees them, should be placed in
your ".so".

For the specific case of modules in the PyPy Python runtime this can
be the interpleveldefs attribute of Module objects, but such functions
all seem to have a known signature annotation: they take a space and a
number of wrapped arguments and return a wrapped object.  Armed with
this and the signature annotations of the functions in the core
runtime, it is reasonable to expect that we can determine which
functions belong in the ".so".  Exactly how those core annotations are
obtained doesn't need to be set in stone: computing them within the
same translation and reusing an annotation performed earlier are both
reasonable, mutually compatible places to start.
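
As a rough illustration of the partitioning described above, here is a
minimal sketch. It assumes a toy call graph keyed by made-up function
names and a dictionary standing in for a module's interpleveldefs; it
is not the translator's actual data model. Starting from the module's
entry points, it walks the call graph and stops at anything the core
runtime already provides, which yields the functions that belong in the
".so" and the runtime symbols the ".so" would need to import.

```python
# Toy call graph: function name -> names it calls. All names are hypothetical.
CALL_GRAPH = {
    "ext.w_checksum": ["ext._fold", "core.space_wrap"],
    "ext._fold": ["core.intmask"],
    "core.space_wrap": ["core.intmask"],
    "core.intmask": [],
    "core.unused_by_ext": [],
}

# Stand-in for a module's interpleveldefs: app-level name -> interp-level function.
INTERPLEVELDEFS = {"checksum": "ext.w_checksum"}

# Functions assumed to be provided by the already-translated runtime.
CORE_FUNCTIONS = {name for name in CALL_GRAPH if name.startswith("core.")}


def partition(entry_points, call_graph, core):
    """Walk the call graph from the entry points.

    Returns (so_funcs, imports): functions to place in the ".so" and
    core functions the ".so" calls directly.
    """
    so_funcs, imports = set(), set()
    pending = list(entry_points)
    while pending:
        name = pending.pop()
        if name in so_funcs or name in imports:
            continue
        if name in core:
            imports.add(name)  # provided by the runtime; do not descend further
            continue
        so_funcs.add(name)
        pending.extend(call_graph[name])
    return so_funcs, imports


so_funcs, imports = partition(INTERPLEVELDEFS.values(), CALL_GRAPH, CORE_FUNCTIONS)
print(sorted(so_funcs))
print(sorted(imports))
```

The imports set is also a lower bound on what the runtime would have to
export for this particular module, which ties back to question 1.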

In both cases, the annotation is *computed* in the usual way.  The
alternative is defining the interface exported by the runtime
explicitly, which is a mammoth task, and I don't think anyone has
suggested it: is that what you are arguing against, fijal?

There are still some details, such as how you export JitCodes from
functions in your module, and what it means to do so - but nothing
prohibitive that I can see.

On 15 February 2011 19:32, Paolo Giarrusso
<pgiarrusso at mathematik.uni-marburg.de> wrote:
> On Tue, Feb 15, 2011 at 09:17, Dima Tisnek <dimaqq at gmail.com> wrote:
>> I assume here that modules don't introduce dependencies into the interpreter.
>> I guess in the long run this ought to be the case, right?
>
> I don't think you can guarantee this. Type inference is global, and
> you might need a user for each API to better infer its type. Maybe
> uses of an API in testcases allow fully inferring their types, but I'd
> guess not.

If this does happen, it makes the callee part of a public interface,
which should probably be explicitly annotated.

I think checking for this case (where annotation in the extension
would generalise the type signature of a dependency) is not too
difficult.
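
A minimal sketch of such a check, assuming a toy annotation lattice
written as strings (the names only echo the annotator's classes; the
lattice, the signatures and the function names are invented for the
example). For each call the extension makes into the core, it takes the
union of the core's existing argument annotation with the annotation
the extension passes, and reports the call if the union is more general
than what the core was translated with.

```python
# Toy lattice: each annotation maps to its more general parent.
PARENT = {
    "SomeBool": "SomeInteger",
    "SomeInteger": "SomeObject",
    "SomeString": "SomeObject",
    "SomeObject": None,
}


def ancestors(ann):
    """Return ann followed by every more general annotation above it."""
    chain = []
    while ann is not None:
        chain.append(ann)
        ann = PARENT[ann]
    return chain


def union(a, b):
    """Least upper bound of two annotations in the toy lattice."""
    seen = set(ancestors(a))
    for candidate in ancestors(b):
        if candidate in seen:
            return candidate
    raise ValueError("no common ancestor for %r and %r" % (a, b))


def find_generalisations(core_signatures, extension_calls):
    """List the calls whose arguments would widen a core signature."""
    problems = []
    for func, passed_anns in extension_calls:
        for index, (have, passed) in enumerate(zip(core_signatures[func], passed_anns)):
            if union(have, passed) != have:
                problems.append((func, index, have, passed))
    return problems


# Hypothetical signatures computed when the core runtime was translated.
core_signatures = {"core.intmask": ["SomeInteger"], "core.str_len": ["SomeString"]}
# Hypothetical calls found while annotating the extension.
extension_calls = [
    ("core.intmask", ["SomeBool"]),     # fits: SomeBool is below SomeInteger
    ("core.str_len", ["SomeInteger"]),  # would widen SomeString to SomeObject
]
problems = find_generalisations(core_signatures, extension_calls)
print(problems)
```

Any function reported this way is a candidate for an explicit
annotation, since it has effectively become part of the public
interface.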

> However, what is true in general is that if less specific types are
> inferred, that affects just performance, not correctness (I don't know
> if that's true of PyPy, but you ought to be able to pass "object"s
> around). Maybe the slowdown is insignificant, maybe it is a huge
> problem, maybe few annotations can save the day.

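Annotation widens, rather than narrows: it starts from the most
specific type and generalises only when a new use demands it, so with
fewer uses visible, overly specific types are more likely to be
inferred.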
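
To make the direction of the problem concrete, here is a small sketch
under simplifying assumptions: the lattice is a single chain of
invented annotation names ordered from most to least specific, and an
argument's annotation is the widest annotation seen at any call site.
Leaving out the extension's call site gives a result that is too
specific, not too general.

```python
from functools import reduce

# Invented chain of annotations, from most specific to most general.
GENERALITY = ["Impossible", "NonNegInt", "Int", "Object"]


def widen(a, b):
    """Return the more general of two annotations on the chain."""
    return max(a, b, key=GENERALITY.index)


def infer(call_site_annotations):
    """Start from the bottom of the chain and widen once per call site."""
    return reduce(widen, call_site_annotations, "Impossible")


# Only the core's own call sites are visible.
core_only = infer(["NonNegInt", "NonNegInt"])
# The same function once an extension call site passing an Int is included.
with_extension = infer(["NonNegInt", "NonNegInt", "Int"])
print(core_only, with_extension)
```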

> However, it is still not clear (to me) where previous efforts stopped.
> Is it hard to:
> 1) devise an algorithm like Dima proposed
> or to
> 2) implement it (because of too much code to change and limited manpower)
> or to
> 3) accept a small performance loss?

I understood that there was a lot of design to be done and there were
other priorities.  Devising an algorithm is not difficult; specifying
it in terms of our existing annotation and flow model is slightly more
so.

> Per-file separate compilation would likely fall into 3), because too
> little type inference would happen, wouldn't it?

I suspect you are thinking of accidentally boxing interp-level
integers or something, but that condition cannot arise (it would be
the dreaded SomeObject annotation).  If not, where do you think a loss
would come from?

-- 
William Leslie


