[pypy-dev] Automated binding generation (and maintenance)

wlavrijsen at lbl.gov
Fri Sep 15 02:43:28 EDT 2017


> Ah, I had not realised rootcling existed. I've seen that I can invoke
> it using Python version-specific paths...is this the correct way to
> invoke it:
> ROOTCLING=/usr/local/lib/python3.6/dist-packages/cppyy_backend

Yes, and here's a description of the LinkDef.h format:


> or is there a recommended wrapper?

No, but I'm going to add one for pip, same as I did for genreflex. I've
been fleshing out the backend generation, taken over from Anto:


where all that can live. I'm told that I'll need rootcling anyway for
use of modules (see below).

> I actually get some warnings and then the error:

Add this set of exclusions to the selection.xml:

    <class pattern="*thread_mutex*" />
    <class pattern="*new_allocator*" />
    <class pattern="*Alloc_hider*" />

Of course, the larger problem with pulling in these standard libs over and
over again is that it wastes CPU and memory, so I do want to see
the file_name attribute fixed. As it stands, I'd simply exclude:

    <class pattern="std::*" />
    <class pattern="__gnu_cxx::*" />

especially since they are already available by default. Note that those two
rules cover the ones needed for new_allocator and Alloc_hider.
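
To illustrate why the two broad rules make the narrower ones redundant, here is a minimal Python sketch. It assumes the selection patterns behave like shell-style wildcards on the fully qualified class name (an assumption for illustration, not a description of how genreflex matches them), and the candidate class names are made up for the example:

```python
from fnmatch import fnmatchcase

# Exclusion patterns taken from the two broad selection rules.
EXCLUDE_PATTERNS = ["std::*", "__gnu_cxx::*"]


def is_excluded(class_name, patterns=EXCLUDE_PATTERNS):
    """Return True if any wildcard pattern matches the qualified class name."""
    return any(fnmatchcase(class_name, p) for p in patterns)


# Illustrative names only (assumed, not copied from the actual warnings).
candidates = [
    "__gnu_cxx::new_allocator<char>",
    "std::basic_string<char>::_Alloc_hider",
    "KJSObject",
]
excluded = [name for name in candidates if is_excluded(name)]
```

Under that assumption, both the new_allocator and the Alloc_hider names fall under the namespace-wide rules, while KJSObject is left alone.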

However, there is a more efficient approach that is right around the corner
(and has been right around the corner for a long time, so don't hold me to
that). The next release now seems likely, though.

The long term goal has always been to use modules:


but the original drivers (Apple, Google, and the C++ standards committee)
have been going back and forth on it. Now, things are finally falling into
place. Here's Google:


And here's ROOT:


The big deal is that C++ developers have an incentive to deploy modules, so
being able to patch into that should be a huge time saver (and where they
don't, rootcling will soon be able to create modules from headers). Note
that modules don't come for free: they will require some ambiguity resolution,
but that is typically a Good Thing (code-quality wise).

Modules allow deserialization of only the piece of the AST that is actually
being requested, saving memory. Header files (whether or not precompiled),
by contrast, pull in everything before them. See the status report
above for the improvements in memory usage.

And with modules, of course, selection becomes unnecessary (markup for
automatic streamers may still be useful, but that is not relevant for
bindings generation).

> I did wonder if I was missing some "-isystem" includes, and tried
> adding them but the --debug output from genreflex seemed to suggest
> they were being ignored.

Some flags are ignored as no-one was using them (so far). Some others
are definitely obsolete by now.

> What is interesting, and might possibly throw light on the selection
> filter issue, is that the file name for the classes in
> kjsinterpreter.h itself is always the empty string ''. Classes that
> come from included files return non-empty strings such as
> 'kjsobject.h' for 'KJSObject'.

That's after the fact (i.e. what is stored); I don't see the rule being
respected/used at all.

> BTW, the reason for doing this is that lots of KDE code has multiple
> classes and even namespaces in a single header file. Now, for
> discoverability of the loaded objects, I find the incremental "pop
> into cppyy.gbl on demand" somewhat limiting and I wanted to play about
> with that. I could also work around the filter issue if I precomputed
> the needed names in a precursor pass.

The issue here is the memory cost of loading things that won't get used
in the end. This is why a functional dir() (which needs nothing but
strings, after all), in conjunction with lazy loading/creation when a
real access happens, works well. LLVM is fully lookup based, btw. There
is a custom layer on top of Cling to make enumeration possible.
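
As a rough picture of that combination, here is a minimal pure-Python sketch: dir() reports names from strings alone, and the object behind a name is only built on first real access. LazyNamespace and the factory helper are hypothetical names for this illustration; this is not how cppyy implements cppyy.gbl.

```python
class LazyNamespace:
    """Names are known up front as strings; objects are built on first access."""

    def __init__(self, factories):
        # factories: name -> zero-argument callable that builds the real object
        self._factories = dict(factories)

    def __dir__(self):
        # Enumeration needs nothing but strings: no object is created here.
        return sorted(self._factories)

    def __getattr__(self, name):
        # Only called when normal lookup fails, i.e. for names not yet cached.
        try:
            factory = self._factories[name]
        except KeyError:
            raise AttributeError(name) from None
        obj = factory()
        setattr(self, name, obj)  # cache: later accesses bypass __getattr__
        return obj


created = []  # records which names were actually materialized


def make(name):
    def factory():
        created.append(name)
        return type(name, (), {})  # stand-in for an expensive proxy class
    return factory


gbl = LazyNamespace({n: make(n) for n in ("KJSObject", "KJSInterpreter")})
names = dir(gbl)       # lists both names, creates nothing
first = gbl.KJSObject  # only now is the KJSObject stand-in built
```

The memory cost is paid per name actually touched, which is the point of keeping enumeration string-only.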

> Finally, and most importantly given the fidelity with which cppyy
> renders the C++ code, I'm thinking about how Pythonisation customisation
> might be handled: e.g. a Python wrapper layer to allow a
> pointer-plus-size to render as a Python list/tuple, or generate a dict
> mapping for a QSet, and so on. (I'm dimly aware of the
> boost-recognition logic you have alluded to, this is specifically more
> about Qt-specific patterns and ad-hoc scenarios).

In 2015, a GSoC student fleshed this out. I never put it into PyPy b/c of
a lack of test coverage, but did put it in PyROOT. Here's an example of
the "pointer-plus-size" pythonization (from ROOT.py):

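     # python side pythonizations (should live in their own file, if we get many)
     def set_size(self, buf):
        # the owning object knows the length: all matched classes provide GetN()
        buf.SetSize(self.GetN())
        return buf

     # TODO: add pythonization API to pypy-c
     cppyy.add_pythonization(
        cppyy.compose_method("^TGraph(2D)?$|^TGraph.*Errors$", "GetE?[XYZ]$", set_size))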

The functions selected by the regexps return naked pointers, but the object
can be queried for the size (all have a consistent GetN() function). So the
method composer patches up the return value, making it a sized array,
instead of an "open-ended" one.
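
To make the mechanics concrete, here is a minimal pure-Python sketch of what such a method composer could look like. It is an illustration under assumptions, not the PyROOT implementation: this compose_method, the OpenBuffer stand-in, and the fake TGraph class are all written for the example, and only the two regexps and the GetN() convention come from the text above.

```python
import re


def compose_method(class_pattern, method_pattern, composer):
    """Build a pythonizor: a callable applied to (class, class_name)."""
    class_re = re.compile(class_pattern)
    method_re = re.compile(method_pattern)

    def pythonizor(klass, name):
        if not class_re.match(name):
            return False
        for attr, original in list(vars(klass).items()):
            if callable(original) and method_re.match(attr):
                def wrapped(self, *args, _original=original, **kwargs):
                    # Call the original, then let the composer patch the result.
                    return composer(self, _original(self, *args, **kwargs))
                setattr(klass, attr, wrapped)
        return True

    return pythonizor


class OpenBuffer:
    """Stand-in for an open-ended buffer coming back from C++ (hypothetical)."""

    def __init__(self, data):
        self._data = data
        self._size = None

    def SetSize(self, n):
        self._size = n

    def __len__(self):
        if self._size is None:
            raise TypeError("buffer size unknown")
        return self._size


class TGraph:
    """Fake class for illustration; not the real ROOT TGraph."""

    def __init__(self, xs):
        self._xs = list(xs)

    def GetN(self):
        return len(self._xs)

    def GetX(self):
        return OpenBuffer(self._xs)


def set_size(self, buf):
    buf.SetSize(self.GetN())  # query the owning object for the length
    return buf


pythonizor = compose_method("^TGraph(2D)?$|^TGraph.*Errors$", "GetE?[XYZ]$", set_size)
applied = pythonizor(TGraph, "TGraph")
sized = TGraph([1.0, 2.0, 3.0]).GetX()
```

Note that GetN() itself does not match the method regexp, so it is left untouched and stays usable from inside the composer.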

I'm sitting on some patches as I wanted to tweak his APIs a bit. There
was some ordering that I felt didn't compose well, but that is minor.

Similarly, there's code to apply ownership rules, mapping exceptions,
the new C++11 smartptrs, controlling auto-casting, handling the GIL, making
properties, and adding overloads. All driven by regexp matching of patterns.
See here:


(plus further support inside the bindings layer itself).

Of course, one can hook up completely custom functions, and he made it so
that that is per C++ namespace, so nicely self-contained.

Again, this is currently only partly available, as I need to write a lot
more tests for PyPy (which are bound to unearth some problems along the
way). And then there is documentation to be written ...

> P.S. Please note that after today, I'll likely not have much Internet
> access for a couple of weeks, so any responses may be limited.

I'll make sure I have at least all my local changes pushed by then. :)

Best regards,
WLavrijsen at lbl.gov    --    +1 (510) 486 6411    --    www.lavrijsen.net
