Automated binding generation (and maintenance)
Hi,

I have been looking at the problem of automated binding generation (and maintenance) for large C++ code bases for a little while now [1], but am new to cppyy.

One issue I am struggling to find a good solution for is to generate an accurate list of the objects (classes, functions, variables etc) in a given header file in order to populate the selection .XML. Ideally, I'd like to be able to say "all objects in this translation unit". I tried the wildcard "*" but I believe that selects the transitive fanout (and runs into errors). Short of running Clang directly to generate the names, what options do I have? (Currently, I'm working around this by manually specifying a narrower wildcard such as "KJS*").

Also, as I looked around for approaches to this issue, I noted that the cppyy backend v610 5 source code has a "getAllClasses" whereas PyPI has 6.10.0.2. I'm not sure of the mapping of versions, but what is the cadence for updates to PyPI?

[1] https://marc.info/?l=kde-core-devel&m=150464598710128&w=2
Shaheed,
One issue I am struggling to find a good solution for is to generate an accurate list of the objects (classes, functions, variables etc) in a given header file in order to populate the selection .XML.
that option exists, but apparently no-one has ever used it, as it is clearly broken. :P It should be:

    <selection>
      <class pattern="*" file_name="SomeHeader.h" />
      <enum pattern="*" file_name="SomeHeader.h" />
      <function pattern="*" file_name="SomeHeader.h" />
      <variable pattern="*" file_name="SomeHeader.h" />
    </selection>

genreflex exists for backwards compatibility; underneath it's rootcling, which accepts this:

    #pragma link C++ defined_in "SomeHeader.h";

and that does work ... I'll dig a bit, see what goes wrong with genreflex; should be no more than proper rule registration.

But if not restricting selection, what errors are you seeing?
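For illustration, a selection file of the shape shown above could be produced by a small script rather than written by hand. This is only a sketch using Python's standard library: the helper name selection_xml is hypothetical, and whether genreflex actually honours the file_name attribute depends on the fix discussed in this message.

```python
import xml.etree.ElementTree as ET

def selection_xml(header_name):
    """Return a <selection> document limited to one header (hypothetical helper)."""
    selection = ET.Element("selection")
    # One wildcard rule per kind of entity, each restricted by file_name.
    for kind in ("class", "enum", "function", "variable"):
        ET.SubElement(selection, kind, {"pattern": "*", "file_name": header_name})
    return ET.tostring(selection, encoding="unicode")

print(selection_xml("SomeHeader.h"))
```

The same function could be called once per header by a driver that walks a source tree.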
Short of running Clang directly to generate the names, what options do I have?
If using PyPy (not yet CPython), you can load all files in a header, include that, and simply start looping over dir(cppyy.gbl). (This is one of a set of things that I still have to equalize between PyPy/cppyy and CPython/cppyy.)
Also, as I looked around for approaches to this issue, I noted that the cppyy backend v610 5 source code has a "getAllClasses" whereas PyPI has 6.10.0.2.
That getAllClasses was a hack for compatibility reasons that doesn't do what the name supposes it does: there can always be more classes that could be found through a mapping file, but haven't yet. Hence a functional dir() is a better approach.
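As a rough sketch of the dir()-based idea, one could snapshot the visible names before and after pulling in a header and keep the difference. The helper names_added is hypothetical, the cppyy calls appear only in comments and assume a build where dir(cppyy.gbl) is functional, and the difference would presumably also contain names from transitively included headers.

```python
def names_added(before, after):
    """Return public names present in `after` but not in `before`, sorted."""
    return sorted(n for n in set(after) - set(before) if not n.startswith("_"))

# With cppyy this might look like (assumption, not tested here):
#   before = dir(cppyy.gbl)
#   cppyy.include("kjsinterpreter.h")
#   after = dir(cppyy.gbl)
# Stand-in data so the sketch runs without cppyy installed:
before = ["gROOT", "gInterpreter"]
after = ["gROOT", "gInterpreter", "KJSInterpreter", "KJSObject", "_internal"]
print(names_added(before, after))
```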
I'm not sure of the mapping of versions, but what is the cadence for updates to PyPI?
It's only since a few months that I split everything off into a standalone package (there's a reason the first version digit is still 0) and I'm still sitting on some restructuring to separate things that update often from things that don't. The backend part is expected to update every half year or so, once packaging stabilizes (that's the cling schedule).
[1] https://marc.info/?l=kde-core-devel&m=150464598710128&w=2
Just a few minor points in response to that message. E.g. yes, overloads end up as a single Python function, but if you don't want that, then you can use __disp__("signature") to pick out the ones you want. Those are first-class objects, and allow any kind of restructuring that Python allows.

As for needing cling, that's only if you need the dynamic features. It is also possible to use it to generate bindings to be used for cffi. You need to pre-instantiate templates and such, but that's already the case for any other bindings tool. And for that matter, at that level you could use it to generate what you need for SIP, too.

Best regards,
Wim
--
WLavrijsen@lbl.gov -- +1 (510) 486 6411 -- www.lavrijsen.net
Hi Wim, On 14 September 2017 at 01:04, <wlavrijsen@lbl.gov> wrote:
Shaheed,
One issue I am struggling to find a good solution for is to generate an accurate list of the objects (classes, functions, variables etc) in a given header file in order to populate the selection .XML.
that option exists, but apparently no-one has ever used it, as it is clearly broken. :P It should be:
<selection> <class pattern="*" file_name="SomeHeader.h" /> <enum pattern="*" file_name="SomeHeader.h" /> <function pattern="*" file_name="SomeHeader.h" /> <variable pattern="*" file_name="SomeHeader.h" /> </selection>
genreflex exists for backwards compatibility, underneath it's rootcling, which accepts this:
#pragma link C++ defined_in "SomeHeader.h";
Ah, I had not realised rootcling existed. I've seen that I can invoke it using Python version-specific paths...is this the correct way to invoke it:

    ROOTCLING=/usr/local/lib/python3.6/dist-packages/cppyy_backend
    LD_LIBRARY_PATH=$ROOTCLING/lib $ROOTCLING/bin/rootcling -h

or is there a recommended wrapper?
and that does work ... I'll dig a bit, see what goes wrong with genreflex; should be no more than proper rule registration. But if not restricting selection, what errors are you seeing?
With this:

======
<selection>
<class pattern="*" />
<function pattern="*" />
<variable pattern="*" />
<enum pattern="*" />
</selection>
======

I actually get some warnings and then the error:

======
Warning: Class or struct basic_string<char16_t,char_traits<char16_t>,allocator<char16_t> >::_Alloc_hider was selected but its dictionary cannot be generated: this is a private or protected class and this is not supported. No direct I/O operation of basic_string<char16_t,char_traits<char16_t>,allocator<char16_t> >::_Alloc_hider instances will be possible.
Warning: Class or struct basic_string<char32_t,char_traits<char32_t>,allocator<char32_t> >::_Alloc_hider was selected but its dictionary cannot be generated: this is a private or protected class and this is not supported. No direct I/O operation of basic_string<char32_t,char_traits<char32_t>,allocator<char32_t> >::_Alloc_hider instances will be possible.
Warning: Class or struct basic_string<_CharT,_Traits,_Alloc>::_Alloc_hider was selected but its dictionary cannot be generated: this is a private or protected class and this is not supported. No direct I/O operation of basic_string<_CharT,_Traits,_Alloc>::_Alloc_hider instances will be possible.
Warning: Class or struct string::_Alloc_hider was selected but its dictionary cannot be generated: this is a private or protected class and this is not supported. No direct I/O operation of string::_Alloc_hider instances will be possible.
Warning: Class or struct basic_string<wchar_t,char_traits<wchar_t>,allocator<wchar_t> >::_Alloc_hider was selected but its dictionary cannot be generated: this is a private or protected class and this is not supported. No direct I/O operation of basic_string<wchar_t,char_traits<wchar_t>,allocator<wchar_t> >::_Alloc_hider instances will be possible.
Warning: Class or struct ios_base::_Callback_list was selected but its dictionary cannot be generated: this is a private or protected class and this is not supported. No direct I/O operation of ios_base::_Callback_list instances will be possible.
Warning: Class or struct ios_base::_Words was selected but its dictionary cannot be generated: this is a private or protected class and this is not supported. No direct I/O operation of ios_base::_Words instances will be possible.
Error in <CloseStreamerInfoROOTFile>: Cannot find class __pthread_mutex_s.
======
The command line in use is:

======
genreflex /usr/include/KF5/kjs/kjsinterpreter.h -s selection.xml -o tmp3/kjsinterpreter.cpp -I/usr/include/x86_64-linux-gnu/qt5 -I/usr/include/x86_64-linux-gnu/qt5/QtCore -I/usr/include/KF5/kjs -I/usr/include/KF5/wtf
======

I did wonder if I was missing some "-isystem" includes, and tried adding them but the --debug output from genreflex seemed to suggest they were being ignored.
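A driver that walks many headers would likely assemble this command line programmatically. The following is a minimal sketch that only builds the argument list; the function name genreflex_command is hypothetical, and only the -s, -o and -I flags seen in the command above are used.

```python
def genreflex_command(header, selection, output, include_dirs):
    """Build the genreflex argument list for one header (hypothetical helper)."""
    cmd = ["genreflex", header, "-s", selection, "-o", output]
    # One -I flag per include directory, in the order given.
    cmd += ["-I" + directory for directory in include_dirs]
    return cmd

cmd = genreflex_command(
    "/usr/include/KF5/kjs/kjsinterpreter.h",
    "selection.xml",
    "tmp3/kjsinterpreter.cpp",
    ["/usr/include/x86_64-linux-gnu/qt5", "/usr/include/KF5/kjs"],
)
print(" ".join(cmd))
```

The resulting list could be handed to subprocess.run(cmd, check=True) once genreflex is on the PATH.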
Short of running Clang directly to generate the names, what options do I have?
If using PyPy (not yet CPython), you can load all files in a header, include that, and simply start looping over dir(cppyy.gbl). (This is one of a set of things that I still have to equalize between PyPy/cppyy and CPython/cppyy.)
Also, as I looked around for approaches to this issue, I noted that the cppyy backend v610 5 source code has a "getAllClasses" whereas PyPI has 6.10.0.2.
That getAllClasses was a hack for compatibility reasons that doesn't do what the name supposes it does: there can always be more classes that could be found through a mapping file, but haven't yet. Hence a functional dir() is a better approach.
Ack. My driver code is exactly intended to handle this kind of thing by walking the directories and invoking genreflex/rootcling. One issue is that I've been experimenting with directly using cppyy.gbl.gROOT et al. to try to identify only the classes (and later variables etc) directly in kjsinterpreter.h by looking at cppyy.gbl.gInterpreter.ClassInfo_FileName() for the relevant class name with something roughly like this:

    ci = cppyy.gbl.gInterpreter.ClassInfo_Factory('KJSInterpreter')
    cppyy.gbl.gInterpreter.ClassInfo_FileName(ci)

What is interesting, and might possibly throw light on the selection filter issue, is that the file name for the classes in kjsinterpreter.h itself is always the empty string ''. Classes that come from included files return non-empty strings such as 'kjsobject.h' for 'KJSObject'.

BTW, the reason for doing this is that lots of KDE code has multiple classes and even namespaces in a single header file. Now, for discoverability of the loaded objects, I find the incremental "pop into cppyy.gbl on demand" somewhat limiting and I wanted to play about with that. I could also work around the filter issue if I precomputed the needed names in a precursor pass.

Finally, and most importantly given the fidelity with which cppyy renders the C++ code, I'm thinking about how Pythonisation customisation might be handled: e.g. a Python wrapper layer to allow a pointer-plus-size to render as a Python list/tuple, or generate a dict mapping for a QSet, and so on. (I'm dimly aware of the boost-recognition logic you have alluded to, this is specifically more about Qt-specific patterns and ad-hoc scenarios).
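The precursor pass mentioned above could be sketched as grouping class names by the file name reported for them. This is only an illustration: group_by_file is a hypothetical helper, the lookup is injected so the sketch runs without cppyy, and the stand-in data simply mirrors the behaviour reported in this message (an empty string for classes in the top-level header).

```python
from collections import defaultdict

def group_by_file(class_names, filename_of):
    """Group class names by the file name that `filename_of` reports for each."""
    groups = defaultdict(list)
    for name in class_names:
        groups[filename_of(name)].append(name)
    return dict(groups)

# With cppyy, filename_of might wrap the two calls quoted above (assumption):
#   def filename_of(name):
#       ci = cppyy.gbl.gInterpreter.ClassInfo_Factory(name)
#       return str(cppyy.gbl.gInterpreter.ClassInfo_FileName(ci))
# Stand-in lookup mirroring the reported results:
reported = {"KJSInterpreter": "", "KJSObject": "kjsobject.h"}
groups = group_by_file(reported, reported.get)
print(groups)
```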
I'm not sure of the mapping of versions, but what is the cadence for updates to PyPI?
It's only since a few months that I split everything off into a standalone package (there's a reason the first version digit is still 0) and I'm still sitting on some restructuring to separate things that update often from things that don't. The backend part is expected to update every half year or so, once packaging stabilizes (that's the cling schedule).
[1] https://marc.info/?l=kde-core-devel&m=150464598710128&w=2
Just a few minor points in response to that message. E.g. yes, overloads end up as a single Python function, but if you don't want that, then you can use __disp__("signature") to pick out the ones you want. Those are first-class objects, and allow any kind of restructuring that Python allows.
As for needing cling, that's only if you need the dynamic features. It is also possible to use it to generate bindings to be used for cffi. You need to pre-instantiate templates and such, but that's already the case for any other bindings tool. And for that matter, at that level you could use it to generate what you need for SIP, too.
Thanks for the kind hints, but you've only managed to whet my appetite to get cppyy working as it is exactly things like the handling of overloads and template instantiation that I want most! Thanks, Shaheed P.S. Please note that after today, I'll likely not have much Internet access for a couple of weeks, so any responses may be limited.
Shaheed,
Ah, I had not realised rootcling existed. I've seen that I can invoke it using Python version-specific paths...is this the correct way to invoke it:
ROOTCLING=/usr/local/lib/python3.6/dist-packages/cppyy_backend LD_LIBRARY_PATH=$ROOTCLING/lib $ROOTCLING/bin/rootcling -h
Yes, and here's a description of the LinkDef.h format: https://root.cern.ch/root/html/guides/users-guide/AddingaClass.html#the-link...
or is there a recommended wrapper?
No, but I'm going to add one for pip, same as I did for genreflex. I've been fleshing out the backend generation, taken over from Anto: https://bitbucket.org/wlav/cppyy-backend where all that can live. I'm told that I'll need rootcling anyway for use of modules (see below).
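Until such a wrapper exists, the manual invocation quoted above can be mimicked from a script. The sketch below only computes the command and environment; rootcling_invocation is a hypothetical helper, and the bin/ and lib/ layout under the cppyy_backend directory is taken from the command line in this thread rather than from any documented guarantee.

```python
import os

def rootcling_invocation(backend_dir, args, base_env=None):
    """Return (command, environment) for running rootcling from a backend directory."""
    env = dict(os.environ if base_env is None else base_env)
    lib_dir = os.path.join(backend_dir, "lib")
    existing = env.get("LD_LIBRARY_PATH")
    # Put the backend's lib directory first, keeping any existing entries.
    env["LD_LIBRARY_PATH"] = lib_dir if not existing else lib_dir + os.pathsep + existing
    cmd = [os.path.join(backend_dir, "bin", "rootcling")] + list(args)
    return cmd, env

cmd, env = rootcling_invocation(
    "/usr/local/lib/python3.6/dist-packages/cppyy_backend", ["-h"], base_env={}
)
print(cmd, env["LD_LIBRARY_PATH"])
```

The pair could then be passed to subprocess.run(cmd, env=env).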
I actually get some warnings and then the error:
Add this set of exclusions to the selection.xml:

    <exclusion>
      <class pattern="*thread_mutex*" />
      <class pattern="*new_allocator*" />
      <class pattern="*Alloc_hider*" />
    </exclusion>

Of course, the larger problem of pulling in these standard libs over and over again is that it is a waste of cpu and memory, so I do want to see the file_name attribute fixed. As it stands, I'd simply exclude:

    <class pattern="std::*" />
    <class pattern="__gnu_cxx::*" />

especially since they are already available by default. Note that those two rules cover the ones needed for new_allocator and Alloc_hider.

However, there is a more efficient approach that is right around the corner (and has been right around the corner for a long time, so don't hold me to that). Next release now seems likely though.

The long term goal has always been to use modules:

http://clang.llvm.org/docs/Modules.html

but the original drivers (Apple, Google, and the C++ standards committee) have been going back and forth on it. Now, things are finally falling into place. Here's Google:

https://www.youtube.com/watch?v=dHFNpBfemDI

And here's ROOT:

https://indico.cern.ch/event/643728/contributions/2612822/attachments/149407...

The big deal is that C++ developers have an incentive to deploy modules, so being able to patch into that should be a huge time saver (and where they don't, rootcling will soon be able to create modules from headers). Note that modules don't come for free: it will require some ambiguity resolution, but that is typically a Good Thing (code-quality wise).

Modules allow deserialization of only the piece of the AST that is actually being requested, saving memory. This as opposed to header files (whether or not precompiled) which pull in everything before them. See the status report above for the improvements in memory usage.

And with modules, of course, selection becomes unnecessary (markup for automatic streamers may still be useful, but that is not relevant for bindings generation).
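To get a feel for which names a set of exclusion patterns would catch, the wildcards can be approximated with Python's fnmatch module. This is a sketch only: genreflex's own matching rules may differ from shell-style globbing, the helper is_excluded is hypothetical, and the fully qualified names below are illustrative rather than copied from the warnings.

```python
from fnmatch import fnmatchcase

# Patterns taken from the exclusion rules suggested above.
EXCLUSIONS = ["std::*", "__gnu_cxx::*", "*thread_mutex*"]

def is_excluded(name, patterns=EXCLUSIONS):
    """Return True if any shell-style pattern matches the qualified name."""
    return any(fnmatchcase(name, pattern) for pattern in patterns)

for name in ("std::basic_string<char>::_Alloc_hider",
             "__gnu_cxx::new_allocator<char>",
             "__pthread_mutex_s",
             "KJSInterpreter"):
    print(name, is_excluded(name))
```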
I did wonder if I was missing some "-isystem" includes, and tried adding them but the --debug output from genreflex seemed to suggest they were being ignored.
Some flags are ignored as no-one was using them (so far). Some others are definitely obsolete by now.
What is interesting, and might possibly throw light on the selection filter issue, is that the file name for the classes in kjsinterpreter.h itself is always the empty string ''. Classes that come from included files return non-empty strings such as 'kjsobject.h' for 'KJSObject'.
That's after the fact (i.e. what is stored); I don't see the rule being respected/used at all.
BTW, the reason for doing this is that lots of KDE code has multiple classes and even namespaces in a single header file. Now, for discoverability of the loaded objects, I find the incremental "pop into cppyy.gbl on demand" somewhat limiting and I wanted to play about with that. I could also work around the filter issue if I precomputed the needed names in a precursor pass.
The issue here is the memory cost of loading things that won't get used in the end. This is why a functional dir() (which needs nothing but strings, after all), in conjunction with lazy loading/creation when a real access happens, works well. LLVM is fully lookup based, btw. There is a custom layer on top of Cling to make enumeration possible.
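The combination described here, listing by name only and creating objects on first access, can be illustrated in plain Python. This is not how cppyy implements it; LazyNamespace and the factory callback are hypothetical names used to show the pattern.

```python
class LazyNamespace:
    """Lists known names cheaply and creates each object on first access."""

    def __init__(self, known_names, factory):
        self._known = set(known_names)
        self._factory = factory

    def __dir__(self):
        # Enumeration needs nothing but strings; nothing is created here.
        return sorted(self._known)

    def __getattr__(self, name):
        # Only reached when normal attribute lookup fails.
        if name.startswith("_") or name not in self._known:
            raise AttributeError(name)
        value = self._factory(name)
        setattr(self, name, value)  # cache, so later lookups skip __getattr__
        return value

created = []

def factory(name):
    created.append(name)
    return "<proxy for %s>" % name

ns = LazyNamespace(["KJSObject", "KJSInterpreter"], factory)
print(dir(ns), created)       # names are listed, nothing created yet
print(ns.KJSObject, created)  # first real access triggers creation
```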
Finally, and most importantly given the fidelity with which cppyy renders the C++ code, I'm thinking about how Pythonisation customisation might be handled: e.g. a Python wrapper layer to allow a pointer-plus-size to render as a Python list/tuple, or generate a dict mapping for a QSet, and so on. (I'm dimly aware of the boost-recognition logic you have alluded to, this is specifically more about Qt-specific patterns and ad-hoc scenarios).
In 2015, a GSoC student fleshed this out. I never put it into PyPy b/c of a lack of test coverage, but did put it in PyROOT. Here's an example of the "pointer-plus-size" pythonization (from ROOT.py):

    # python side pythonizations (should live in their own file, if we get many)
    def set_size(self, buf):
        buf.SetSize(self.GetN())
        return buf

    # TODO: add pythonization API to pypy-c
    if not PYPY_CPPYY_COMPATIBILITY_FIXME:
        cppyy.add_pythonization(
            cppyy.compose_method("^TGraph(2D)?$|^TGraph.*Errors$", "GetE?[XYZ]$", set_size))

The functions selected by the regexps return naked pointers, but the object can be queried for the size (all have a consistent GetN() function). So the method composer patches up the return value, making it a sized array, instead of an "open-ended" one.

I'm sitting on some patches as I wanted to tweak his APIs a bit. There was some ordering that I felt didn't compose well, but that is minor.

Similarly, there's code to apply ownership rules, mapping exceptions, the new C++11 smartptrs, controlling auto-casting, handling the GIL, making properties, and adding overloads. All driven by regexp matching of patterns. See here:

https://bitbucket.org/wlav/cppyy/src/4d14ba325e494f13cc11f3f11cbb87b44048b25...

(plus further support inside the bindings layer itself).

Of course, one can hook up completely custom functions, and he made it so that that is per C++ namespace, so nicely self-contained.

Again, this is currently only partly available, as I need to write a lot more tests for PyPy (which are bound to unearth some problems along the way). And then there is documentation to be written ...
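The regexp-driven composition idea can be imitated in pure Python to see how the pieces fit. The sketch below is not cppyy's implementation or API: compose_on_match, the Graph class and trim are hypothetical stand-ins, and matching uses re.search as an assumption.

```python
import functools
import re

def compose_on_match(class_pattern, method_pattern, post):
    """Return a function that wraps matching methods of matching classes with `post`."""
    class_re = re.compile(class_pattern)
    method_re = re.compile(method_pattern)

    def pythonize(cls):
        if not class_re.search(cls.__name__):
            return cls
        for name, func in list(vars(cls).items()):
            if callable(func) and method_re.search(name):
                setattr(cls, name, _wrap(func, post))
        return cls

    return pythonize

def _wrap(func, post):
    @functools.wraps(func)
    def wrapper(self, *args, **kwargs):
        # Call the original method, then let `post` patch up the result.
        return post(self, func(self, *args, **kwargs))
    return wrapper

class Graph:
    """Stand-in for a C++ class whose getters return an over-long buffer."""
    def GetN(self):
        return 2
    def GetX(self):
        return [1.0, 2.0, 0.0, 0.0]

def trim(self, buf):
    # Analogous in spirit to set_size: cut the buffer to the known length.
    return buf[:self.GetN()]

compose_on_match("^Graph$", "^Get[XYZ]$", trim)(Graph)
print(Graph().GetX())
```

Because GetN does not match the method pattern, it is left untouched while GetX now returns only the first GetN() entries.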
P.S. Please note that after today, I'll likely not have much Internet access for a couple of weeks, so any responses may be limited.
I'll make sure I have at least all my local changes pushed by then. :) Best regards, Wim -- WLavrijsen@lbl.gov -- +1 (510) 486 6411 -- www.lavrijsen.net
Wim, Thanks for the detailed and thoughtful reply. I will digest and respond when I am properly back in circulation. On 15 September 2017 at 07:43, <wlavrijsen@lbl.gov> wrote:
[...]
Hi Wim,

After reviewing your comments, I propose to check out rootcling. I initially had some trouble using pip3 to install the newer code, but that seems to have been resolved as of yesterday's 0.2.3 build. I did notice one message during the install which seems to be benign, so I mention it here merely in passing:

Running command /usr/bin/python3 -u -c "import setuptools, tokenize;__file__='/tmp/pip-build-spz01kkp/cppyy-backend/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" bdist_wheel -d /tmp/tmpe2h6yls0pip-wheel- --python-tag cp36
running bdist_wheel
running build
running build_ext
error: [Errno 2] No such file or directory: 'cling-config': 'cling-config'
error
Failed building wheel for cppyy-backend
Running setup.py clean for cppyy-backend

I'll no doubt be back with questions :-).

Thanks for all the good work,
Shaheed

On 23 September 2017 at 06:24, Shaheed Haque <srhaque@theiet.org> wrote:
Wim,
Thanks for the detailed and thoughtful reply. I will digest and respond when I am properly back in circulation.
On 15 September 2017 at 07:43, <wlavrijsen@lbl.gov> wrote:
Shaheed,
Ah, I had not realised rootcling existed. I've seen that I can invoke it using Python version-specific paths...is this the correct way to invoke it:
ROOTCLING=/usr/local/lib/python3.6/dist-packages/cppyy_backend LD_LIBRARY_PATH=$ROOTCLING/lib $ROOTCLING/bin/rootcling -h
Yes, and here's a description of the LinkDef.h format:
https://root.cern.ch/root/html/guides/users-guide/AddingaClass.html#the-link...
or is there a recommended wrapper?
No, but I'm going to add one for pip, same as I did for genreflex. I've been fleshing out the backend generation, taken over from Anto:
https://bitbucket.org/wlav/cppyy-backend
where all that can live. I'm told that I'll need rootcling anyway for use of modules (see below).
I actually get some warnings and then the error:
Add this set of exclusions to the selection.xml:
<exclusion> <class pattern="*thread_mutex*" /> <class pattern="*new_allocator*" /> <class pattern="*Alloc_hider*" /> </exclusion>
Of course, the larger problem of pulling in these standard libs over and over again is that it is a waste of cpu and memory, so I do want to see the file_name attribute fixed. As it stands, I'd simply exclude:
<class pattern="std::*" /> <class pattern="__gnu_cxx::*" />
especially since they are already available by default. Note that those two rules cover the ones needed for new_allocator and Alloc_hider.
However, there is a more efficient approach that is right around the corner (and has been right about the corner for a long time, so don't hold me to that). Next release now seems likely though.
The long term goal has always been to use modules:
http://clang.llvm.org/docs/Modules.html
but the original drivers (Apple, Google, and the C++ standards committee) have been going back and forth on it. Now, things are finally falling into place. Here's Google:
https://www.youtube.com/watch?v=dHFNpBfemDI
And here's ROOT:
https://indico.cern.ch/event/643728/contributions/2612822/attachments/149407...
The big deal is that C++ developers have an incentive to deploy modules, so being able to patch into that should be a huge time saver (and where they don't, rootcling will soon be able to create modules from headers). Note that modules don't come for free: it will require some ambiguity resolution, but that is typically a Good Thing (code-quality wise).
Modules allow deserialization of only the piece of the AST that is actually being requested, saving memory. This as opposed to header files (whether or not precompiled) which pull in everything before them. See the status report above for the improvements in memory usage.
And with modules, of course, selection becomes unnecessary (markup for automatic streamers may still be useful, but that is not relevant for bindings generation).
I did wonder if I was missing some "-isystem" includes, and tried adding them but the --debug output from genreflex seemed to suggest they were being ignored.
Some flags are ignored as no-one was using them (so far). Some others are definitely obsolete by now.
What is interesting, and might possibly throw light on the selection filter issue, is that the file name for the classes in kjsinterpreter.h itself is always the empty string ''. Classes that come from included files return non-empty strings such as 'kjsobject.h' for 'KJSObject'.
That's after the fact (i.e. what is stored); I don't see the rule being respected/used at all.
BTW, the reason for doing this is that lots of KDE code has multiple classes and even namespaces in a single header file. Now, for discoverability of the loaded objects, I find the incremental "pop into cppyy,gbl on demand" somewhat limiting and I wanted to play about with that. I could also workaround the filter issue if I precomputed the needed names in a precursor pass.
The issue here is the memory cost of loading things that won't get used in the end. This is why a functional dir() (which needs nothing but strings, after all), in conjunction with lazy loading/creation when a real access happens work well. LLVM is fully lookup based, btw. There is a custom layer on top of Cling to make enumeration possible.
Finally, and most importantly given the fidelity with which cppyy renders the C++ code, I'm think about how Pythonisation customisation might be handled: e.g. a Python wrapper layer to allow a pointer-plus-size to render as a Python list/tuple, or generate a dict mapping fora QSet, and so on. (I'm dimly aware of the boost-recognition logic you have alluded to, this is specifically more about Qt-specific patterns and ad-hoc scenarios).
In 2015, a GSoC student fleshed this out. I never put it into PyPy b/c of a lack of test coverage, but did put in in PyROOT. Here's an example of the "pointer-plus-size" pythonization (from ROOT.py):
# python side pythonizations (should live in their own file, if we get many) def set_size(self, buf): buf.SetSize(self.GetN()) return buf
# TODO: add pythonization API to pypy-c if not PYPY_CPPYY_COMPATIBILITY_FIXME: cppyy.add_pythonization( cppyy.compose_method("^TGraph(2D)?$|^TGraph.*Errors$", "GetE?[XYZ]$", set_size))
The functions selected by the regexps return naked pointers, but the object can be queried for the size (all have a consistent GetN() function). So the method composer patches up the return value, making it a sized array, instead of an "open-ended" one.
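To make the mechanism concrete, here is a self-contained sketch of a regexp-driven method composer in plain Python. It is not the PyROOT/cppyy implementation: `compose_method_sketch`, `OpenBuffer`, and the mock `TGraph` are hypothetical stand-ins, and the real API may differ in signature and behaviour. Only the two regexps and `set_size` are taken from the snippet above.

```python
import functools
import re


def compose_method_sketch(class_pattern, method_pattern, composer):
    """Return a function that wraps matching methods of a matching class."""
    class_re = re.compile(class_pattern)
    method_re = re.compile(method_pattern)

    def pythonizor(cls):
        if not class_re.match(cls.__name__):
            return False
        for name, original in list(vars(cls).items()):
            if callable(original) and method_re.match(name):
                setattr(cls, name, _wrap(original, composer))
        return True

    return pythonizor


def _wrap(original, composer):
    @functools.wraps(original)
    def wrapper(self, *args, **kwargs):
        # Call the original method, then let the composer patch up the result.
        return composer(self, original(self, *args, **kwargs))
    return wrapper


class OpenBuffer:
    """Mock of an open-ended array: holds data but does not know its length."""

    def __init__(self, data):
        self._data = data
        self._size = None

    def SetSize(self, n):
        self._size = n

    def __len__(self):
        if self._size is None:
            raise TypeError("buffer has unknown size")
        return self._size

    def __iter__(self):
        return iter(self._data[:len(self)])


class TGraph:
    """Mock class whose getters return unsized buffers, plus a GetN()."""

    def __init__(self, xs, ys):
        self._xs, self._ys = xs, ys

    def GetN(self):
        return len(self._xs)

    def GetX(self):
        return OpenBuffer(self._xs)

    def GetY(self):
        return OpenBuffer(self._ys)

    def GetTitle(self):
        return "graph"


def set_size(self, buf):
    buf.SetSize(self.GetN())
    return buf


pythonize = compose_method_sketch(
    "^TGraph(2D)?$|^TGraph.*Errors$", "GetE?[XYZ]$", set_size)
pythonize(TGraph)

g = TGraph([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])
print(list(g.GetX()))  # the returned buffer now carries a size
```

Because the match is on names only, `GetN` and `GetTitle` are left alone while `GetX` and `GetY` are wrapped.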
I'm sitting on some patches as I wanted to tweak his APIs a bit. There was some ordering that I felt didn't compose well, but that is minor.
Similarly, there's code to apply ownership rules, mapping exceptions, the new C++11 smartptrs, controlling auto-casting, handling the GIL, making properties, and adding overloads. All driven by regexp matching of patterns. See here:
https://bitbucket.org/wlav/cppyy/src/4d14ba325e494f13cc11f3f11cbb87b44048b25...
(plus further support inside the bindings layer itself).
Of course, one can hook up completely custom functions, and he made it so that that is per C++ namespace, so nicely self-contained.
Again, this is currently only partly available, as I need to write a lot more tests for PyPy (which are bound to unearth some problems along the way). And then there is documentation to be written ...
P.S. Please note that after today, I'll likely not have much Internet access for a couple of weeks, so any responses may be limited.
I'll make sure I have at least all my local changes pushed by then. :)
Best regards, Wim -- WLavrijsen@lbl.gov -- +1 (510) 486 6411 -- www.lavrijsen.net
Oh wait, I think I see that cling-config is installed by the cppyy package. (Seems a tad confusing, ho-hum).

On 11 October 2017 at 10:29, Shaheed Haque <srhaque@theiet.org> wrote:
Hi Wim,
After reviewing your comments, I propose to check out rootcling. I initially had some trouble using pip3 to install the newer code, but that seems to have been resolved as of yesterday's 0.2.3 build. I did notice one message during the install which seems to be benign, so I mention it here merely in passing:
    Running command /usr/bin/python3 -u -c "import setuptools, tokenize;__file__='/tmp/pip-build-spz01kkp/cppyy-backend/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" bdist_wheel -d /tmp/tmpe2h6yls0pip-wheel- --python-tag cp36
    running bdist_wheel
    running build
    running build_ext
    error: [Errno 2] No such file or directory: 'cling-config': 'cling-config'
    error
    Failed building wheel for cppyy-backend
    Running setup.py clean for cppyy-backend
I'll no doubt be back with questions :-).
Thanks for all the good work, Shaheed
On 23 September 2017 at 06:24, Shaheed Haque <srhaque@theiet.org> wrote:
Wim,
Thanks for the detailed and thoughtful reply. I will digest and respond when I am properly back in circulation.
On 15 September 2017 at 07:43, <wlavrijsen@lbl.gov> wrote:
Shaheed,
Ah, I had not realised rootcling existed. I've seen that I can invoke it using Python version-specific paths...is this the correct way to invoke it:
    ROOTCLING=/usr/local/lib/python3.6/dist-packages/cppyy_backend
    LD_LIBRARY_PATH=$ROOTCLING/lib $ROOTCLING/bin/rootcling -h
Yes, and here's a description of the LinkDef.h format:
https://root.cern.ch/root/html/guides/users-guide/AddingaClass.html#the-link...
or is there a recommended wrapper?
No, but I'm going to add one for pip, same as I did for genreflex. I've been fleshing out the backend generation, taken over from Anto:
https://bitbucket.org/wlav/cppyy-backend
where all that can live. I'm told that I'll need rootcling anyway for use of modules (see below).
I actually get some warnings and then the error:
Add this set of exclusions to the selection.xml:
    <exclusion>
        <class pattern="*thread_mutex*" />
        <class pattern="*new_allocator*" />
        <class pattern="*Alloc_hider*" />
    </exclusion>
Of course, the larger problem of pulling in these standard libs over and over again is that it is a waste of cpu and memory, so I do want to see the file_name attribute fixed. As it stands, I'd simply exclude:
    <class pattern="std::*" />
    <class pattern="__gnu_cxx::*" />
especially since they are already available by default. Note that those two rules cover the ones needed for new_allocator and Alloc_hider.
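As a rough illustration of why the two namespace-wide rules subsume the narrower ones, the sketch below filters a list of candidate class names with shell-style wildcards. It assumes the selection patterns behave like `fnmatch` globbing, which may not match genreflex's actual rules exactly; the `select` helper and the candidate names (in particular the `thread_mutex` one) are hypothetical.

```python
from fnmatch import fnmatchcase

# Exclusion patterns taken from the rules discussed above.
EXCLUDE_PATTERNS = ["std::*", "__gnu_cxx::*", "*thread_mutex*"]


def select(names, include_patterns, exclude_patterns):
    """Keep names matching any include pattern and no exclude pattern."""
    def matches(name, patterns):
        return any(fnmatchcase(name, p) for p in patterns)

    return [n for n in names
            if matches(n, include_patterns) and not matches(n, exclude_patterns)]


# Hypothetical candidates, as might be seen with a bare "*" selection.
candidates = [
    "KJSInterpreter",
    "KJSObject",
    "std::allocator<char>",
    "__gnu_cxx::new_allocator<char>",
    "std::basic_string<char>::_Alloc_hider",
    "some_ns::thread_mutex_helper",
]

selected = select(candidates, ["*"], EXCLUDE_PATTERNS)
print(selected)
```

Under that assumption, `__gnu_cxx::new_allocator<char>` is caught by `__gnu_cxx::*` and the `_Alloc_hider` nested in `std::basic_string` by `std::*`, so only the `thread_mutex` rule needs to remain explicit.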
However, there is a more efficient approach that is right around the corner (and has been right around the corner for a long time, so don't hold me to that). Next release now seems likely though.
The long term goal has always been to use modules:
http://clang.llvm.org/docs/Modules.html
but the original drivers (Apple, Google, and the C++ standards committee) have been going back and forth on it. Now, things are finally falling into place. Here's Google:
https://www.youtube.com/watch?v=dHFNpBfemDI
And here's ROOT:
https://indico.cern.ch/event/643728/contributions/2612822/attachments/149407...
The big deal is that C++ developers have an incentive to deploy modules, so being able to patch into that should be a huge time saver (and where they don't, rootcling will soon be able to create modules from headers). Note that modules don't come for free: it will require some ambiguity resolution, but that is typically a Good Thing (code-quality wise).
Modules allow deserialization of only the piece of the AST that is actually being requested, saving memory. This as opposed to header files (whether or not precompiled) which pull in everything before them. See the status report above for the improvements in memory usage.
And with modules, of course, selection becomes unnecessary (markup for automatic streamers may still be useful, but that is not relevant for bindings generation).
I did wonder if I was missing some "-isystem" includes, and tried adding them but the --debug output from genreflex seemed to suggest they were being ignored.
Some flags are ignored as no-one was using them (so far). Some others are definitely obsolete by now.
Shaheed,
Oh wait, I think I see that cling-config is installed by the cppyy package. (Seems a tad confusing, ho-hum).
no, it's in cppyy-cling, which was freshly pulled in when starting from cppyy, as all has been updated to take that new split into account. (I'm not sure how to force such updates otherwise.)

As for the reasons for splitting and the overall package structure, rather than posting it here, I added it to the docs:

http://cppyy.readthedocs.io/en/latest/installation.html#package-structure

Basically, I want to avoid having to republish/reinstall all of Cling/LLVM whenever I make a small change in the wrapper, as the former changes only very infrequently (and takes a long time to build, as opposed to the wrapper which is just a single C++ file).

I hope this is the last change I need to make to the package structure. :) Once 1.0 is out, I'll look into whether something like conda is better than pip (given the amount of C++ code). For now I think pip will do.

Best regards, Wim -- WLavrijsen@lbl.gov -- +1 (510) 486 6411 -- www.lavrijsen.net
participants (2)
- Shaheed Haque
- wlavrijsen@lbl.gov