[Cython] RFC: an inline_ function that dumps c/c++ code to the code emitter

Jason Newton nevion at gmail.com
Sun Aug 21 10:26:24 EDT 2016


On Sun, Aug 21, 2016 at 5:30 AM, Robert Bradshaw <robertwb at gmail.com> wrote:

>
> In my experience Cython has generally been fairly easy to pick up for
> people who already know Python. And Python is often easy to pick up for
> people who already know C/C++. Of course for many wrappings it often
> takes non-trivial knowledge of the wrapped library itself too, but
> typically at the same level as would be required to grok code written
> against that same library directly from C/C++.
>

Is your experience drawn from binding moderately complex libraries, or from
play code (of tutorial-level complexity)?  Bottom up or top down?  Sorry
if this sounds asinine to you.  To clarify, I'm coming from the case where
I hadn't read the whole tutorial/docs before being faced with pyx in the
projects I previously mentioned, while on a tight turnaround time - I was
not able to grok it in that context.

For myself and any other ML user who comes across this thread - can you
list a few libraries that do things the right way?


>
> Yes, there's bad code out there in any language (no offense meant
> towards h5py--I haven't looked at that project myself). Much of it due
> to cargo-cult perpetuations or archaic (or simply flat-out-wrong)
> contortions due to historical limitations (e.g. creating a Python
> module to wrap an _extension module, avoiding all C++ features with
> extensive C wrappers, ...). (You're familiar with C++, so likely no
> stranger to this effect.)
>
> > These projects complied with Cython's current philosophy
> > to the degradation of clarity, context, and overall idea of how code was
> > hooked up.  Perhaps Cython should take the lessons learned from its
> > inception, time, and the results of the state of the c-python userbase to
> > guide us into a new philosophy.
>
> I fail to see how "staying close to Python" caused "degradation of
> clarity, context, etc." If anything, the lessons learned over time
> have validated this philosophy. More on this later.
>

My point was about the multifile, multi-level wrappers that I mentioned
earlier - if you're saying that those projects did Cython extensions wrong,
then I'm incorrect in faulting Cython and should fault the libraries using
it.  I didn't say staying close to Python caused $blurb.

I also don't know that, in a situation as confusing as the state of all the
binding projects, this should be taken as validation of the philosophy - I
think it is reasonable to consider the attrition of these projects as a
function of manpower, the number of early project supporters/authors, and
whether a project (like sage) indirectly, through dependency, kept the
project alive.  And good old fashioned luck.  I noted most of them don't use
distutils but something custom and less capable instead, which maybe plays a
role in how mature/usable/smalltime they were/are.


> I agree that any efforts to parse C++ without building on
> an actual compiler are fraught with danger. That's not the case with
> generating C++ code, which is the direction we're going. In
> particular, our goal is to understand C++ enough to invoke it, which
> allows us to be much less pedantic.
>

I understand and agree with the logic in stating it's a less complicated
goal, but what comparable success stories exist?  I strongly think the
devil's in the details of correctly making that work, and that they will be
tough, solvable problems.  And then you're going and making promises on
unfamiliar territory.  But the ultimate takeaway for me is that you won't
have it ready in the near term.  Do you have the skills and resources to
implement this in under 2 years?  And then the other question is: are you
and the team reasonably confident you will have it working and usable by
then?  Otherwise you are not being pragmatic.

On the other hand, if it were as reasonably simple as many of your other
points in later emails suggest, I'd really like to know why you hadn't
addressed them earlier.


>
> A tool that automatically extracts definitions (or even wrappings, even
> partially) from C++ headers is another topic, and should almost
> certainly lean on an existing compiler.
>
> >> > -It would allow single source files - I think this is important for
> >> > runtime
> >> > compiled quasi-JIT/AOC fragments, like OpenCL/PyOpenCL/PyCUDA provide
> >>
> >> Not quite following what you're saying here.
> >
> > Maybe PyInline was a better example off the bat for you, but something a
> > little more flexible but also with less work is needed.  Compare with
> > PyOpenCL:
> > https://documen.tician.de/pyopencl/ - check out some examples.   There is a
> > c runtime api between the contexts hooking things up (this is the OpenCL
> > runtime part) - it's a pretty similar story to PyCuda (and by the same
> > author, except that project has to jump out to nvcc and cache kernel
> > compilation like the inline function implementation does).  There's no limit
> > to the number of functions you can declare though and the OpenCL side is
> > kept simple - things are generally pretty typesafe/do what you would expect
> > on dispatch.
>
> The fact that you're essentially defining a kernel/ufunc with a well
> defined API in another language makes this a somewhat more natural and
> tractable case than freely sliding back and forth between C and Python
> data structures, function calls, etc. multiple times in a function's
> body. (Not to say that these aren't quite sophisticated projects--they
> tackle more interesting difficulties elsewhere.)
>

After you get used to the mental gymnastics you do there, in that kind of
development (CUDA/OpenCL), this is pretty simple by comparison. It does
feel like some of the same areas of my brain would be activated though in
this case. :-)  Minor quibble - no, you're not defining a ufunc... even
"kernel" is a misnomer; it's a full program on the other end with no main.


>
> >> > The idea is that Cython glue makes the playing field for extracting data
> >> > easy, but that once it's extracted to a cdef variable for instance, cython
> >> > doesn't need to know what happens.  Maybe in a way sort of like the GCC asm
> >> > extension.  Hopefully simpler variable passing though.
> >>
> >> Cython uses "mangled" names (e.g. with a __pyx prefix) to avoid any
> >> possible conflicts. Specifying what/how to mangle could get as ugly as
> >> GCC's asm variable passing. And embedded variable declarations, let
> >> alone control flow statements (especially return, break, ...) could
> >> get really messy. It obscures analysis Cython can do on the code, such
> >> as whether variables are used or what values they may take. Little
> >> code snippets are not always local either, e.g. they often need to
> >> refer to variables (or headers) referenced elsewhere. And they must
> >> all be mutually compatible.
> >
> > Like gcc's asm, let's let adults do what they want and let them worry about
> > the consequences of flow control/stray includes. I'm not even sure how most
> > of this would be an issue (switch/break/if) if you are properly nesting pyxd
> > output.  The only thing I think is an issue here is mangled names.  I
> > haven't yet figured out why (cdef) variable names must be mangled.  Can you
> > explain?  Maybe we add an option to allow it to be unmangled in their
> > declaration? C++ has extern "C" for example.
>
> Name mangling is done for the standard reasons--to avoid possible
> conflicts with all other symbols that may be defined. E.g. We don't
> want things to suddenly break if I happen to create a variable called
> "PyNone." Or "__pyx_something_we_defined_implicitly." And of course we
> want to mangle globals, function names, etc. lest they conflict with
> some otherwise irrelevant symbol defined in some (possibly
> recursively) included header somewhere.
>
> Again, you could just say "Don't name things like that." This exposes
> some more guiding principles. (1) If it's valid Python, it should be
> valid Cython and (2) we always try to produce valid C code--if you
> haven't lied to us (too much) about your external declarations, a
> successful Cython compilation results in a valid C/C++ output. Also
> (3) you shouldn't have to read or understand the generated C and the
> Python/C API to use, let alone debug, Cython (though you're happy to
> do so if you want, like Java developers sometimes read bytecodes, but
> not usually, though understanding implementation can sometimes be
> helpful when chasing performance (for all languages)).
>
> There's an obvious tension between giving users all the rope they want
> vs. providing an API that is possibly more restrictive, but inherently
> correct by construction. I'll concede that Cython necessarily has
> pointers, so I'll give that there's plenty of room for foot-shooting
> (and better interfacing with modern C++ would be good help there), but
> the kind of errors one runs into by injecting arbitrary code snippets
> take things to a whole new level (and specifically violate (3) when
> developing and debugging).
>

I think injecting arbitrary code snippets has a reasonably good probability of
not breaking (3) in your list above, provided we have a way to get at
unmangled identifiers (*or* document and stick to the mangling strategy,
assuming it's easy) - that, or we scan the snippet code and replace
identifiers (significantly more complex; instincts make me think it would be
fragile for a while until it's gotten right - esp without LLVM).  Perhaps
syntax errors would be an issue if you're just coding things up... but again
this construct is opt-in, and we could make life easier by annotating the
snippet in the output - to help localize the user.
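
To make the "scan the snippet and replace identifiers" option concrete, here
is a minimal Python sketch.  It is not how Cython works internally: the
`__pyx_v_` prefix, the `mangle_snippet` and `annotate` helpers, and the
`example.pyx:12` origin label are all assumptions made up for illustration.
It also shows why a plain text scan is likely to be fragile: names inside C
string literals get rewritten too.

```python
import re

# Assumed prefix for illustration only; the real mangling scheme is an
# implementation detail of the code generator.
PREFIX = "__pyx_v_"


def mangle_snippet(snippet, cdef_names):
    """Replace whole-word occurrences of each cdef name with a mangled form."""
    if not cdef_names:
        return snippet
    alternatives = "|".join(re.escape(name) for name in sorted(cdef_names))
    pattern = re.compile(r"\b(" + alternatives + r")\b")
    return pattern.sub(lambda m: PREFIX + m.group(1), snippet)


def annotate(snippet, origin):
    """Wrap the snippet in C comments so it can be located in the output."""
    return "/* begin inline snippet from %s */\n%s\n/* end inline snippet */" % (
        origin,
        snippet,
    )


c_code = mangle_snippet(
    "total += values[i] * scale;", {"total", "values", "i", "scale"}
)
print(annotate(c_code, "example.pyx:12"))

# Fragility: the scan does not understand C, so the "i" inside the string
# literal is rewritten along with the real variable.
print(mangle_snippet('printf("i=%d", i);', {"i"}))
```

A scanner that skips string literals, comments and member names would need a
real C/C++ tokenizer, which is where the extra complexity comes from.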


>
> The escape hatch is to wrap the C++ in an actual C++ file and invoke
> the wrapping. Typically this is the minority of one's code--if it's
> the "whole library" then you probably have an API that's only
> understandable to someone well versed in C++ anyways. You've given a
> single example (non-type template arguments) that we would like to
> support that's blocking you.
>

My lack of examples is due to insufficient time playing with Cython - I hit
nonstarters, so I stop and abandon it; as I said, to date, Cython has never
been able to solve my C++ problems and none of them seem extraordinary.  I
think you've got a lot more unknown-unknowns here than you give credit for,
but we can't discover that until you at least fix that template bug
(properly).  I'm still not looking forward to forward declaring every
identifier/function/whatever from C++ land in Cython though, and I still
strongly dislike that there's no single source way of doing Cython with
something like a kernel/ufunc that needs to escape to C/C++.  This makes
doing something like the mako based templates I mentioned in the OP email
much more cumbersome/hard, and Cython would provide no built in mechanism
(like inline/inline_module) for making that work.


>
> > Why is allowing arbitrary code inside not a good idea?  We're not talking
> > something necessarily like eval here and the reputation it got,
>
> Actually, I think it's a whole lot like eval. It's taking an opaque
> (string) chunk of data and executing it as code. But potentially worse
> as it's in a different language and evaluated in a transformed (even
> if the names were unmangled) context.
>
> If we were to go this direction, I might go with a function call (like
> weave, maybe even follow it) rather than a new statement as the latter
> is difficult to extend with the myriad of optional configuration
> parameters, etc. that would beg to follow.
>

My point with eval is that its bad reputation was mostly due to security
vulnerabilities, and it joined the league of evil like goto and other tools
that are great in the right hands and times.  The string is known and fixed
at pyx body cython compile time, which is one of the things that made me
think it had to be a statement rather than a function.  But again, I'm not
a cython developer.  If you think an inlining function that takes in all
the args it needs to inline successfully is the mechanism that achieves the
effect on pyx compile, I don't think I'd mind.   It sure is a privileged
and weird function though, to be able to emit code and not be a runtime
statement.
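
As a rough illustration of how a function-call form could still require a
snippet that is fixed at compile time, the sketch below uses Python's `ast`
module as a stand-in for a pyx parser.  The name `inline_c` and the helper
`check_inline_calls` are hypothetical; nothing like this exists in Cython as
far as this thread establishes.

```python
import ast


def check_inline_calls(source, func_name="inline_c"):
    """Return (line, ok) for every call to func_name in source.

    ok is True only when the first positional argument is a string
    literal, i.e. the snippet text is known without running the program.
    """
    results = []
    for node in ast.walk(ast.parse(source)):
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Name)
            and node.func.id == func_name
        ):
            first = node.args[0] if node.args else None
            ok = isinstance(first, ast.Constant) and isinstance(first.value, str)
            results.append((node.lineno, ok))
    return sorted(results)


example = '''
inline_c("x = y * 2;", x=x, y=y)
snippet = make_code()
inline_c(snippet, x=x)
'''

# The first call passes a literal; the second passes a runtime value and
# would have to be rejected by a compile-time emitter.
print(check_inline_calls(example))  # [(2, True), (4, False)]
```

A compiler pass along these lines could reject the second call outright,
which keeps the "privileged function" from ever needing the string at
runtime.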

2 Examples I wanted to pull out real quick are:

https://github.com/scipy/weave/blob/master/examples/wx_example.py
https://github.com/scipy/weave/blob/master/examples/binary_search.py

But I think I detect magic going on here for adding includes in the wx
example, and that is not a usable/reliable approach.  The cython.inline
implementation also did include magic for numpy variables... Just a
warning.  I don't think weave failed because of Python 3 support; I think
it was because it was too limited to be useful, because of that magic and
the walls around getting something like, say, Eigen in, so nobody used it.


> > You must realize that almost any other python driven way to compile c-code
> > in the spirit these projects do is deprecated/dead.  Cython has absorbed all
> > the reputation and users that didn't go to pure-c/boost.python - pybind11 is
> > the new kid on the block there so I'm not including it (I'm of the opinion
> > that SWIG users stayed unchanged).  Community belief/QA/designers/google all
> > think of Cython first.  Weave has effectively closed up its doors and I'm
> > not even sure it had the power to do what I wanted anyway because Cython
> > provides a language that eases the data-extraction/typecasting part of
> > inlining C/C++.
>
> You seem to be repeatedly bringing up the points[:]
>
> * Many (most?) of these string-based approaches are essentially dead,
> often pointing people to Cython instead, but
> * Cython should adopt the string-embedding approach of these earlier
> projects.
>

Hoho - zing!  No, that is not a conclusion you should be drawing.   Your
faults here are to imply those projects failed because they used
string-embedding approaches, and to imply string-embedding based approaches
are the approaches that failed - *most* approaches have failed, over a
variety of implementations both Python driven and not.  I restricted myself
to python-driven ones for the sake of brevity and mentioned the selection.
As I tried to hint earlier, the failure of those other projects happened for
any number of reasons unrelated to their being string based.  I believe
additionally that there were too many options (confusion) and too small a
potential userbase (at least in those years) to bolster and attract blood
to each of the projects and make them thrive.  Probably something akin to
the ton of orphaned projects on PyPI: being orphaned doesn't mean the
approach was wrong.  One thing I did want you to take away is that
Cython needs to absorb the responsibility of its reputation and status -
the last survivor of a somewhat diverse class with different capabilities,
if you will, that went outside of your original usage.


> You ask at the beginning of the email whether time has vindicated our
> philosophy. I think, based on the mindshare vs. these other attempts
> at integrating with C, in large part it has. It has served us and our
> users well; we will strive to stay close to Python.
>
> Tight interleaving of multiple languages is cute for making a
> polyglot script, but I do not think it leads to legible code. An
> "eval_cpp" operator would be a lot like the builtin eval--it'd be
> really tempting to do the "quick and easy" hack of dropping in some
> executable string instead of thinking how to structure things such
> that that could be avoided, but putting in this effort leads to more
> comprehensible code.
>

It's served your *C* and faster-python users well.  If you had proper
constructs, I'm sure people wouldn't choose to do it with inline_c unless
there was compelling, solid reasoning.

I'm not saying you can't make those constructs and that users wouldn't use
them when they appear, but you are not being pragmatic again.  You
currently don't have all the capabilities and are on risky turf for
supporting all c++ standards for the rest of time.  I'm betting that you
will not produce the usable constructs needed in a useful timeframe (maybe
this is a 5 year scale?) - which I equate with turning your head away and
giving the middle finger to C++ developers.  inline_c would allow a forwards
compatible way to use anything the target c++ compiler allows with some
very minimal guarantees on Cython's side.  It is a very elegant and capable
solution to a hard problem.


>
> It's hard to say "no" to features, but I think such an introduction
> would fundamentally change Cython and how it's written for the worse.
>

I agree with the statement but I don't think you've classified the feature
correctly.

I watched one of your old talks for Sage Days 29 and at the bottom of a
slide you have "Cython is a very pragmatic project, driven by user needs".
I'm calling foul.  Go watch that video again and tell me what's changed
since 2011.

-Jason