[Python-ideas] discouraging direct use of the C-API

Stefan Behnel stefan_ml at behnel.de
Thu May 7 19:23:41 CEST 2015


Paul Moore wrote on 07.05.2015 at 10:47:
> On 7 May 2015 at 08:56, M.-A. Lemburg wrote:
>> Aside: The fact that we have so many nice C extensions out
>> there is proof that we have a good C API. Even though it is
>> not visible to most Python programmers, it forms a significant
>> part of Python's success.

Oh, totally. But that doesn't mean people have to manually write code
against it, in the same way that you can benefit from excellent processors
without writing assembly.


> Maybe a useful exercise for someone thinking about this issue
> would be to survey some of the major projects using the C API out
> there, and working out what would be involved in switching them to use
> cffi or Cython. That would give a good idea of the scale of the issue,
> as well as providing some practical help to projects that would be
> affected by this sort of recommendation.

My general answer is that "Python is way easier to write than C", and
therefore "rewriting C code in Cython" is a rather fast thing to do (P's
and C's set as intended). Often enough, the rewrite also leads to immediate
functional improvements because stuff can easily be done in a more general
way in Python syntax than in plain C(-API) code. And it's not uncommon that
several ref-counting and/or error handling bugs get fixed on the way.
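
As a rough illustration of that point (not taken from any real project;
the function name total_length is made up for this sketch), here is plain
Python that Cython can also compile as-is. The comment lists the C-API
calls that a hand-written C version would typically need for the same loop.

```python
# Hypothetical helper, for illustration only. Plain Python, which Cython
# can also compile unchanged.
def total_length(items):
    """Sum len() over any iterable of sized objects."""
    total = 0
    # A hand-written C-API version would typically need PyObject_GetIter(),
    # a PyIter_Next() loop, PyObject_Size() with a check for -1, a
    # Py_DECREF() for every item, and a cleanup path for each error case.
    # Here, iteration, ref-counting and exception propagation are implicit.
    for item in items:
        total += len(item)
    return total
```

A C version written around list-specific calls such as PyList_GET_ITEM()
would only accept lists, whereas this one takes any iterable, e.g.
total_length(iter(["ab", (1, 2, 3)])).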

When I rewrite C-API code in Cython, the bulk of the time is spent reverse
engineering the intended Python semantics from the verbose (and sometimes
cryptic) C code. After that, writing them down in Python syntax is quite
easy. Once you get used to it, the plain transformation can be done at more
than a hundred lines of C code per hour, if it's not overly complex or
dense (the usual 5%). If you have a good test suite, debugging the
rewritten code should be quite straightforward afterwards.

So, if you have a project with 10000 lines of C code, 30% of which uses the
C-API, you should be able to rip out the direct usage of the C-API in just
a couple of days by rewriting it in Cython. The code size usually drops by
a factor of 2-5 that way. That also makes it a reasonable migration path
for porting Py2.x C-API code to Py3, for example.
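
The arithmetic behind that estimate can be sketched as follows. The
numbers come from the paragraphs above (10000 lines, 30% C-API usage, a
hundred lines per hour as the low end, a size reduction by a factor of
2-5). The 8-hour working day and the function name estimate_rewrite are
assumptions added for this sketch.

```python
def estimate_rewrite(total_lines, capi_percent, lines_per_hour=100,
                     hours_per_day=8):
    """Back-of-the-envelope effort estimate for a C-API -> Cython rewrite.

    hours_per_day is an assumption; lines_per_hour uses the low end of
    "more than a hundred lines of C code per hour".
    """
    capi_lines = total_lines * capi_percent // 100
    hours = capi_lines / lines_per_hour
    days = hours / hours_per_day
    # Code size "usually drops by a factor of 2-5".
    size_range = (capi_lines // 5, capi_lines // 2)
    return capi_lines, hours, days, size_range


capi_lines, hours, days, size_range = estimate_rewrite(10000, 30)
print(capi_lines, hours, days, size_range)  # 3000 30.0 3.75 (600, 1500)
```

So the 3000 affected lines take roughly 30 hours of plain transformation
work, and end up as something like 600 to 1500 lines of Cython code. The
time for understanding the original code and for debugging comes on top.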

I can't speak for cffi, but my guess is that if you know its API well, the
fact that it's also Python should keep the rewriting speed in the same
ballpark as for Cython. So, for code that isn't performance critical, it's
certainly a reasonable alternative, with the added benefit of having
excellent support in PyPy.


> Good ones to look at would be:
> - lxml

lxml was written in Cython even before Cython existed (it was a patched
Pyrex at the time). In fact, writing it in C would have been
entirely impossible. Even if the necessary developer resources had been
available, writing C code is so difficult in comparison that many of the
non-trivial features would never have been implemented.


> (I refrained from adding scipy and numpy to that list, as that would
> make this post seem like a troll attempt, which it isn't, but has
> anyone thought of the implications of a recommendation like this on
> those projects? OK, they'd probably just ignore it as they have a
> genuine need for direct use of the C API, but we would be sending
> pretty mixed messages).

Much of scipy and its surrounding tools and libraries is actually written
in Cython: at least most of the parts that interact with Python, and
often a lot more than just the interface layer. New code in the scientific
computing community is commonly written in Cython these days, or uses other
tools for JIT or AOT compilation (Numba, numexpr, ...), many of which were
themselves partly written in Cython.

Stefan



