[Python-Dev] Use of Cython

Stefan Behnel stefan_ml at behnel.de
Tue Sep 4 14:55:56 EDT 2018


Yury Selivanov schrieb am 04.09.2018 um 18:19:
> On Sat, Sep 1, 2018 at 6:12 PM Stefan Behnel wrote:
>> Yury Selivanov schrieb am 07.08.2018 um 19:34:
>>> The first goal is to compile mypy with it to make it faster, so I hope
>>> that the project will be completed.
>>
>> That's not "the first goal". It's the /only/ goal. The only intention of
>> mypyc is to be able to compile and optimise enough of Python to speed up
>> the kind or style of code that mypy uses.
>>
>>> Essentially, mypyc will be similar
>>> to Cython, but mypyc is a *subset of Python*, not a superset.
>>
>> Which is bad, right? It means that there will be many things that simply
>> don't work, and that you need to change your code in order to make it
>> compile at all. Cython is way beyond that point by now. Even RPython will
>> probably continue to be way better than mypyc for quite a while, maybe
>> forever, who knows.
> 
> To be clear I'm not involved with mypyc, but my understanding is that
> the entire Python syntax will be supported, except some dynamic
> features like patching `globals()`, `locals()`, or classes, or
> __class__.

No, that's not the goal, at least from what I understood from my
discussions with Jukka. The goal is to make it compile mypy, be it by
supporting Python features in mypyc or by avoiding Python features in mypy.
I'm sure they will take any shortcut they can in order to avoid having to
make mypyc too capable, because mypyc is no more than a means to an end.
For example, they may easily get away without supporting generators and
closures, which are quite difficult to implement in C. But finding a
non-trivial piece of Python code out there that uses neither of the two is
probably not easy.
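
For illustration, here are the two constructs in question in plain Python.
All names are made up for this sketch. Roughly speaking, a C backend has to
keep `count` alive after `make_counter()` has returned, and has to turn
`chunks()` into something that can be suspended and resumed at each `yield`.

```python
def make_counter(start=0):
    """Closure: `count` outlives the call to make_counter()."""
    count = start

    def step():
        nonlocal count  # refers to the variable of the enclosing function
        count += 1
        return count

    return step


def chunks(items, size):
    """Generator: execution is suspended and resumed at each `yield`."""
    for i in range(0, len(items), size):
        yield items[i:i + size]


counter = make_counter(10)
first, second = counter(), counter()      # state is kept between calls
pieces = list(chunks([1, 2, 3, 4, 5], 2))  # consumes the generator
```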

I'm also sure they will avoid Python semantics wherever they can, because
implementing them in the same way as CPython and Cython would mean that
certain constructs cannot safely be statically reasoned about, and thus
cannot be optimised. Avoiding (full) Python semantics relieves you from
these restrictions, and if you control both sides, the compiler and the
code that it compiles, then it becomes much easier to apply arbitrary
optimisations at will.
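
A small sketch of the kind of construct meant here, in plain Python with
made-up names: because a module-level name can be rebound at any time, a
compiler that keeps full Python semantics cannot simply inline the call to
`scale()` in `total()`.

```python
def scale(x):
    return 2 * x


def total(values):
    # Under full Python semantics, `scale` is looked up by name on every
    # call, so the call target is not known at compile time.
    return sum(scale(v) for v in values)


before = total([1, 2, 3])


# Any code that can reach the module may rebind the name at runtime.
def scale(x):
    return 10 * x


after = total([1, 2, 3])  # same function object `total`, different result
```

A compiler that also controls the code it compiles can simply declare such
rebinding unsupported and resolve the call statically.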

IMHO, what they are implementing is much closer to ShedSkin than to Cython.


>>> Interfacing with C libraries can be easily achieved with cffi.
>>
>> Except that it will be fairly slow. cffi is not designed for static
>> analysis but for runtime operations.
> 
> Could you please clarify this point?  My current understanding is that
> you can build a static compiler with a knowledge about cffi so that it
> can compile calls like `ffi.new("something_t[]", 80)` to pure C.

I'm sure there is a relatively large subset of cffi's API that could be
compiled statically, as long as the declarations and their usage are kept
simple and fully visible to the compiler. What that subset is remains to be
seen once someone actually tries to do it.


> Yeah, statically compiling cffi-enabled code is probably the way to go
> for mypyc and Cython.

I doubt it, given the expected restrictions and verbosity. But debating
this is useless as long as no-one attempts to actually write a static
compiler for cffi(-like) code.


> Using Cython/C types usually means
> that you need to use pxd/pyx files which means that the code isn't
> Python anymore.

I'm aware that this is a very common misconception that is difficult to get
out of people's heads. You probably got this idea from wrapping a native
library, in which case the only choice you have in order to declare an
external C-API is really to use Cython's special syntax. However, this
would not apply to most use cases in the CPython project context, and it
also does not necessarily apply to most of the code in a Cython module even
if it uses external libraries.

Cython has four ways to provide type declarations: cdef statements in
Cython code, external .pxd files for Python or Cython files, special
decorators and declaration functions, and PEP-484/526 type annotations.

All four have their use cases (e.g. syntax support in different Python
versions, efficiency of expression, readability for people with different
backgrounds, etc.), and all but the first allow users to keep their module
code in Python syntax. As long as you do not call into external native
code, it's your choice which of these you prefer for your code base,
project context and developer background. You can even mix them at will, if
you feel like it.
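
A minimal sketch of two of these styles, decorators and PEP-484/526
annotations, both in plain Python syntax. `cython.locals`, `cython.double`
and the annotation forms are taken from Cython's pure Python mode; the
function names are made up, and the `_Shim` class is only a hypothetical
stand-in so that the sketch also runs where Cython is not installed. The
cdef and .pxd styles are not shown.

```python
try:
    import cython  # Cython's pure Python mode module, if installed
except ImportError:
    # Hypothetical stand-in (not part of Cython) so that the sketch runs
    # uncompiled without Cython being available.
    class _Shim:
        double = float

        @staticmethod
        def locals(**_types):
            return lambda func: func

    cython = _Shim()


# Style: special decorator. The function body stays untouched.
@cython.locals(total=cython.double, v=cython.double)
def mean_decorated(values):
    total = 0.0
    for v in values:
        total += v
    return total / len(values)


# Style: PEP-484/526 annotations that name Cython types.
def mean_annotated(values: list) -> cython.double:
    total: cython.double = 0.0
    v: cython.double
    for v in values:
        total += v
    return total / len(values)
```

Run uncompiled, both functions behave like ordinary Python. When the module
is compiled, Cython can use the declarations to treat `total` and `v` as C
doubles.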


> I know that Cython has a mode to use decorators in
> pure Python code to annotate types, but they are less intuitive than
> using typing annotations in 3.6+.

You can use PEP-484/526 type annotations to declare Cython types in Python
code that you intend to compile. It's entirely up to you, and it's an
entirely subjective measure which "is better". Many people prefer Cython's
non-Python syntax because it allows them to apply their existing C
knowledge. For them, PEP-484 annotations may easily be non-intuitive in
comparison.


> For CPython it means that we'd have Python, C, and Cython to learn to
> understand code written in Cython.  There's a very popular assumption
> that you have to be proficient in C in order to become a CPython core
> dev and people are genuinely surprised when I tell them that it's not
> a requirement.  At the three conferences I've been to this summer, at
> least 5 people complained to me that they didn't even consider
> contributing to CPython because they don't know C. Adding yet another
> language would simply raise this bar even higher, IMHO.

Adding the right language would lower the bar, IMHO. Cython is Python. It
allows users with a Python background to implement C things without having
to thoroughly learn C /and/ the CPython C-API first. So, the way I see it,
rather than /adding/ a "third" language to the mix, it substantially lowers
the entry level from the current two and a half languages (Python + C +
C-API) to one and a half (Python + Cython).


> I'd be +0.5 on using Cython (optionally?) to compile some pure Python
> code to make it 30-50% faster.  asyncio, for instance, would certainly
> benefit from that.

Since most of this (stdlib) Python code doesn't need to stay syntax
compatible with Python < 3.6 (actually 3.8) anymore, you can probably get
much higher speedups than that by statically typing some variables and
functions here and there. I recently tried that with difflib, and it makes
a big difference.

Stefan


