[Python-Dev] Use of Cython

Yury Selivanov yselivanov.ml at gmail.com
Tue Sep 4 12:19:49 EDT 2018


Hi Stefan,

On Sat, Sep 1, 2018 at 6:12 PM Stefan Behnel <stefan_ml at behnel.de> wrote:
>
> Yury,
>
> given that people are starting to quote enthusiastically the comments you
> made below, let me set a couple of things straight.

To everyone reading this thread: please keep in mind that I'm not in
a position to "defend" mypyc or to "promote" it, and I'm not affiliated
with the project at all.  I am just excited about yet another tool to
statically compile Python and I'm discussing it only from a
theoretical standpoint.

>
> Yury Selivanov schrieb am 07.08.2018 um 19:34:
> > On Mon, Aug 6, 2018 at 11:49 AM Ronald Oussoren via Python-Dev wrote:
> >
> >> I have no strong opinion on using Cython for tests or in the stdlib, other than that it is a fairly large dependency.  I do think that adding a “Cython-lite” tool to the CPython distribution would be less ideal; creating and maintaining that tool would be a lot of work without clear benefits over just using Cython.
> >
> > Speaking of which, Dropbox is working on a new compiler they call "mypyc".
> >
> > mypyc will compile type-annotated Python code to optimized C.
>
> That's their plan. Saying that "it will" is a bit premature at this point.
> The list of failed attempts at writing static Python compilers is rather
> long, even if you only count those that compile the usual "easy subset" of
> Python.
>
> I wish them the best of luck and endurance, but they have a long way to go.

I fully agree with you here.

>
>
> > The
> > first goal is to compile mypy with it to make it faster, so I hope
> > that the project will be completed.
>
> That's not "the first goal". It's the /only/ goal. The only intention of
> mypyc is to be able to compile and optimise enough of Python to speed up
> the kind or style of code that mypy uses.
>
>
> > Essentially, mypyc will be similar
> > to Cython, but mypyc is a *subset of Python*, not a superset.
>
> Which is bad, right? It means that there will be many things that simply
> don't work, and that you need to change your code in order to make it
> compile at all. Cython is way beyond that point by now. Even RPython will
> probably continue to be way better than mypyc for quite a while, maybe
> forever, who knows.

To be clear, I'm not involved with mypyc, but my understanding is that
the entire Python syntax will be supported, except for some dynamic
features like patching `globals()`, `locals()`, classes, or
`__class__`.  IMO this is *good*, and in general Python programs don't
do that anyway.
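
As an illustration only (a minimal sketch, not taken from mypyc or its
documentation; the names `Point`, `Point3D` and `greet` are
hypothetical), these are the kinds of dynamic tricks meant here.  All
of them are legal in CPython, but each one makes it hard for a static
compiler to know ahead of time what a name or method refers to:

```python
# Hypothetical examples of dynamic features; nothing here comes from mypyc.

class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def norm1(self):
        return abs(self.x) + abs(self.y)


class Point3D(Point):
    def norm1(self):
        return abs(self.x) + abs(self.y) + abs(self.z)


# 1. Patching module globals at runtime: `greet` is never defined with
#    a normal `def` or assignment statement.
globals()["greet"] = lambda name: "hello " + name
greeting = globals()["greet"]("world")

# 2. Patching a class after creation: existing instances change behaviour.
p = Point(3, -4)
before = p.norm1()
Point.norm1 = lambda self: max(abs(self.x), abs(self.y))
after = p.norm1()

# 3. Reassigning __class__: the object's type changes under the caller.
q = Point(1, 2)
q.__class__ = Point3D
q.z = 5
swapped = q.norm1()
```

Code that avoids these patterns keeps every name and method resolvable
from the source text alone, which is the property a static compiler
relies on.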

>
>
> > Interfacing with C libraries can be easily achieved with cffi.
>
> Except that it will be fairly slow. cffi is not designed for static
> analysis but for runtime operations.

Could you please clarify this point?  My current understanding is that
you can build a static compiler with knowledge of cffi so that it
can compile calls like `ffi.new("something_t[]", 80)` to pure C.

> You can obviously also use cffi from
> Cython – but then, why would you, if you can get much faster code much more
> easily without using cffi?

The "much more easily" part is debatable and highly subjective.  For
me, using Cython is also easier *at this point* because I've spent so
much time working with it, although getting there wasn't easy for
me :(

>
> That being said, if someone wants to write a static cffi optimiser for
> Cython, why not, I'd be happy to help with my advice. The cool thing is
> that this can be improved gradually, because compiling the cffi code
> probably already works out of the box. It's just not (much) faster than
> when interpreted.

Yeah, statically compiling cffi-enabled code is probably the way to go
for mypyc and Cython.

>
>
> > Being a
> > strict subset of Python means that mypyc code will execute just fine
> > in PyPy.
>
> So does normal (non-subset) Python code. You can run it in PyPy, have
> CPython interpret it, or compile it with Cython if you want it to run
> faster in CPython, all without having to limit yourself to a subset of
> Python. Seriously, you make this sound like requiring users to rewrite
> their code to make it compilable with mypyc was a good thing.

But that's the point: unless you add Cython types to your Python code,
you get only moderate speedups.  Using Cython/C types usually means
that you need to use pxd/pyx files, which means that the code isn't
Python anymore.  I know that Cython has a mode to use decorators in
pure Python code to annotate types, but they are less intuitive than
using typing annotations in 3.6+.
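
To make the comparison concrete, here is a small sketch of what plain
typing annotations look like (the function `dot` is a hypothetical
example, not code from any of the projects discussed).  It stays
ordinary Python that runs unchanged on CPython and PyPy, whereas
Cython's pure Python mode would carry similar information in
decorators such as `@cython.locals(...)`:

```python
from typing import List


def dot(xs: List[float], ys: List[float]) -> float:
    """Plain Python with type annotations; no pxd/pyx files involved."""
    total: float = 0.0  # variable annotation syntax, available in 3.6+
    for x, y in zip(xs, ys):
        total += x * y
    return total


result = dot([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])
```

Whether a given compiler actually uses these annotations for
optimisation is an assumption of this sketch; the interpreter itself
ignores them at runtime.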

[..]
> > I'd be more willing to start using mypyc+cffi in CPython stdlib
> > *eventually*, than Cython now.  Cython is a relatively complex and
> > still poorly documented language.
>
> You are free to improve the documentation or otherwise help us find and
> discuss concrete problems with it.

Fair point.

> Calling Cython a "poorly documented
> language" could easily feel offensive towards those who have put a lot of
> work into the documentation, wiki, tutorials, trainings and what not that
> help people use the language. Even stack overflow is getting better and
> better in documenting Cython these days, even though responses over there
> that describe work-arounds tend to get outdated fairly quickly.

Didn't mean to offend anyone, sorry if I did.  I'm partly responsible
myself for the poor asyncio docs and I know what it's like to be on
the receiving end :(

[..]
> > I'm speaking from experience after
> > writing thousands of lines of Cython in uvloop & asyncpg.  In skillful
> > hands Cython is amazing, but I'd be cautious to advertise and use it
> > in CPython.
>
> Why not? You didn't actually give any reasons for that.

I've listed a couple:

(1) To get a significant speedup one needs to learn a lot of new syntax.
For CPython it means that we'd need to know Python, C, and Cython to
understand code written in Cython.  There's a very popular assumption
that you have to be proficient in C in order to become a CPython core
dev, and people are genuinely surprised when I tell them that it's not
a requirement.  At the three conferences I've been to this summer, at
least 5 people complained to me that they didn't even consider
contributing to CPython because they don't know C.  Adding yet another
language would simply raise this bar even higher, IMHO.

(2) My point about documentation still stands, even though I feel
extremely uncomfortable making it, sorry.

>
>
> > I'm also -1 on using Cython to test C API. While writing C tests is
> > annoying (I wrote a fair share myself), their very purpose is to make
> > third-party tools/extensions more stable. Using a third-party tool to
> > test C API to track regressions that break third-party tools feels
> > wrong.
>
> I don't understand that argument. What's wrong about using a tool that
> helps you get around writing boiler plate code? The actual testing does not
> need to be done by Cython at all, you can write it any way you like.

Because you don't have 100% control over how exactly Cython (or
different versions of it) will compile your code to C.  In my
experience writing a few C API tests in C is relatively easy compared
to introducing these new C APIs in the first place.

To summarize my personal position:

I'm -1 on using Cython to write C API tests/boilerplate in CPython.

I'm -1 on giving the green light to using Cython's pxd/pyx syntax in
CPython.

I'd be +0.5 on using Cython (optionally?) to compile some pure Python
code to make it 30-50% faster.  asyncio, for instance, would certainly
benefit from that.

Y

