[Python-ideas] async objects

Nick Coghlan ncoghlan at gmail.com
Thu Oct 6 07:34:28 EDT 2016


On 6 October 2016 at 15:15, Stephen J. Turnbull
<turnbull.stephen.fw at u.tsukuba.ac.jp> wrote:
> Nick Coghlan writes:
>
>  > Python's core runtime model is the C runtime model: threads (with a
>  > local stack and access to a global process heap) and processes (which
>  > contain a heap and one or more threads). Anything else we do (whether
>  > it's generators, coroutines, or some other form of paused execution
>  > like callback management) gets layered on top of that runtime model.
>  > When folks ask questions like "Why can't Python be more like Go?",
>  > "Why can't Python be more like Erlang?", or "Why can't Python be more
>  > like Rust?" and get a negative response, it's usually because there's
>  > an inherent conflict between the C runtime model and whatever piece of
>  > the Go/Erlang/Rust runtime model we want to steal.
>
> How can there be a conflict between Python implementing the C runtime
> model *itself* which says "you can do anything anywhere anytime", and
> some part of Python implementing the more restricted models that allow
> safe concurrency?

Anything is possible in C, but not everything is readily supportable :)

When you design a new language and runtime from scratch, you get to
set new rules and expectations if you want to do that. Ericsson did it
with Erlang and BEAM (the reference Erlang VM) by declaring
"Everything's an Actor in the 'Actor Model' sense, and Actors can send
messages to each other's mailboxes". That pushes you heavily towards
application designs where each "process" is a Finite State Machine
with state changes triggered by external events, or by messages from
other processes. If BEAM had been published as open source a decade
earlier than it eventually was, I suspect the modern computing
landscape would look quite different from the way it does today.
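To make the "Actor as Finite State Machine" idea concrete, here's a minimal single-threaded sketch in Python (my illustration, not BEAM's actual semantics): the actor owns a mailbox, and each received message may trigger a state transition.

```python
from queue import Queue

class DoorActor:
    """Toy actor: behaviour is a state, messages drive transitions."""
    def __init__(self):
        self.state = "closed"
        self.mailbox = Queue()

    def handle(self, msg):
        # State changes are triggered purely by received messages
        if self.state == "closed" and msg == "open":
            self.state = "open"
        elif self.state == "open" and msg == "close":
            self.state = "closed"

    def run_until_empty(self):
        # Drain the mailbox, processing one message at a time
        while not self.mailbox.empty():
            self.handle(self.mailbox.get())

door = DoorActor()
for msg in ("open", "close", "open"):
    door.mailbox.put(msg)
door.run_until_empty()
print(door.state)  # open
```

In BEAM each such actor would be a genuinely independent lightweight process; the sketch only shows the message-driven state machine shape.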

Google did something similar with Golang and goroutines by declaring
that Communicating Sequential Processes would be their core
concurrency primitive rather than C's shared memory threading.
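The CSP style can be approximated on top of Python's own thread-based model (a rough sketch, not Go's actual API): two "processes" communicate over a channel rather than mutating shared state directly.

```python
import queue
import threading

# A bounded queue standing in for a CSP channel
channel = queue.Queue(maxsize=1)

def producer():
    for n in range(3):
        channel.put(n)    # send on the channel (blocks when full)
    channel.put(None)     # sentinel: no more values

def consumer(results):
    while True:
        n = channel.get()  # receive (blocks when empty)
        if n is None:
            break
        results.append(n * 10)

results = []
t1 = threading.Thread(target=producer)
t2 = threading.Thread(target=consumer, args=(results,))
t1.start(); t2.start()
t1.join(); t2.join()
print(results)  # [0, 10, 20]
```

The difference in Go is that this pattern is the blessed default rather than something layered on top of shared-memory threading.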

By contrast, Python, C++, Java, C#, and Objective-C all retained C's
core thread-based "private stack, shared heap" concurrency model,
which later expanded to also include thread-local heap storage. Rust
actually retains this core "private stack, private heap, shared heap"
model, but changes the management of data ownership to avoid the messy
problems that arise in practice when using the "everything is
accessible to every thread by default" model.

> If you can do anything, well, you can voluntarily
> submit to compiler discipline to a restricted set.  No?  So it must be
> that the existing constructions (functions, for, with) that need an
> "async" marker have an implementation that is itself unsafe.

Correct (for a given definition of unsafe): in normal operation,
CPython uses the *C stack* to manage the Python frame stack, so when
you descend into a new function call in CPython, you're also using up
more C level stack space. This means that when CPython throws
RecursionError, what it's actually aiming to prevent is a C level
segfault arising from running out of stack space to manage frames:

  $ ./python -X faulthandler
  Python 3.6.0b1+ (3.6:b995b1f52975, Sep 22 2016, 01:19:04)
  [GCC 6.1.1 20160621 (Red Hat 6.1.1-3)] on linux
  Type "help", "copyright", "credits" or "license" for more information.
  >>> def f(): f()
  ...
  >>> f()
  Traceback (most recent call last):
    File "<stdin>", line 1, in <module>
    File "<stdin>", line 1, in f
    File "<stdin>", line 1, in f
    File "<stdin>", line 1, in f
    [Previous line repeated 995 more times]
  RecursionError: maximum recursion depth exceeded
  >>> import sys
  >>> sys.setrecursionlimit(int(1e5))
  >>> f()
  Fatal Python error: Segmentation fault

  Current thread 0x00007fe977a7c700 (most recent call first):
    File "<stdin>", line 1 in f
    File "<stdin>", line 1 in f
    File "<stdin>", line 1 in f
    [<manual snip>]
    ...
  Segmentation fault (core dumped)

Loops, with statements, and other magic method invocations all work
the same way: they make a C level call to the magic method
implementation, and if that method is written in Python, the call ends
up running a new invocation of the eval loop to execute the method's
bytecode.
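You can see that re-entry from pure Python (a small illustration of my own): a magic method that triggers its own invocation burns through the same recursion budget as ordinary function calls, because each hop goes C call -> eval loop -> C call.

```python
class SelfLen:
    def __len__(self):
        # len() makes a C level call into this Python level __len__,
        # re-entering the eval loop; each hop consumes stack just like
        # an ordinary function call, so this hits the recursion limit
        return len(self)

try:
    len(SelfLen())
except RecursionError as exc:
    print("RecursionError:", exc)
```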

The pay-off that CPython gets from this is that we get to delegate
99.9% of the work for supporting different CPU architectures to C
compiler developers, and we get a lot of capabilities "for free" when
it comes to stack management.

The downside is that C runtimes don't officially support swapping out
the stack of the current thread with new contents. It's *possible* to
do that (hence Stackless and gevent), but you're on your own when it
comes to debugging it when it breaks.

That makes it a good candidate for an opt-in "expert users only"
capability - folks that decide gevent is the right answer for their
needs can adopt it if they want to (perhaps restricting their choice
of target platform and C extension modules as a result), while we (as
in the CPython core devs) don't need to keep custom stack manipulation
code working on all the platforms where CPython is supported and with
all the custom C extension modules that are out there.
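The kind of suspension CPython *does* support natively is the generator kind: pausing a single Python frame at a yield point while the C stack unwinds normally. A minimal sketch:

```python
def ticker():
    n = 0
    while True:
        n += 1
        # yield suspends just this one Python frame; the C stack
        # unwinds normally, so no custom stack manipulation is needed
        yield n

t = ticker()
print(next(t), next(t), next(t))  # 1 2 3
```

That's why generator and coroutine suspension points must be explicitly marked (yield, await): only specially prepared frames can be paused, unlike the "swap the whole stack" approach of Stackless and gevent.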

>  This
> need is not being explained very well.  What is also not being
> explained is what would be lost by simply using the "safe"
> implementations generated by the async versions everywhere.

The two main problems with that idea are speed and extension module
compatibility.

The speed aspect is simply that we have more than four decades of work
by CPU designers and compiler developers behind us making C code run
fast.
CPython uses that raw underlying speed to offer a lot of runtime
flexibility with a relatively simple implementation while still being
"fast enough" for many use cases. Even then, function calls are still
notoriously slow, and await invocations tend to be slower still.
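A rough micro-benchmark sketch of that overhead (my illustration; absolute numbers will vary by machine and Python version). The coroutine is driven by hand here, since a coroutine that never awaits anything doesn't need an event loop:

```python
import timeit

def plain():
    return 1

async def coro():
    return 1

def run_coro_once():
    # Drive the coroutine to completion via the coroutine protocol
    c = coro()
    try:
        c.send(None)
    except StopIteration as exc:
        return exc.value

t_plain = timeit.timeit(plain, number=100_000)
t_coro = timeit.timeit(run_coro_once, number=100_000)
print(f"plain call: {t_plain:.3f}s  coroutine: {t_coro:.3f}s")
```

Even without an event loop in the picture, the coroutine path pays for object creation plus the send/StopIteration dance on every call.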

The extension module compatibility problem is simply that whereas you
can emulate a normal Python function just by writing a normal C
function, emulating a Python coroutine involves implementing the
coroutine protocol. That's possible, but it's a lot more complicated,
and even if you implemented a standard wrapper, you'd be straight back
to the speed problem.
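To give a feel for what "implementing the coroutine protocol" involves, here's a hand-rolled awaitable written in Python rather than C (a sketch of the protocol's shape, not a real extension module): no async def, just the __await__/iterator machinery that a C emulation would have to reproduce.

```python
class ManualAwaitable:
    """Awaitable implemented by hand via the protocol, not async def."""
    def __init__(self, value):
        self.value = value

    def __await__(self):
        # __await__ must return an iterator; a generator qualifies
        return self._step()

    def _step(self):
        # Finishing the generator raises StopIteration(self.value),
        # which `await` turns into the awaited result
        return self.value
        yield  # unreachable, but makes this function a generator

async def demo():
    return await ManualAwaitable(42)

c = demo()
try:
    c.send(None)
except StopIteration as exc:
    print(exc.value)  # 42
```

A C extension has to express all of that through type slots and exception plumbing, which is where the extra complexity (and the return trip to the speed problem) comes in.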

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

