[Python-ideas] [Python-Dev] PyParallel: alternate async I/O and GIL removal

Guido van Rossum guido at python.org
Sun Nov 17 03:56:11 CET 2013


On Sat, Nov 16, 2013 at 6:24 PM, Trent Nelson <trent at snakebite.org> wrote:

> On Sat, Nov 16, 2013 at 05:39:13PM -0800, Guido van Rossum wrote:
>

[snip]

> >    Finally, I'm not sure why you are so confrontational about the way
> >    Twisted and Tulip do things. We are doing things the only way they
> >    *can* be done without overhauling the entire CPython implementation
> >    (which you have proven will take several major release cycles,
> >    probably until 4.0). It's fine that you are looking further forward
> >    than most of us. I don't think it makes sense that you are blaming
> >    the rest of us for writing libraries that can be used today.
>
>     I watched the video today; there's a point where I say something
>     along the lines of "that's not how you should do IOCP; they're
>     doing it wrong".  That definitely came out wrong -- when limited
>     to a single-threaded execution model, which today's Python is, then
>     calling GetQueuedCompletionStatus() in a single-threaded event loop
>     is really the only option you have.
>
>     (I think I also say "that's just as bad as select()"; I didn't mean
>      that either -- it's definitely better than select() when you're
>      limited to the single-threaded execution model.  What I was trying
>      to convey was that doing it like that wasn't really how IOCP was
>      designed to be used -- which is why I dig into the intrinsic link
>      between IOCP, async I/O and threading for so many slides.)
>

I wish you had spent more time on explaining how IOCP works and less on
judging other approaches.

Summarizing my understanding of what you're saying, it seems the "right"
way to use IOCP on a multi-core machine is to have one thread per core
(barring threads you need for unavoidably blocking stuff) and to let the
kernel schedule callbacks on all those threads. As long as the callbacks
don't block and events come in at a rate to keep all those cores busy this
will be optimal.

But this is almost tautological. It only works if the threads don't
communicate with each other or with the main thread (all shared data must
be read-only). But heh, if that's all, one process per core works just as
well. :-)
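The model described above -- one worker thread per core, all parked on a single kernel-managed completion queue -- can be sketched in pure Python. Here a `queue.Queue` stands in for the real completion port (i.e. for threads blocked in `GetQueuedCompletionStatus`); this is an analogy for illustration, not PyParallel's actual API:

```python
# Analogy for the IOCP model: N workers (one per core) all pull completed
# "events" from one shared queue and run the associated callback.
# queue.Queue stands in for the kernel's completion port.
import os
import queue
import threading

NUM_WORKERS = os.cpu_count() or 4
completion_port = queue.Queue()  # stands in for the real completion port
results = queue.Queue()

def worker():
    # Each worker blocks on the shared queue, like a thread parked on
    # GetQueuedCompletionStatus, then runs the callback for that completion.
    while True:
        item = completion_port.get()
        if item is None:  # sentinel: shut down this worker
            return
        callback, data = item
        # Per the discussion above: the callback must not block and must
        # not mutate shared state; it only computes and hands back a result.
        results.put(callback(data))

threads = [threading.Thread(target=worker) for _ in range(NUM_WORKERS)]
for t in threads:
    t.start()

# Post some "completed I/O" events.
for i in range(10):
    completion_port.put((lambda n: n * n, i))
for _ in threads:
    completion_port.put(None)
for t in threads:
    t.join()

collected = sorted(results.get() for _ in range(10))
print(collected)  # → [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
```

As the surrounding paragraph notes, this only scales cleanly because the callbacks share nothing mutable -- which is exactly why one process per core achieves the same effect.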

I don't really care how well CHARGEN (I had to look it up) scales. For
HTTP, it's great for serving static content from a cache or from the
filesystem, but if that's all you serve, why use Python? Real web apps use
intricate combinations of databases, memcache, in-memory cache, and
template expansion. The biggest difference you can make there is probably
getting rid of the ORM in favor of more direct SQL, and next on the list
would be reimplementing template expansion in C. (And heck, you could
release the GIL while you're doing that. :-)
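The "more direct SQL" point can be made concrete with the stdlib's `sqlite3` module: issue the query yourself and skip per-row ORM object construction. The schema and data below are invented purely for illustration:

```python
# Sketch of bypassing an ORM: one hand-written query, rows come back as
# plain tuples. Schema and data are made up for this example.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE posts (id INTEGER PRIMARY KEY, title TEXT, views INTEGER)"
)
conn.executemany(
    "INSERT INTO posts (title, views) VALUES (?, ?)",
    [("intro", 120), ("async io", 450), ("gil removal", 300)],
)

# One round-trip, no ORM layer materializing model instances per row.
top = conn.execute(
    "SELECT title FROM posts ORDER BY views DESC LIMIT 1"
).fetchone()[0]
print(top)  # → async io
```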

>     And in hindsight, perhaps I need to put more emphasis on the fact
>     that it *is* very experimental work with a long-term view, versus
>     Tulip/asyncio, which was intended for *now*.  So although Tulip and
>     PyParallel spawned from the same discussions and are attempting to
>     attack the same problem -- it's really not fair for me to discredit
>     Tulip/Twisted in favor of PyParallel because they're on completely
>     different playing fields with vastly different implementation time
>     frames (I'm thinking 5+ years before this work lands in a mainstream
>     Python release -- if it ever does.  And if not, hey, it can live on
>     as another interpreter, just like Stackless et al).
>

I would love it if you could write a list of things a callback *cannot* do
when it is in parallel mode. I believe that list includes mutating any kind
of global/shared state (any object created in the main thread is read-only
in parallel mode -- it seems you had to work hard to make string interning
work, which is semantically transparent but mutates hidden global state).
In addition (or, more likely, as a consequence!) a callback cannot create
anything that lasts beyond the callback's lifetime, except for the brief
time between the callback's return and the completion of the I/O operation
involving the return value. (Actually, I missed how you do this -- doesn't
this mean you cannot release the callback's heap until much later?)
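The constraints listed above can be illustrated with ordinary threads standing in for parallel-mode callbacks (this models the rule, not PyParallel's actual enforcement mechanism): main-thread objects are treated as read-only, callbacks build only short-lived local data, and a result escapes only through an explicit channel -- the stand-in here for the I/O operation's return value:

```python
# Modeling the parallel-mode rules with plain threads (an analogy only):
#  - objects created on the main thread are read, never mutated;
#  - each callback allocates only local, short-lived objects;
#  - results leave via one designated channel (here, a locked list, which
#    stands in for returning a value to an I/O operation in PyParallel).
import threading

CONFIG = {"greeting": "hello"}  # main-thread object: treat as read-only

def callback(name, out, lock):
    # Allowed: reading shared state, building purely local objects.
    message = f"{CONFIG['greeting']}, {name}"
    # Disallowed in parallel mode: mutating CONFIG or any other
    # main-thread object. The explicit hand-off below is the stand-in
    # for the sanctioned escape hatch (the I/O return value).
    with lock:
        out.append(message)

out, lock = [], threading.Lock()
threads = [threading.Thread(target=callback, args=(n, out, lock))
           for n in ("alice", "bob")]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(sorted(out))  # → ['hello, alice', 'hello, bob']
```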

So it seems that the price for extreme concurrency is the same as always --
you can only run purely functional code. Haskell fans won't mind, but for
Python this seems to be putting the cart before the horse -- who wants to
write Python with those constraints?

[snip]

-- 
--Guido van Rossum (python.org/~guido)