[Python-Dev] Slides from today's parallel/async Python talk
trent at snakebite.org
Thu Mar 14 19:23:53 CET 2013
On Thu, Mar 14, 2013 at 05:21:09AM -0700, Christian Heimes wrote:
> On 14.03.2013 03:05, Trent Nelson wrote:
> > Just posted the slides for those that didn't have the benefit of
> > attending the language summit today:
> > https://speakerdeck.com/trent/parallelizing-the-python-interpreter-an-alternate-approach-to-async
> Wow, neat! Your idea with Py_PXCTX is ingenious.
Yeah, it's funny how the viability and performance of the whole
approach comes down to a quirky little trick for quickly detecting
if we're in a parallel thread ;-) I was very chuffed when it all
fell into place. (And I hope the quirkiness of it doesn't detract
from the overall approach.)
> As far as I remember the FS and GS segment registers are used by most
> modern operating systems on x86 and x86_64 platforms nowadays to
> distinguish threads. TLS is implemented with FS and GS registers. I
> guess the __read[gf]sdword() intrinsics do exactly the same.
Yup, in fact, if I hadn't come up with the __read[gf]sdword() trick,
my only other option would have been TLS (or the GetCurrentThreadId
/pthread_self() approach in the presentation). TLS is fantastic,
and it's definitely an intrinsic part of the solution (the "Y" part
of "if we're a parallel thread, do Y"), but it's definitely more
costly than a simple FS/GS register read.
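
For readers without the slides, here is a minimal sketch of the slower, portable pthread_self() alternative mentioned above, not the FS/GS register read itself. All of the px_* names are hypothetical and are not taken from the actual patch; the sketch only assumes that "parallel thread" means "any thread other than the one recorded at start-up".

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical names for illustration; not the patch's identifiers. */
static pthread_t px_main_thread;

/* Record the main thread once, e.g. at interpreter start-up. */
static void px_init_main_thread(void)
{
    px_main_thread = pthread_self();
}

/* The "are we in a parallel thread?" test: true on any thread other
 * than the recorded main thread. */
static bool px_in_parallel_thread(void)
{
    return !pthread_equal(pthread_self(), px_main_thread);
}

/* Thread entry point: stores the result of the check as seen from
 * the worker thread. */
static void *px_worker(void *out)
{
    *(bool *)out = px_in_parallel_thread();
    return NULL;
}

/* Spawn one worker and return what the check reported on it. */
static bool px_check_from_worker(void)
{
    pthread_t t;
    bool result = false;
    int rc = pthread_create(&t, NULL, px_worker, &result);
    assert(rc == 0);
    (void)rc;
    pthread_join(t, NULL);
    return result;
}
```

The check costs a library call plus a comparison on every use, which is the overhead a single segment-register read is meant to avoid.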
> registers is super fast and should have a negligible effect on code.
Yeah the actual instruction is practically free; the main thing you
pay for is the extra branch. However, most of the code looks like
Py_INCREF(op); /* also small and inlineable */
In the majority of the cases, all the code for both branches is
going to be in the same cache line, so a mispredicted branch is
only going to result in a pipeline stall, which is better than a
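
As a rough illustration of the branch being discussed, the sketch below guards a reference-count increment with a cheap "parallel context" test. It is an assumption-laden stand-in: obj_t, px_parallel, px_parallel_hits and obj_incref are hypothetical names, the flag replaces the real register-based check, and the parallel-side body is only a placeholder, since the thread does not say here what that path does.

```c
#include <stddef.h>

/* Stand-in for PyObject: just a reference count. */
typedef struct {
    ptrdiff_t refcnt;
} obj_t;

/* Stand-in for the parallel-context check; a plain flag here. */
static int px_parallel = 0;

/* Counts how often the parallel-side path was taken (placeholder). */
static ptrdiff_t px_parallel_hits = 0;

/* Both arms are tiny, so the whole function is a good inlining
 * candidate and both branches sit close together in the code. */
static inline void obj_incref(obj_t *op)
{
    if (px_parallel)
        px_parallel_hits++;   /* placeholder for the parallel path */
    else
        op->refcnt++;         /* ordinary main-thread increment */
}
```

Because each arm is a single increment, the cost added to the main-thread path is the test and one conditional branch.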
> ARM CPUs don't have segment registers because they have a simpler
> addressing model. The register CP15 came up after a couple of Google
> IMHO you should target x86, x86_64, ARMv6 and ARMv7. ARMv7 is going to
> be more important than x86 in the future. We are going to see more ARM
> based servers.
Yeah that's my general sentiment too. I'm definitely curious to see
if other ISAs offer similar facilities (Sparc, IA64, POWER etc), but
the hierarchy will be x86/x64 > ARM > * for the foreseeable future.
Porting the Py_PXCTX part is trivial compared to the work that is
going to be required to get this stuff working on POSIX where none
of the sublime Windows concurrency, synchronisation and async IO