[Python-Dev] Slides from today's parallel/async Python talk

Trent Nelson trent at snakebite.org
Thu Mar 14 19:23:53 CET 2013


On Thu, Mar 14, 2013 at 05:21:09AM -0700, Christian Heimes wrote:
> Am 14.03.2013 03:05, schrieb Trent Nelson:
> >     Just posted the slides for those that didn't have the benefit of
> >     attending the language summit today:
> > 
> >         https://speakerdeck.com/trent/parallelizing-the-python-interpreter-an-alternate-approach-to-async
> 
> Wow, neat! Your idea with Py_PXCTX is ingenious.

    Yeah, it's funny how the viability and performance of the whole
    approach comes down to a quirky little trick for quickly detecting
    if we're in a parallel thread ;-)  I was very chuffed when it all
    fell into place.  (And I hope the quirkiness of it doesn't detract
    from the overall approach.)

> As far as I remember the FS and GS segment registers are used by most
> modern operating systems on x86 and x86_64 platforms nowadays to
> distinguish threads. TLS is implemented with FS and GS registers. I
> guess the __read[gf]sdword() intrinsics do exactly the same.

    Yup, in fact, if I hadn't come up with the __read[gf]sdword() trick,
    my only other option would have been TLS (or the GetCurrentThreadId()
    /pthread_self() approach in the presentation).  TLS is fantastic,
    and it's definitely an intrinsic part of the solution (the "Y" part
    of "if we're a parallel thread, do Y"), but it's more costly than a
    simple FS/GS register read.
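    For comparison, the thread-identity fallback mentioned above can be
    sketched in portable C with POSIX threads: record the main thread's
    pthread_t once at startup, then compare pthread_self() against it on
    every check.  This is a minimal sketch under assumptions, not the
    code from the slides; the px_* names are hypothetical.  It shows why
    this route is costlier than a segment-register read: each check is
    typically a library call plus a comparison rather than one load.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical names: a sketch of the "compare against the main
   thread's ID" fallback, not the implementation from the talk. */
static pthread_t px_main_thread;
static bool px_main_thread_recorded = false;

/* Call once, from the main thread, before any parallel threads start. */
static void px_record_main_thread(void)
{
    px_main_thread = pthread_self();
    px_main_thread_recorded = true;
}

/* Returns true when the calling thread is not the main thread. */
static bool px_in_parallel_thread(void)
{
    assert(px_main_thread_recorded);
    return !pthread_equal(pthread_self(), px_main_thread);
}

/* Thread entry point used to run the check from another thread;
   the result is written through the bool pointer passed in. */
static void *px_probe(void *result)
{
    *(bool *)result = px_in_parallel_thread();
    return NULL;
}
```

    After px_record_main_thread(), px_in_parallel_thread() returns false
    on the main thread and true on any thread created afterwards.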

> Reading
> registers is super fast and should have a negligible effect on code.

    Yeah, the actual instruction is practically free; the main thing you
    pay for is the extra branch.  However, most of the code looks like
    this:

        if (Py_PXCTX)
            something_small_and_inlineable();
        else
            Py_INCREF(op); /* also small and inlineable */

    In the majority of cases, all the code for both branches is going
    to be in the same cache line, so a mispredicted branch only costs a
    pipeline stall, which is much cheaper than a cache miss.
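    To make the fragment above concrete, here is a compilable toy
    version of the same branch shape.  It assumes a C11 _Thread_local
    flag as a stand-in for Py_PXCTX (the real macro reads the FS/GS
    register instead), a toy object type in place of PyObject, and a
    hypothetical per-thread counter as the "small and inlineable"
    parallel-path work; none of these names come from the talk.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for PyObject: only the reference count matters here. */
typedef struct {
    ptrdiff_t refcnt;
} toy_object;

/* Hypothetical per-thread flag standing in for Py_PXCTX; a parallel
   thread would set it to true when it starts. */
static _Thread_local bool px_ctx = false;

/* Hypothetical placeholder for the parallel-path work: it only counts
   how often the parallel branch was taken on this thread. */
static _Thread_local size_t px_parallel_increfs = 0;

static inline void px_parallel_incref(void)
{
    px_parallel_increfs++;
}

static inline void toy_incref(toy_object *op)
{
    assert(op != NULL);
    if (px_ctx)
        px_parallel_incref();   /* small, inlineable parallel path */
    else
        op->refcnt++;           /* the normal Py_INCREF path */
}
```

    Both arms compile to a few instructions, so the whole if/else will
    usually sit in one cache line, which is the property the argument
    above relies on.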

> ARM CPUs don't have segment registers because they have a simpler
> addressing model. The register CP15 came up after a couple of Google
> searches.

    Noted, thanks!

> IMHO you should target x86, x86_64, ARMv6 and ARMv7. ARMv7 is going to
> be more important than x86 in the future. We are going to see more ARM
> based servers.

    Yeah, that's my general sentiment too.  I'm definitely curious to see
    if other ISAs offer similar facilities (SPARC, IA-64, POWER, etc.),
    but the hierarchy will be x86/x64 > ARM > * for the foreseeable future.

    Porting the Py_PXCTX part is trivial compared to the work that is
    going to be required to get this stuff working on POSIX, where none
    of the sublime Windows concurrency, synchronisation and async IO
    primitives exist.

> Christian

        Trent.


