[Python-ideas] PyParallel update (was: solving multi-core Python)

Tue Jun 23 15:53:01 CEST 2015

On Sat, Jun 20, 2015 at 03:42:33PM -0600, Eric Snow wrote:
> Furthermore, removing the GIL is perhaps an obvious solution but not
> the only one.  Others include Trent Nelson's PyParallels, STM, and
> other Python implementations..

So, I've been sprinting relentlessly on PyParallel since Christmas, and
recently reached my v0.0 milestone of being able to handle all the TEFB
tests, plus get the "instantaneous wiki search" thing working too.

The TEFB (TechEmpower Framework Benchmarks) implementation is here:
    https://bitbucket.org/tpn/pyparallel/src/8528b11ba51003a9821ceb75683ee96ed33db28a/examples/tefb/tefb.py?at=3.3-px
    (The aim was to have it compete in this: https://www.techempower.com/benchmarks/#section=data-r10, but unfortunately they broke their Windows support after round 9, so there's no way to get PyParallel into the official results without fixing that first.)

The wiki thing is here:

https://bitbucket.org/tpn/pyparallel/src/8528b11ba51003a9821ceb75683ee96ed33db28a/examples/wiki/wiki.py?at=3.3-px

I particularly like the wiki example as it leverages a lot of benefits
afforded by PyParallel's approach to parallelism, concurrency and
asynchronous I/O:
    - Load a digital search trie (datrie.Trie) that contains every
      Wikipedia title and the byte-offset within the wiki.xml where
      the title was found.  (Once loaded, the RSS of python.exe is about
      11GB; the trie itself holds about 16 million items.)
    - Load a numpy array of sorted 64-bit integer offsets.  This allows
      us to do a searchsorted() (binary search) against a given offset
      in order to derive the next offset.
    - Once we have a way of getting two byte offsets, we can use ranged
      HTTP requests (and TransmitFile behind the scenes) to efficiently
      read random chunks of the file asynchronously.  (Windows has a
      huge advantage here -- there's simply no way to achieve similar
      functionality on POSIX in a non-blocking fashion: sendfile can
      block, a disk read() can block, and a memory reference into a
      mmap'd file that isn't in memory will page fault, which also
      blocks.)
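
To make the offset lookup in the list above concrete, here is a minimal
sketch.  It is not taken from wiki.py: the sample offsets are invented,
and byte_range() and range_header() are hypothetical helper names.  Only
numpy.searchsorted() and the HTTP Range header syntax are real.

```python
import numpy as np

# Invented sample data: sorted byte offsets at which successive
# titles start inside the XML file (assumption, not real wiki data).
offsets = np.array([0, 1200, 5300, 9100, 20480], dtype=np.int64)


def byte_range(start, offsets):
    """Return the inclusive (first, last) byte span of the entry at `start`.

    `start` is the offset the trie returned for a title; the end of the
    entry is taken to be one byte before the next offset in the array.
    """
    # side='right' yields the index of the first offset strictly
    # greater than `start`, i.e. where the next entry begins.
    i = np.searchsorted(offsets, start, side='right')
    if i == len(offsets):
        raise ValueError("no entry follows offset %d" % start)
    return int(start), int(offsets[i]) - 1


def range_header(first, last):
    """Format an HTTP Range header value for an inclusive byte span."""
    return "bytes=%d-%d" % (first, last)


first, last = byte_range(1200, offsets)
print(range_header(first, last))
```

The last entry in the file has no "next offset", so a real
implementation would need the file size as a final sentinel; the sketch
simply raises in that case.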

The performance has far surpassed anything I could have imagined back
during the async I/O discussions in September 2012, so, time to stick a
fork in it and document the experience, which is what I'll be working on
in the coming weeks.

In the meantime:
    - There are installers available here for those that wish to play
      around with the current state of things:
        http://download.pyparallel.org/
    - I wrote a little helper that diffs the hg tree against the
      original v3.3.5 tag I based the work on, and committed the diffs
      directly -- this provides a way to review the changes that were
      made in order to get to the current level of functionality:
          https://bitbucket.org/tpn/pyparallel/src/8528b11ba51003a9821ceb75683ee96ed33db28a/diffs/?at=3.3-px
      (It only includes files that existed in the v3.3.5 tag; I don't
       include diffs for new files I've added.)

It's probably most useful to review the diffs after perusing pyparallel.h:
    https://bitbucket.org/tpn/pyparallel/src/8528b11ba51003a9821ceb75683ee96ed33db28a/Include/pyparallel.h?at=3.3-px#cl-345

...as you'll see lots of guards in place in most of the diffs.  E.g.:

Py_GUARD()  -- make sure we never hit this from a parallel context
Px_GUARD()  -- make sure we never hit this from a main thread
Py_GUARD_OBJ(o) -- make sure object o is always a main thread object
Px_GUARD_OBJ(o) -- make sure object o is always a parallel object
PyPx_GUARD_OBJ(o) -- if we're a parallel context, make sure it's a
                     parallel object, if we're a main thread, make
                     sure it's a main thread object.
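
For readers who think more easily in Python than in C macros, the
first two guards can be approximated with decorators.  This is only an
analogy, under loud assumptions: the real guards are C macros in
pyparallel.h, a PyParallel parallel context is not an ordinary Python
thread, and what the real macros do on failure is not shown here.  The
names py_guard, px_guard, main_only and parallel_only are made up.

```python
import functools
import threading


def _in_main_thread():
    # Stand-in for "are we a main thread?"; PyParallel's real check
    # is different (assumption for illustration only).
    return threading.current_thread() is threading.main_thread()


def py_guard(func):
    """Rough analogue of Py_GUARD(): never run from a parallel context."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        if not _in_main_thread():
            raise RuntimeError("%s hit from a non-main thread" % func.__name__)
        return func(*args, **kwargs)
    return wrapper


def px_guard(func):
    """Rough analogue of Px_GUARD(): never run from a main thread."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        if _in_main_thread():
            raise RuntimeError("%s hit from the main thread" % func.__name__)
        return func(*args, **kwargs)
    return wrapper


@py_guard
def main_only():
    return "main"


@px_guard
def parallel_only():
    return "parallel"
```

Calling main_only() from a worker thread, or parallel_only() from the
main thread, raises RuntimeError instead of silently running code in
the wrong kind of context.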

If you haven't heard of PyParallel before, this might be a good place to
start: https://speakerdeck.com/trent/.

The core concepts haven't really changed since these slides (re: parallel
contexts, main thread, main thread objects, parallel thread objects):

    https://speakerdeck.com/trent/pyparallel-how-we-removed-the-gil-and-exploited-all-cores?slide=91

Basically, if we're a main thread, "do what we normally do"; if we're a
parallel thread, "divert to a thread-safe alternative".

And a final note: I like the recent async additions.  I mean, it's
unfortunate that the new keyword clashes with the module name I used to
hide all the PyParallel trickery, but I'm at the point now where calling
something like this from within a parallel context is exactly what I
need:
    async f.write(...)
    async cursor.execute(...)

I've been working on PyParallel on-and-off now for ~2.5 years and have
learned a lot and churned out a lot of code -- documenting it all is
actually somewhat daunting (where do I start?!), so, if anyone has
specific questions about how I addressed certain things, I'm more than
happy to go into more detail.

    Trent.