[Python-Dev] pyparallel and new memory API discussions...

Trent Nelson trent at snakebite.org
Wed Jun 19 16:33:10 CEST 2013


Hi Charles-François!

    Good to hear from you again.  It was actually your e-mail a few
    months ago that acted as the initial catalyst for this memory
    protection idea, so, thanks for that :-)

    Answer below.

On Wed, Jun 19, 2013 at 07:01:49AM -0700, Charles-François Natali wrote:
> 2013/6/19 Trent Nelson <trent at snakebite.org>:
> >
> >     The new memory API discussions (and PEP) warrant a quick pyparallel
> >     update: a couple of weeks after PyCon, I came up with a solution for
> >     the biggest show-stopper that has been plaguing pyparallel since its
> >     inception: being able to detect the modification of "main thread"
> >     Python objects from within a parallel context.
> >
> >     For example, `data.append(4)` in the example below will generate an
> >     AssignmentError exception, because data is a main thread object, and
> >     `data.append(4)` gets executed from within a parallel context::
> >
> >         data = [ 1, 2, 3 ]
> >
> >         def work():
> >             data.append(4)
> >
> >         async.submit_work(work)
> >
> >     The solution turned out to be deceptively simple:
> >
> >       1.  Prior to running parallel threads, lock all "main thread"
> >           memory pages as read-only (via VirtualProtect on Windows,
> >           mprotect on POSIX).
> >
> >       2.  Detect attempts to write to main thread pages during parallel
> >           thread execution (via SEH on Windows or a SIGSEGV trap on POSIX),
> >           and raise an exception instead (detection is done in the ceval
> >           frame exec loop).
> 
> Quick stupid question: because of refcounts, the pages will be written
> to even in the case of read-only access. How do you deal with this?

    Easy: I don't refcount in parallel contexts :-)

    There's no need, for two reasons:

     1. All memory allocated in a parallel context is localized to a
        private heap.  When the parallel context is finished, the entire
        heap can be blown away in one fell swoop.  There's no need for
        reference counting or GC because none of the objects will exist
        after the parallel context completes.  (See the sketch after
        this list for a toy illustration.)

     2. The main thread won't be running when parallel threads/contexts
        are executing, which means main thread objects being accessed in
        parallel contexts (read-only access is fine) won't be suddenly
        free()'d or GC-collected or whatever.
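
    To make point 1 a bit more concrete: below is a toy C sketch of a
    per-context bump arena (purely illustrative, *not* the actual
    pyparallel allocator, and with error handling omitted).  Everything
    allocated inside the context comes out of one block, and destroying
    the context releases it all in a single call, which is why there's
    nothing left to refcount or collect::

        /*
         * Toy sketch of a per-context private heap -- not pyparallel's
         * actual allocator, and error handling is omitted for brevity.
         * Everything allocated inside the context comes from one arena;
         * destroying the context frees it all at once, so no per-object
         * refcounting or GC is needed.
         */
        #include <stdio.h>
        #include <stdlib.h>
        #include <string.h>

        typedef struct {
            char  *base;    /* start of the arena's memory block */
            size_t size;    /* bytes reserved for this context */
            size_t used;    /* bump pointer: bytes handed out so far */
        } context_heap_t;

        static context_heap_t *context_heap_create(size_t size)
        {
            context_heap_t *heap = malloc(sizeof(*heap));
            heap->base = malloc(size);
            heap->size = size;
            heap->used = 0;
            return heap;
        }

        /* Bump allocation: no headers, no free lists, no refcounts. */
        static void *context_alloc(context_heap_t *heap, size_t n)
        {
            if (heap->used + n > heap->size)
                return NULL;    /* a real allocator would grow here */
            void *p = heap->base + heap->used;
            heap->used += n;
            return p;
        }

        /* Blow the whole heap away when the context finishes. */
        static void context_heap_destroy(context_heap_t *heap)
        {
            free(heap->base);
            free(heap);
        }

        int main(void)
        {
            context_heap_t *heap = context_heap_create(1 << 20);

            /* "Objects" created during the parallel context... */
            char *s = context_alloc(heap, 32);
            int  *v = context_alloc(heap, 100 * sizeof(int));
            strcpy(s, "lives only for this context");
            v[0] = 42;
            printf("%s, v[0] = %d\n", s, v[0]);

            /* ...all vanish together; nothing is freed individually. */
            context_heap_destroy(heap);
            return 0;
        }

    The real thing obviously needs growth, alignment and error checks,
    but the one-shot teardown is what makes per-object refcounting
    unnecessary in a parallel context.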

    You get credit for that second point; you asked a similar question a
    few months ago that made me realize I absolutely couldn't have the
    main thread running at the same time the parallel threads were
    running.

    Once I accepted that as a design constraint, everything else came
    together nicely... "Hmmm, if the main thread isn't running, it won't
    need write-access to any of its pages!  If we mark them read-only,
    we could catch the traps/SEHs from parallel threads, then raise an
    exception, ahh, simple!".
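
    In case it helps to see that trap spelled out, here's a tiny
    POSIX-only C sketch (again illustrative only; the real detection
    happens in the ceval frame exec loop, via SEH on Windows or a
    SIGSEGV trap on POSIX): mprotect() locks a page read-only, a
    SIGSEGV handler catches the attempted write, and siglongjmp()
    turns what would have been a crash into a recoverable error::

        /*
         * Toy POSIX sketch of the trap described above -- not the
         * actual pyparallel code.  A page is locked read-only with
         * mprotect(), a SIGSEGV handler catches the attempted write,
         * and siglongjmp() turns the fault into a recoverable error
         * instead of a crash.
         */
        #include <setjmp.h>
        #include <signal.h>
        #include <stdio.h>
        #include <string.h>
        #include <sys/mman.h>
        #include <unistd.h>

        static sigjmp_buf trap_jmp;

        static void segv_handler(int sig)
        {
            (void)sig;
            /* Escape the faulting write instead of dying. */
            siglongjmp(trap_jmp, 1);
        }

        int main(void)
        {
            long pagesize = sysconf(_SC_PAGESIZE);

            /* Stand-in for "main thread" memory: one anonymous page
               (MAP_ANONYMOUS is spelled MAP_ANON on some platforms). */
            char *page = mmap(NULL, pagesize, PROT_READ | PROT_WRITE,
                              MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
            strcpy(page, "main thread data");

            /* Install the trap (SEH plays this role on Windows). */
            struct sigaction sa;
            memset(&sa, 0, sizeof(sa));
            sigemptyset(&sa.sa_mask);
            sa.sa_handler = segv_handler;
            sigaction(SIGSEGV, &sa, NULL);

            /* Step 1: lock the "main thread" page read-only. */
            mprotect(page, pagesize, PROT_READ);

            /* Step 2: an attempted write from a "parallel context"... */
            if (sigsetjmp(trap_jmp, 1) == 0) {
                page[0] = 'X';       /* faults: the page is read-only */
                puts("write succeeded (unexpected)");
            } else {
                /* ...is detected and surfaced as an error, not a crash. */
                puts("write to a main-thread page detected");
            }

            /* Read-only access is still fine. */
            printf("page still reads: %s\n", page);

            munmap(page, pagesize);
            return 0;
        }

    pyparallel raises AssignmentError at that point rather than
    printing a message, but the trap-and-recover shape is the same.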

    I'm both chuffed at how simple it is (considering it was *the* major
    show-stopper), and miffed at how it managed to elude me for so long
    ;-)

    Regards,

        Trent.

