[Python-Dev] pyparallel and new memory API discussions...
Trent Nelson
trent at snakebite.org
Wed Jun 19 16:33:10 CEST 2013
Hi Charles-François!
Good to hear from you again. It was actually your e-mail a few
months ago that acted as the initial catalyst for this memory
protection idea, so, thanks for that :-)
Answer below.
On Wed, Jun 19, 2013 at 07:01:49AM -0700, Charles-François Natali wrote:
> 2013/6/19 Trent Nelson <trent at snakebite.org>:
> >
> > The new memory API discussions (and PEP) warrant a quick pyparallel
> > update: a couple of weeks after PyCon, I came up with a solution for
> > the biggest show-stopper that has been plaguing pyparallel since its
> > inception: being able to detect the modification of "main thread"
> > Python objects from within a parallel context.
> >
> > For example, `data.append(4)` in the example below will generate an
> > AssignmentError exception, because data is a main thread object, and
> > `data.append(4)` gets executed from within a parallel context::
> >
> >     data = [ 1, 2, 3 ]
> >
> >     def work():
> >         data.append(4)
> >
> >     async.submit_work(work)
> >
> > The solution turned out to be deceptively simple:
> >
> >   1. Prior to running parallel threads, lock all "main thread"
> >      memory pages as read-only (via VirtualProtect on Windows,
> >      mprotect on POSIX).
> >
> >   2. Detect attempts to write to main thread pages during parallel
> >      thread execution (via SEH on Windows or a SIGSEGV trap on
> >      POSIX), and raise an exception instead (detection is done in
> >      the ceval frame exec loop).
>
> Quick stupid question: because of refcounts, the pages will be written
> to even in case of read-only access. How do you deal with this?
Easy: I don't refcount in parallel contexts :-)
There's no need, for two reasons:
1. All memory allocated in a parallel context is localized to a
private heap. When the parallel context is finished, the entire
heap can be blown away in one fell swoop (see the sketch after
this list). There's no need for reference counting or GC because
none of the objects will exist after the parallel context
completes.
2. The main thread won't be running when parallel threads/contexts
are executing, which means main thread objects being accessed in
parallel contexts (read-only access is fine) won't be suddenly
free()'d or GC-collected or whatever.
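In case it helps picture the first point, the private heap behaves
roughly like a toy bump allocator. The following stand-alone C sketch
is purely illustrative (made-up names like context_heap_t, not the
actual pyparallel implementation)::

    /* Toy per-context bump allocator: every allocation made while a
     * parallel context runs comes out of its private arena, so there
     * is no per-object refcounting or free(); the whole arena is
     * released in one call when the context finishes. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    typedef struct {
        char   *base;
        size_t  size;
        size_t  used;
    } context_heap_t;

    static int heap_init(context_heap_t *h, size_t size)
    {
        h->base = malloc(size);
        h->size = size;
        h->used = 0;
        return h->base ? 0 : -1;
    }

    /* All parallel-context allocations go through this; there is no
     * matching per-object deallocation and no reference counts. */
    static void *heap_alloc(context_heap_t *h, size_t n)
    {
        n = (n + 15) & ~(size_t)15;      /* keep 16-byte alignment */
        if (h->used + n > h->size)
            return NULL;                 /* real code would grow the heap */
        void *p = h->base + h->used;
        h->used += n;
        return p;
    }

    /* When the parallel context completes, blow the whole heap away. */
    static void heap_destroy(context_heap_t *h)
    {
        free(h->base);
        h->base = NULL;
        h->used = h->size = 0;
    }

    int main(void)
    {
        context_heap_t ctx;
        if (heap_init(&ctx, 1 << 20) != 0)
            return 1;

        char *s = heap_alloc(&ctx, 32);
        strcpy(s, "parallel-context object");
        printf("%s\n", s);

        heap_destroy(&ctx);  /* one fell swoop: no GC, no refcounts */
        return 0;
    }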
You get credit for the second point above; you asked a similar
question a few months ago that made me realize I absolutely couldn't
have the main thread running at the same time as the parallel
threads.
Once I accepted that as a design constraint, everything else came
together nicely... "Hmmm, if the main thread isn't running, it won't
need write-access to any of its pages! If we mark them read-only,
we could catch the traps/SEHs from parallel threads, then raise an
exception, ahh, simple!".
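For the curious, the POSIX side of that trap-and-recover pattern looks
roughly like the stand-alone C sketch below. This is illustrative
only, not the actual pyparallel code: the real thing uses
VirtualProtect/SEH on Windows, mprotect/SIGSEGV on POSIX, and hooks
the detection into the ceval frame exec loop so it can raise an
exception (e.g. AssignmentError) instead of just bailing out::

    /* Sketch: protect a page read-only with mprotect(), then catch
     * the write fault in a signal handler and escape with
     * siglongjmp() instead of crashing. */
    #include <setjmp.h>
    #include <signal.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/mman.h>
    #include <unistd.h>

    static sigjmp_buf escape;

    static void fault_handler(int sig, siginfo_t *info, void *ctx)
    {
        (void)sig; (void)info; (void)ctx;
        /* A real implementation would turn this into a Python-level
         * exception from the eval loop; here we just jump back out. */
        siglongjmp(escape, 1);
    }

    int main(void)
    {
        size_t pagesize = (size_t)sysconf(_SC_PAGESIZE);
        char *page = mmap(NULL, pagesize, PROT_READ | PROT_WRITE,
                          MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (page == MAP_FAILED) { perror("mmap"); return 1; }

        strcpy(page, "main thread object");

        /* Step 1: lock the page read-only before parallel threads run. */
        if (mprotect(page, pagesize, PROT_READ) != 0) {
            perror("mprotect"); return 1;
        }

        /* Step 2: install a handler so a write traps instead of
         * killing the process. */
        struct sigaction sa;
        memset(&sa, 0, sizeof(sa));
        sa.sa_sigaction = fault_handler;
        sa.sa_flags = SA_SIGINFO;
        sigaction(SIGSEGV, &sa, NULL);
        sigaction(SIGBUS, &sa, NULL);  /* some platforms report SIGBUS */

        if (sigsetjmp(escape, 1) == 0) {
            *(volatile char *)page = 'X';  /* write from a "parallel context" */
            puts("write succeeded (unexpected)");
        } else {
            puts("caught write to read-only main-thread page "
                 "(would raise AssignmentError)");
        }

        munmap(page, pagesize);
        return 0;
    }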
I'm both chuffed at how simple it is (considering it was *the* major
show-stopper), and miffed at how it managed to elude me for so long
;-)
Regards,
Trent.