[pypy-dev] PyParallel-style threads

Omer Katz omer.drow at gmail.com
Mon Jun 20 13:53:09 EDT 2016


PyParallel defines "not mutating global state" as *"avoiding mutation of
Python objects that were allocated from the main thread; don't append to a
main thread list or assign to a main thread dict from a parallel thread"*.
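
To make the rule concrete, here is a small illustration written against plain
CPython threading rather than the PyParallel API (a sketch of the rule only,
not of PyParallel itself):

    import threading

    titles = {"Python": 42}   # allocated by the main thread
    seen = []                 # also allocated by the main thread

    def parallel_callback():
        # Allowed: reading main-thread objects.
        offset = titles["Python"]
        # Allowed: allocating and mutating objects created in this thread.
        local = [offset, offset + 1]
        # NOT allowed under the PyParallel rule: mutating a main-thread object.
        # seen.append(offset)
        return local

    t = threading.Thread(target=parallel_callback)
    t.start()
    t.join()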

The PyParallel approach provides different tradeoffs from STM.
You can't parallelize deserialization of a dictionary into a Python object
instance (e.g. a Django model), but you can run a threaded server that
performs parallel I/O, whereas in STM performing I/O makes the transaction
inevitable. There can only be one inevitable transaction at any given point
in time, according to the documentation here:
http://doc.pypy.org/en/latest/stm.html#transaction-transactionqueue.
Also, I'm not sure how limiting PyPy STM to a single in-flight I/O operation
would affect gevent/eventlet or asyncio when more than one thread is
involved (which both gevent and asyncio support; I haven't used eventlet so
I don't really know).
The PyParallel approach offers the same semantics as CPython when it comes
to gevent/asyncio/eventlet. Each thread has its own event loop, and you are
free to switch execution in the middle since you're not changing anything
that belongs to other threads.
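
For example, under ordinary CPython/PyPy semantics each thread can already
drive its own asyncio loop; a minimal sketch (the names and the sleep are
mine, nothing here is PyParallel-specific):

    import asyncio
    import threading

    async def handle_requests(name):
        # Only touches objects created in this thread.
        await asyncio.sleep(0.1)
        print(name, "done")

    def thread_main(name):
        loop = asyncio.new_event_loop()   # one event loop per thread
        asyncio.set_event_loop(loop)
        try:
            loop.run_until_complete(handle_requests(name))
        finally:
            loop.close()

    threads = [threading.Thread(target=thread_main, args=("worker-%d" % i,))
               for i in range(2)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()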

You can also report errors to Sentry using raven while handling other
requests normally. Raven collects stack information which is never mutated
(See
https://github.com/getsentry/raven-python/blob/master/raven/utils/stacks.py#L246)
and then sends it to Sentry's servers. There's no reason (that I can see at
least) to block another request from being processed while collecting that
information and sending the data to Sentry's servers.
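
For instance, the reporting could run in its own thread without touching
anything the request-handling threads own; a rough sketch using raven's
documented Client/captureException API (the DSN and the failing call are
placeholders of mine):

    import threading
    from raven import Client

    client = Client("https://public_key@sentry.example.com/1")  # placeholder DSN

    def risky_operation():
        raise ValueError("boom")          # stand-in for real request handling

    def report_in_background():
        try:
            risky_operation()
        except Exception:
            # raven collects the stack (see stacks.py linked above) and ships
            # it to Sentry; nothing here mutates state shared with other
            # requests, so they can keep being served in parallel.
            client.captureException()

    threading.Thread(target=report_in_background).start()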

The use case described by PyParallel is also valid:

"...This is significant when you factor in how Python's scoping works at a
language level: Python code executing in a parallel thread can freely
access any non-local variables created by the "main thread". That is, it
has the exact same scoping and variable name resolution rules as any other
Python code. This facilitates loading large data structures from the main
thread and then freely accessing them from parallel callbacks.

We demonstrate this with our simple Wikipedia "instant search" server
<https://github.com/pyparallel/pyparallel/blob/branches/3.3-px/examples/wiki/wiki.py#L294>,
which loads a trie with 27 million entries, each one mapping a title to a
64-bit byte offset within a 60GB XML file. We then load a sorted NumPy
array of all 64-bit offsets, which allows us to extract the exact byte
range a given title's content appears within the XML file, allowing a
client to issue a ranged request for those bytes to get the exact content
via a single call to TransmitFile. This call returns immediately, but sets
up the necessary structures for the kernel to send that byte range directly
to the client without further interaction from us.

The working set size of the python.exe process is about 11GB when the trie
and NumPy array are loaded. Thus, multiprocessing would not be feasible, as
you'd have 8 separate processes of 11GB if you had 8 cores and started 8
workers, requiring 88GB just for the processes. The number of allocated
objects is around 27.1 million; the datrie library can efficiently store
values if they're a 32-bit integer, however, our offsets are 64-bit, so an
80-something byte PyObject needs to be allocated to represent each one.

This is significant because it demonstrates the competitive advantage
PyParallel has against other languages when dealing with large heap sizes
and object counts, whilst simultaneously avoiding the need for continual
GC-motivated heap traversal, a product of memory allocation pressure (which
is an inevitable side-effect of high-end network load, where incoming links
are saturated at line rate)."
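
The byte-range lookup the quoted example describes boils down to roughly the
following (my own sketch, not PyParallel's wiki.py; the datrie and NumPy
calls are their real APIs, the titles and offsets are made up):

    import string
    import datrie
    import numpy as np

    # Title -> byte offset of the page's start inside the big XML dump.
    trie = datrie.Trie(string.printable)
    trie[u"Python"] = 1024
    trie[u"PyPy"] = 4096

    # Sorted array of every page's start offset.
    offsets = np.array(sorted(trie.values()), dtype=np.uint64)

    def byte_range(title):
        start = trie[title]
        # The next offset in the sorted array marks where this page ends.
        i = np.searchsorted(offsets, start, side="right")
        end = int(offsets[i]) if i < len(offsets) else None  # None => read to EOF
        return start, end

    print(byte_range(u"Python"))   # -> (1024, 4096)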

STM currently requires code modifications in order to avoid conflicts, at
least when collections are involved. PyParallel doesn't allow these kinds
of mutations, which should make the implementation much easier in PyPy.
PyParallel also requires a specific API to be used in order to utilize its
parallel threads. There is a way to eliminate code modifications in
PyParallel's case.
We initially run with the GIL acquired, as with any other thread, then scan
the trace for CPyExt calls or mutations of non-thread-local state; if there
are none, we can eliminate the call to acquire the GIL (see the sketch
below). Further optimizations can be performed if only a branch of the code
requires CPyExt or non-thread-local mutations.
I don't know if it's any easier than scanning the trace for
lists/sets/dictionaries and replacing them with their equivalent STM
implementations, which Armin has already mentioned is not trivial.
In the future, when STM is production-ready, we could "downgrade" a thread
to an STM thread when required, instead of acquiring the GIL and blocking
the execution of other threads.
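
A rough sketch of that decision, with stub helpers standing in for the trace
inspection (uses_cpyext, mutates_nonlocal_state and gil_acquired are
hypothetical names of mine; nothing like them exists in PyPy today):

    import contextlib

    def uses_cpyext(func):
        return False            # stub: would really inspect the JIT trace

    def mutates_nonlocal_state(func):
        return False            # stub: would really inspect the JIT trace

    @contextlib.contextmanager
    def gil_acquired():
        yield                   # placeholder for actually holding the GIL

    def run_parallel_thread(func, *args):
        if uses_cpyext(func) or mutates_nonlocal_state(func):
            # Fall back to today's behaviour: keep the GIL held.
            with gil_acquired():
                return func(*args)
        # No CPyExt calls and no writes to non-thread-local objects:
        # the GIL acquisition can be elided (or, later, the thread could be
        # "downgraded" to an STM thread instead).
        return func(*args)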

STM also currently makes it harder to reason about how the program behaves,
especially when you run into conflicts.
With my suggestion you can easily tell whether the GIL is released or not.
On Mon, 20 June 2016 at 17:53, Armin Rigo <arigo at tunes.org> wrote:

> Hi Omer,
>
> On 20 June 2016 at 08:51, Omer Katz <omer.drow at gmail.com> wrote:
> > As for implementation, if we can trace the code running in the thread and
> > ensure it's not mutating global state and that CPyExt is never used
> during
> > the thread's course we can simply release the GIL when such a thread is
> run.
>
> That's a very hand-wavy and vague description.  To start with, how do
> you define exactly "not mutating global state"?  We are not allowed to
> write to any of the objects that existed before we started the thread?
>  It may be possible to have such an implementation, yes.  Actually,
> that's probably easy: tweak the STM code to crash instead of doing
> something more complicated when we write to an old object.
>
> I'm not sure how useful that would be---or how useful PyParallel is on
> CPython.  Maybe if you can point us to real usages of PyParallel it
> would be a start.
>
>
> A bientôt,
>
> Armin.
>