[pypy-dev] PyParallel-style threads

Tue Jun 21 01:34:13 EDT 2016

CSP is something you can already implement in Python.
In fact, that's exactly what happens when one uses Python threads with
coroutines (such as gevent). I'm not sure what you're suggesting or how it
would keep Python semantics.
The limitations of the GIL will prevent CSP-style concurrency from actually
performing as well as it does in Go.
What I'm suggesting will relax the limitations of the GIL without changing
semantics or requiring a battle-tested STM implementation for the time
being.
The use cases I described will benefit from having threads working.
Applications like Salt use a lot of threads. If we can run some of them in
parallel without changing code, that's a huge win in my book.
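
To make the CSP point concrete, here is a minimal sketch. It assumes plain
stdlib threads with queue.Queue objects standing in for channels (not
gevent, and nothing PyPy-specific); the function names are made up for
illustration. Under CPython the two stages still take turns holding the
GIL, so this gives you the CSP structure but no parallel speedup for
CPU-bound work.

```python
import queue
import threading


def producer(out_ch, items):
    # Send each item over the channel, then a None sentinel to signal the end.
    for item in items:
        out_ch.put(item)
    out_ch.put(None)


def squarer(in_ch, out_ch):
    # Receive until the sentinel arrives, forwarding results downstream.
    while True:
        item = in_ch.get()
        if item is None:
            out_ch.put(None)
            return
        out_ch.put(item * item)


def run_pipeline(items):
    # Bounded queues stand in for channels; maxsize=1 keeps the stages
    # in near lock-step instead of letting one run far ahead.
    numbers = queue.Queue(maxsize=1)
    squares = queue.Queue(maxsize=1)
    threads = [
        threading.Thread(target=producer, args=(numbers, items)),
        threading.Thread(target=squarer, args=(numbers, squares)),
    ]
    for t in threads:
        t.start()
    results = []
    while True:
        value = squares.get()
        if value is None:
            break
        results.append(value)
    for t in threads:
        t.join()
    return results
```

Calling run_pipeline([1, 2, 3]) collects the squared values in order,
since each stage forwards items in the order it received them.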

On Mon, Jun 20, 2016, 22:56 Michał Domański <mdomans at gmail.com> wrote:

> With all due respect - wouldn't it make more sense to agree on an API that
> in fact would use threads? The patterns used by Go and Erlang and to some
> degree by ObjC and Swift seem more promising than "hidden multiprocessing".
>
> I may be a bore, but if what I'm getting is just "a nice syntax with
> restrictions" - it's not worth working on. I'd like to see actual benefits
> to people that want multithreading to work. It may seem like blasphemy,
> but with PyPy we could agree on new APIs.
>
> 2016-06-20 19:53 GMT+02:00 Omer Katz <omer.drow at gmail.com>:
>
>> PyParallel defines "not mutating global state" as *"avoiding mutation of
>> Python objects that were allocated from the main thread; don't append to a
>> main thread list or assign to a main thread dict from a parallel thread"*.
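
The rule quoted above can be illustrated with a toy sketch. This is not how
PyParallel enforces anything; MainThreadOwnedList and parallel_task are
hypothetical names, and the check is done in pure Python purely to show
which operations the rule permits (reads, thread-local allocation) and
which it forbids (appending to a main-thread list from another thread).

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class MainThreadOwnedList:
    """Hypothetical wrapper: readable from any thread, mutable only by its creator."""

    def __init__(self, items=()):
        self._items = list(items)
        self._owner = threading.get_ident()  # thread that allocated the object

    def __getitem__(self, index):
        return self._items[index]

    def __len__(self):
        return len(self._items)

    def append(self, item):
        # Reject mutation from any thread other than the allocating one.
        if threading.get_ident() != self._owner:
            raise RuntimeError("parallel thread tried to mutate a main-thread object")
        self._items.append(item)


def parallel_task(shared):
    # Reading a main-thread object from a worker thread is allowed.
    total = sum(shared[i] for i in range(len(shared)))
    # Objects allocated inside the worker are free to be mutated.
    local = [total]
    local.append(total * 2)
    try:
        shared.append(total)  # forbidden by the rule
        mutated = True
    except RuntimeError:
        mutated = False
    return local, mutated


def demo():
    shared = MainThreadOwnedList([1, 2, 3])
    with ThreadPoolExecutor(max_workers=1) as pool:
        local, mutated = pool.submit(parallel_task, shared).result()
    shared.append(4)  # the owning (main) thread may still mutate
    return local, mutated, len(shared)
```

The worker hands its result back as a return value rather than by writing
into a structure owned by the main thread.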
>>
>> The PyParallel approach provides different tradeoffs from STM.
>> You can't parallelize deserialization of a dictionary into a Python object
>> instance (e.g. a Django model), but you can run a threaded server that
>> performs parallel I/O, whereas in STM performing I/O makes the transaction
>> inevitable. There can only be one inevitable transaction at any given
>> point in time according to the documentation found here:
>> http://doc.pypy.org/en/latest/stm.html#transaction-transactionqueue
>> Also, I'm not sure how allowing only a single I/O operation at a time in
>> PyPy STM will affect gevent/eventlet or asyncio if more than one thread is
>> involved (which is supported in both gevent and asyncio; I haven't used
>> eventlet so I don't really know).
>> The PyParallel approach offers the same semantics as CPython when it
>> comes to gevent/asyncio/eventlet. Each thread has its own event loop and
>> you are allowed to switch execution in the middle since you're not changing
>> anything from other threads.
>>
>> You can also report errors to Sentry using raven while handling other
>> requests normally. Raven collects stack information which is never mutated
>> (See
>> https://github.com/getsentry/raven-python/blob/master/raven/utils/stacks.py#L246)
>> and then sends it to Sentry's servers. There's no reason (that I can see at
>> least) to block another request from being processed while collecting that
>> information and sending the data to Sentry's servers.
>>
>> The use case described by PyParallel is also valid:
>>
>> "...This is significant when you factor in how Python's scoping works at
>> a language level: Python code executing in a parallel thread can freely
>> access any non-local variables created by the "main thread". That is, it
>> has the exact same scoping and variable name resolution rules as any other
>> Python code. This facilitates loading large data structures from the main
>> thread and then freely accessing them from parallel callbacks.
>>
>> We demonstrate this with our simple Wikipedia "instant search" server
>> <https://github.com/pyparallel/pyparallel/blob/branches/3.3-px/examples/wiki/wiki.py#L294>,
>> which loads a trie with 27 million entries, each one mapping a title to a
>> 64-bit byte offset within a 60GB XML file. We then load a sorted NumPy
>> array of all 64-bit offsets, which allows us to extract the exact byte
>> range a given title's content appears within the XML file, allowing a
>> client to issue a ranged request for those bytes to get the exact content
>> via a single call to TransmitFile. This call returns immediately, but
>> sets up the necessary structures for the kernel to send that byte range
>> directly to the client without further interaction from us.
>>
>> The working set size of the python.exe process is about 11GB when the
>> trie and NumPy array are loaded. Thus, multiprocessing would not be
>> feasible, as you'd have 8 separate processes of 11GB if you had 8 cores and
>> started 8 workers, requiring 88GB just for the processes. The number of
>> allocated objects is around 27.1 million; the datrie library can
>> efficiently store values if they're a 32-bit integer, however, our offsets
>> are 64-bit, so an 80-something byte PyObject needs to be allocated to
>> represent each one.
>>
>> This is significant because it demonstrates the competitive advantage
>> PyParallel has against other languages when dealing with large heap sizes
>> and object counts, whilst simultaneously avoiding the need for continual
>> GC-motivated heap traversal, a product of memory allocation pressure (which
>> is an inevitable side-effect of high-end network load, where incoming links
>> are saturated at line rate)."
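
The byte-range lookup described in the quoted passage can be sketched as
follows. This is an assumption-laden stand-in, not the code from wiki.py:
a plain dict replaces the trie, a sorted Python list with the stdlib bisect
module replaces the sorted NumPy array, and byte_range is a hypothetical
helper name. With a NumPy array, numpy.searchsorted(offsets, start,
side="right") would play the role of bisect_right.

```python
from bisect import bisect_right


def byte_range(sorted_offsets, start, file_size):
    """Return the inclusive (first_byte, last_byte) range of the entry at `start`.

    The entry ends one byte before the next entry's offset, or at the end
    of the file if it is the last entry.
    """
    i = bisect_right(sorted_offsets, start)  # index of the next larger offset
    end = sorted_offsets[i] if i < len(sorted_offsets) else file_size
    return start, end - 1


# Toy data: title -> byte offset (the dict stands in for the trie).
title_to_offset = {"Ada": 0, "Bob": 120, "Cy": 300}
offsets = sorted(title_to_offset.values())
FILE_SIZE = 450  # hypothetical size of the data file in bytes
```

Given a title, the server side only needs title_to_offset[title] and one
binary search to hand the client a range it can request directly.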
>>
>> STM currently requires code modifications in order to avoid conflicts, at
>> least when collections are involved. PyParallel doesn't allow these kinds
>> of mutations, so it is much easier to implement in PyPy.
>> PyParallel also requires a specific API to be used in order to utilize
>> its parallel threads. There is a way to eliminate code modifications in
>> PyParallel's case.
>> We initially run with the GIL acquired, as with any other thread, and
>> then scan the trace for CPyExt calls or mutations of non-thread-local
>> objects. If there are none, we can eliminate the call to acquire the GIL.
>> Further optimizations can be performed if only one branch of the code
>> requires CPyExt calls or non-thread-local mutations.
>> I don't know if it's any easier than scanning the trace for
>> lists/sets/dictionaries and replacing them with their equivalent STM
>> implementations, which Armin has already mentioned is not trivial.
>> In the future, when STM is production-ready, we can "downgrade" a
>> thread to an STM thread when required, instead of acquiring the GIL
>> and blocking the execution of other threads, if we want to.
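
The trace check proposed above could look roughly like the toy model below.
Everything in it is hypothetical: TraceOp is an invented record type, not
the PyPy JIT's trace representation, and the thread_local and cpyext flags
are assumed to be known for each operation, which is the hard part in
practice.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TraceOp:
    kind: str                  # "load", "store" or "call"
    target: str                # label of the object or function touched
    thread_local: bool = True  # was the target allocated by this thread?
    cpyext: bool = False       # does the call go through CPyExt?


def needs_gil(op):
    # A CPyExt call or a store to a non-thread-local object forces the GIL.
    if op.kind == "call" and op.cpyext:
        return True
    if op.kind == "store" and not op.thread_local:
        return True
    return False


def can_release_gil(trace):
    # The GIL acquisition can be dropped only if no operation needs it.
    return not any(needs_gil(op) for op in trace)


read_only_trace = [
    TraceOp("load", "shared_trie", thread_local=False),  # reads are fine
    TraceOp("store", "local_buffer"),                    # thread-local write
    TraceOp("call", "len"),                              # not a CPyExt call
]
mutating_trace = read_only_trace + [
    TraceOp("store", "global_list", thread_local=False),
]
extension_trace = [TraceOp("call", "c_extension_function", cpyext=True)]
```

Only read_only_trace passes the check; the other two contain an operation
that would keep the thread under the GIL.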
>>
>> STM also currently makes it harder to reason about how the program
>> behaves, especially when you have conflicts.
>> With my suggestion you can easily tell whether the GIL is released or not.
>>
>> On Mon, Jun 20, 2016 at 17:53, Armin Rigo <arigo at tunes.org> wrote:
>>
>>> Hi Omer,
>>>
>>> On 20 June 2016 at 08:51, Omer Katz <omer.drow at gmail.com> wrote:
>>> > As for implementation, if we can trace the code running in the thread
>>> > and ensure it's not mutating global state and that CPyExt is never used
>>> > during the thread's course we can simply release the GIL when such a
>>> > thread is run.
>>>
>>> That's a very hand-wavy and vague description.  To start with, how do
>>> you define exactly "not mutating global state"?  We are not allowed to
>>> write to any of the objects that existed before we started the thread?
>>>  It may be possible to have such an implementation, yes.  Actually,
>>> that's probably easy: tweak the STM code to crash instead of doing
>>> something more complicated when we write to an old object.
>>>
>>> I'm not sure how useful that would be---or how useful PyParallel is on
>>> CPython.  Maybe if you can point us to real usages of PyParallel it
>>> would be a start.
>>>
>>>
>>> A bientôt,
>>>
>>> Armin.
>>>
>
> --
> ---------------------------
> Michał Domański
>