CSP is something you can already implement with Python.
In fact, that's exactly what happens when you use Python threads with coroutines (such as gevent). I'm not sure what you're suggesting, or how such new APIs would keep Python's semantics.
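To illustrate, here's a tiny CSP-style sketch using plain gevent primitives (greenlets as the processes, a Queue as the channel; the structure is mine, but these are real gevent APIs). Note that it still runs on a single OS thread, which is exactly the problem:

    import gevent
    from gevent.queue import Queue

    channel = Queue()  # CSP channel: the greenlets communicate only here

    def producer():
        for i in range(5):
            channel.put(i)          # send on the channel
        channel.put(StopIteration)  # gevent's documented way to end iteration

    def consumer():
        for item in channel:        # blocks (cooperatively) until items arrive
            print("got", item)

    gevent.joinall([gevent.spawn(producer), gevent.spawn(consumer)])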
The limitations of the GIL prevent CSP-style concurrency in Python from actually performing as well as Go.
What I'm suggesting would relax the limitations of the GIL without changing semantics or requiring a battle-tested STM implementation for the time being.
The use cases I described would benefit from threads that actually run in parallel.
Applications like Salt use a lot of threads. If we can run some of them in parallel without changing code, that's a huge win in my book.


On Mon, Jun 20, 2016, 22:56 Michał Domański <mdomans@gmail.com> wrote:
With all due respect - wouldn't it make more sense to agree on an API that would in fact use threads? The patterns used by Go and Erlang, and to some degree by ObjC and Swift, seem more promising than "hidden multiprocessing".

I may be a bore, but if what I'm getting is just "a nice syntax with restrictions", it's not worth working on. I'd like to see actual benefits for people who want multithreading to work. It may seem like blasphemy, but with PyPy we could agree on new APIs.

2016-06-20 19:53 GMT+02:00 Omer Katz <omer.drow@gmail.com>:
PyParallel defines "not mutating global state" as "avoiding mutation of Python objects that were allocated from the main thread; don't append to a main thread list or assign to a main thread dict from a parallel thread".
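To make that rule concrete, here's a small illustration using plain threading rather than PyParallel's actual API (the variable names are mine):

    import threading

    shared_config = {"mode": "fast"}   # allocated by the main thread
    results = []                       # allocated by the main thread

    def worker():
        mode = shared_config["mode"]   # OK: reading main-thread state
        local = [mode.upper()]         # OK: mutating thread-local objects
        results.append(local)          # NOT OK under PyParallel's rule:
                                       # mutates a main-thread list

    threading.Thread(target=worker).start()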

The PyParallel approach provides different tradeoffs from STM.
You can't parallelize deserialization of a dictionary into a Python object instance (e.g. a Django model), but you can run a threaded server that performs parallel I/O. Under STM, performing I/O makes the transaction inevitable, and there can only be one inevitable transaction at any given point in time according to the documentation found here: http://doc.pypy.org/en/latest/stm.html#transaction-transactionqueue.
Also, I'm not sure how allowing only a single I/O operation at a time in PyPy-STM would affect gevent/eventlet or asyncio when more than one thread is involved (both gevent and asyncio support multiple threads; I haven't used eventlet, so I don't really know).
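For context, here's a minimal sketch of the TransactionQueue API as documented at the link above (it only exists on pypy-stm); the point is that the print inside the handler is I/O, which forces that transaction to become inevitable:

    from transaction import TransactionQueue

    def handle(item):
        result = item * 2          # pure computation: can run in parallel
        print("handled", result)   # I/O: forces this transaction to
                                   # become inevitable (serialized)

    tq = TransactionQueue()
    for item in range(100):
        tq.add(handle, item)       # queue each call as one transaction
    tq.run()                       # runs the transactions on several threads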
The PyParallel approach offers the same semantics as CPython when it comes to gevent/asyncio/eventlet: each thread has its own event loop, and you are allowed to switch execution in the middle since you're not changing anything from other threads.
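For example, with stock asyncio the per-thread event loop model looks like this (the worker names are just illustrative):

    import asyncio
    import threading

    async def serve(name):
        # Each loop only touches state created in its own thread,
        # so no cross-thread mutation occurs.
        await asyncio.sleep(0.1)
        print(name, "done")

    def thread_main(name):
        loop = asyncio.new_event_loop()      # private loop for this thread
        asyncio.set_event_loop(loop)
        loop.run_until_complete(serve(name))
        loop.close()

    for i in range(4):
        threading.Thread(target=thread_main, args=("worker-%d" % i,)).start()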

You can also report errors to Sentry using raven while handling other requests normally. Raven collects stack information, which is never mutated (see https://github.com/getsentry/raven-python/blob/master/raven/utils/stacks.py#L246), and then sends it to Sentry's servers. There's no reason (that I can see, at least) to block another request from being processed while collecting that information and sending the data to Sentry's servers.

The use case described by PyParallel is also valid:

"...This is significant when you factor in how Python's scoping works at a language level: Python code executing in a parallel thread can freely access any non-local variables created by the "main thread". That is, it has the exact same scoping and variable name resolution rules as any other Python code. This facilitates loading large data structures from the main thread and then freely accessing them from parallel callbacks.

We demonstrate this with our simple Wikipedia "instant search" server, which loads a trie with 27 million entries, each one mapping a title to a 64-bit byte offset within a 60GB XML file. We then load a sorted NumPy array of all 64-bit offsets, which allows us to extract the exact byte range a given title's content appears within the XML file, allowing a client to issue a ranged request for those bytes to get the exact content via a single call to TransmitFile. This call returns immediately, but sets up the necessary structures for the kernel to send that byte range directly to the client without further interaction from us.

The working set size of the python.exe process is about 11GB when the trie and NumPy array are loaded. Thus, multiprocessing would not be feasible, as you'd have 8 separate processes of 11GB if you had 8 cores and started 8 workers, requiring 88GB just for the processes. The number of allocated objects is around 27.1 million; the datrie library can efficiently store values if they're a 32-bit integer, however, our offsets are 64-bit, so an 80-something byte PyObject needs to be allocated to represent each one.

This is significant because it demonstrates the competitive advantage PyParallel has against other languages when dealing with large heap sizes and object counts, whilst simultaneously avoiding the need for continual GC-motivated heap traversal, a product of memory allocation pressure (which is an inevitable side-effect of high-end network load, where incoming links are saturated at line rate)."


STM currently requires code modifications in order to avoid conflicts, at least when collections are involved. PyParallel doesn't allow these kinds of mutations, which makes the implementation much easier in PyPy.
PyParallel also requires a specific API to be used in order to utilize its parallel threads. There is a way to eliminate code modifications in PyParallel's case:
we initially run with the GIL acquired, as with any other thread, and then scan the trace for CPyExt calls or mutations of non-thread-local state; if there are none, we can eliminate the call to acquire the GIL. Further optimizations are possible if only a branch of the code requires CPyExt or mutates non-thread-local state.
I don't know if that is any easier than scanning the trace for lists/sets/dictionaries and replacing them with their equivalent STM implementations, which Armin has already mentioned is not trivial.
In the future, when STM is production-ready, we could "downgrade" a thread to an STM thread when required, instead of acquiring the GIL and blocking the execution of other threads.
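A sketch of the decision logic I have in mind, in illustrative pseudocode; none of these hooks exist today, and trace_of(), uses_cpyext(), mutates_nonlocal_state(), acquire_gil() and release_gil() are all hypothetical names:

    def run_thread(func):
        acquire_gil()              # start conservatively, as today
        trace = trace_of(func)     # hypothetical: the JIT trace for func
        if not uses_cpyext(trace) and not mutates_nonlocal_state(trace):
            release_gil()          # proven safe: run without the GIL
            return func()
        try:
            # keep the GIL; a future STM backend could instead
            # "downgrade" this thread to an STM transaction here
            return func()
        finally:
            release_gil()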

STM also currently makes it harder to reason about how the program behaves, especially when you have conflicts.
With my suggestion you can easily tell whether the GIL is released or not.
On Mon, 20 June 2016 at 17:53, Armin Rigo <arigo@tunes.org> wrote:
Hi Omer,

On 20 June 2016 at 08:51, Omer Katz <omer.drow@gmail.com> wrote:
> As for implementation, if we can trace the code running in the thread and
> ensure it's not mutating global state and that CPyExt is never used during
> the thread's course we can simply release the GIL when such a thread is run.

That's a very hand-wavy and vague description.  To start with, how do
you define exactly "not mutating global state"?  We are not allowed to
write to any of the objects that existed before we started the thread?
It may be possible to have such an implementation, yes.  Actually,
that's probably easy: tweak the STM code to crash instead of doing
something more complicated when we write to an old object.

I'm not sure how useful that would be---or how useful PyParallel is on
CPython.  Maybe if you can point us to real usages of PyParallel it
would be a start.


A bientôt,

Armin.





--
---------------------------
Michał Domański