How will STM actually integrate with normal Python code?

As Armin stated in a recent mailing list thread: "In PyPy, we look at STM like we would look at the GC. It may be replaced in a week by a different one, but for the "end user" writing pure Python code, it essentially doesn't make a difference." So, my question is, how exactly will STM integrate into PyPy? I'm going to take a guess here, and perhaps someone can elaborate to correct me.
From what I'm reading, PyPy with STM will offer the same promises (or lack of promises) that the JVM and CLR offer their code:
For example, this code:

    def foo(d):
        if "foo" in d:
            del d["foo"]

will never cause a segmentation fault (due to multiple threads accessing "d" at the same time), but it may throw a KeyError. That is, all Python code will continue to be "thread-safe" as it is in CPython, but race conditions will continue to exist and must be handled in standard ways (locks, CAS, etc.).

Am I right in this description?

Thanks,

Timothy

--
"One of the main causes of the fall of the Roman Empire was that–lacking zero–they had no way to indicate successful termination of their C programs." (Robert Firth)
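To make the race Timothy describes concrete, here is a minimal sketch under plain threads and no STM; the thread count and dict contents are invented for illustration, and any given run may or may not actually hit the race:

    import threading

    def foo(d):
        if "foo" in d:
            # Another thread may delete the key between the check
            # and the del, so this can raise KeyError in one thread:
            # a race condition, but never a segfault.
            del d["foo"]

    d = {"foo": 1}
    threads = [threading.Thread(target=foo, args=(d,)) for _ in range(8)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()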

Hi Timothy,

On Tue, Jan 31, 2012 at 16:26, Timothy Baldridge <tbaldridge@gmail.com> wrote:
No, precisely not. Such code will continue to work as it is, without race conditions or any of the messy multithread-induced headaches. Locks, CAS, etc. are all superfluous.

So what would work and not work? In one word: all "transactions" work exactly as if run serially, one after another. A transaction is just one unit of work; a callback. We use this working code for comparison on top of CPython or a non-STM PyPy: https://bitbucket.org/pypy/pypy/raw/stm/lib_pypy/transaction.py . You add transactions with the add() function, and execute them all with the run() function (which typically contains further add()s). The only source of non-determinism is in run() taking a random transaction as the next one.

Of course this demo code runs the transactions serially, but the point is that even "pypy-stm" gives you the illusion of running them serially. So you start by writing code that is *safe*, and then you have to think a bit in order to increase the parallelism, instead of the other way around when using traditional multithreading in non-Python languages.

There are rules that are a bit subtle (but not too much) about when transactions can parallelize or not. Basically, as soon as a transaction does I/O, all the other transactions will be stalled; and if transactions very often change the same objects, then you will get a lot of conflicts and restarts.
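As a concrete sketch of that API: the add()/run() calls below follow the transaction.py linked above, assuming add() takes a callable plus its arguments; the doubling task and the numbers are made up for illustration. On CPython the transactions run serially; under pypy-stm the same code could use several CPUs:

    import transaction  # the lib_pypy/transaction.py linked above

    results = []

    def process(item):
        # One transaction: it behaves as if serialized with all
        # other transactions, so a plain list.append is safe.
        results.append(item * 2)
        if item < 3:
            # A transaction may add further transactions; run()
            # keeps going until the queue is empty.
            transaction.add(process, item + 10)

    for item in range(3):
        transaction.add(process, item)
    transaction.run()
    print(sorted(results))   # [0, 2, 4, 20, 22, 24]

Note that the order in which the transactions execute is random, but because each one runs as if alone, the final contents of results are deterministic.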
I meant to say that STM, in our case, is just (hopefully) an optimization that lets some programs run on multiple CPUs --- the ones that are based on the 'transaction' module. But it's still just an optimization in the sense that the programs run exactly as if using the transaction.py I described above.

In yet other words: notice that transaction.py doesn't even use the 'thread' module. So if we get the same behavior with pypy-stm's built-in 'transaction' module, it means that the example you described is perfectly safe as it is.

(Update: today we have a "pypy-stm" that works exactly like transaction.py and exhibits multiple-CPU usage. It's just terribly slow and doesn't free any memory ever :-) But it runs http://paste.pocoo.org/show/543646/ , which is a simple epoll-based server creating new transactions in order to do the CPU-intensive portions of answering the requests. In the code there is no trace of CAS, locks, 'multiprocessing', etc.)

A bientôt,

Armin.
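The paste.pocoo.org link has since expired; the following is a hypothetical reconstruction of the pattern it describes (an epoll loop that hands the CPU-intensive part of each request to a transaction), with the port number, buffer size, and compute() body all invented for the sketch:

    import select
    import socket
    import transaction  # as above; the usage here is a sketch

    def compute(data):
        # Stand-in for the CPU-intensive part of answering a request.
        return data.upper()

    def answer(conn, data):
        # One transaction per request. A real version would keep the
        # sendall outside the transaction, since I/O done inside a
        # transaction stalls all the others.
        conn.sendall(compute(data))
        conn.close()

    server = socket.socket()
    server.bind(("", 8077))
    server.listen(5)
    poller = select.epoll()
    poller.register(server.fileno(), select.EPOLLIN)

    while True:
        for _fd, _event in poller.poll():
            conn, _addr = server.accept()
            transaction.add(answer, conn, conn.recv(4096))
        transaction.run()   # execute the queued CPU-bound transactions

As in Armin's description, there is no trace of CAS, locks, or 'multiprocessing' anywhere in the request-handling code.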
