[pypy-dev] How will STM actually integrate with normal Python code

Armin Rigo arigo at tunes.org
Tue Jan 31 17:58:53 CET 2012


Hi Timothy,

On Tue, Jan 31, 2012 at 16:26, Timothy Baldridge <tbaldridge at gmail.com> wrote:
> def foo(d):
>    if "foo" in d:
>        del d["foo"]
>
> Will never cause a segmentation fault (due to multiple threads
> accessing "d" at the same time), but it may throw a KeyError. That is,
> all Python code will continue to be "thread-safe" as it is in CPython,
> but race conditions will continue to exist and must be handled in
> standard ways (locks, CAS, etc.).

No, precisely not.  Such code will continue to work as it is, without
race conditions or any of the messy multithread-induced headaches.
Locks, CAS, etc. are all superfluous.

So what would work and not work?  In one word: all "transactions" work
exactly as if run serially, one after another.  A transaction is just
one unit of work; a callback.  We use this working code for comparison
on top of CPython or a non-STM PyPy:
https://bitbucket.org/pypy/pypy/raw/stm/lib_pypy/transaction.py .  You
add transactions with the add() function, and execute them all with
the run() function (which typically contains further add()s).  The
only source of non-determinism is in run() taking a random transaction
as the next one.  Of course this demo code runs the transactions
serially, but the point is that even "pypy-stm" gives you the illusion
of running them serially.
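
For illustration, here is a minimal sketch of using that demo module
(assuming the add()/run() interface of the transaction.py linked above,
with add(f, *args); the worker function and values are made up):

    import transaction

    results = []

    def work(n):
        results.append(n * n)
        if n < 3:
            # a transaction may schedule further transactions
            transaction.add(work, n + 1)

    transaction.add(work, 0)
    transaction.add(work, 10)
    transaction.run()          # executes everything, as if serially
    print(sorted(results))     # [0, 1, 4, 9, 100]; the order in which
                               # the transactions ran is the only thing
                               # that could vary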

So you start by writing code that is *safe*, and then you have to think
a bit in order to increase the parallelism, instead of the other way
around as with traditional multithreading in non-Python languages.

There are rules that are a bit subtle (but not too much) about when
transactions can parallelize or not.  Basically, as soon as a
transaction does I/O, all the other transactions will be stalled; and
if transactions very often change the same objects, then you will get
a lot of conflicts and restarts.
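
To make that concrete, here is a hedged sketch (the exact
conflict-detection granularity of pypy-stm is not specified here; this
only mirrors the rules above, and the function names are invented):

    import transaction

    shared = {"count": 0}

    def conflict_prone(i):
        # every transaction writes the same object: many conflicts and
        # restarts, so still correct results but little parallelism
        shared["count"] += 1

    def io_bound(i):
        # doing I/O stalls the other transactions while this one runs
        print("handled", i)

    def parallel_friendly(job):
        # transactions that work on objects of their own are the ones
        # that can actually spread over multiple CPUs
        job["result"] = sum(job["numbers"])

    jobs = [{"numbers": list(range(100))} for _ in range(8)]
    for job in jobs:
        transaction.add(parallel_friendly, job)
    transaction.run()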

> "In PyPy, we look at STM like we would look at the GC.  It may be
> replaced in a week by a different one, but for the "end user" writing
> pure Python code, it essentially doesn't make a difference.  "

I meant to say that STM, in our case, is just (hopefully) an
optimization that lets some programs run on multiple CPUs --- the ones
that are based on the 'transaction' module.  But it's still just an
optimization in the sense that the programs run exactly as if using
the transaction.py I described above.

In yet other words: notice that transaction.py doesn't even use the
'thread' module.  So if we get the same behavior with pypy-stm's
built-in 'transaction' module, it means that the example you described
is perfectly safe as it is.
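
Concretely, with the same add()/run() interface, your example needs no
locks at all (a sketch; the dict contents are made up):

    import transaction

    def foo(d):
        if "foo" in d:
            del d["foo"]

    d = {"foo": 1}
    transaction.add(foo, d)
    transaction.add(foo, d)
    transaction.run()   # one of the two actually deletes the key; the
                        # other sees it already gone and does nothing.
                        # Never a KeyError, never a segfault.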

(Update: today we have a "pypy-stm" that works exactly like
transaction.py and exhibits multiple-CPU usage.  It's just terribly
slow and doesn't free any memory ever :-)  But it runs
http://paste.pocoo.org/show/543646/ , which is a simple epoll-based
server creating new transactions in order to do the CPU-intensive
portions of answering the requests.  In the code there is no trace of
CAS, locks, 'multiprocessing', etc.)


A bientôt,

Armin.

