[pypy-dev] How will STM actually integrate with normal Python code
arigo at tunes.org
Tue Jan 31 17:58:53 CET 2012
On Tue, Jan 31, 2012 at 16:26, Timothy Baldridge <tbaldridge at gmail.com> wrote:
> def foo(d):
>     if "foo" in d:
>         del d["foo"]
> Will never cause a segmentation fault (due to multiple threads
> accessing "d" at the same time), but it may throw a KeyError. That is,
> all Python code will continue to be "thread-safe" as it is in CPython,
> but race conditions will continue to exist and must be handled in
> standard ways (locks, CAS, etc.).
No, precisely not. Such code will continue to work as it is, without
race conditions or any of the messy multithread-induced headaches.
Locks, CAS, etc. are all superfluous.
So what would work and not work? In short: all "transactions" work
exactly as if run serially, one after another. A transaction is just
one unit of work; a callback. We use this working code for comparison;
it runs on top of CPython or a non-STM PyPy:
https://bitbucket.org/pypy/pypy/raw/stm/lib_pypy/transaction.py . You
add transactions with the add() function, and execute them all with
the run() function (which typically contains further add()s). The
only source of non-determinism is in run() taking a random transaction
as the next one. Of course this demo code runs the transactions
serially, but the point is that even "pypy-stm" gives you the illusion
of running them serially.
So you start by writing code that is *safe*, and then you have to think
a bit in order to increase the parallelism, instead of the other way
around when using traditional multithreading in non-Python languages.
There are rules that are a bit subtle (but not too much) about when
transactions can parallelize or not. Basically, as soon as a
transaction does I/O, all the other transactions will be stalled; and
if transactions very often change the same objects, then you will get
a lot of conflicts and restarts.
> "In PyPy, we look at STM like we would look at the GC. It may be
> replaced in a week by a different one, but for the "end user" writing
> pure Python code, it essentially doesn't make a difference."
I meant to say that STM, in our case, is just (hopefully) an
optimization that lets some programs run on multiple CPUs --- the ones
that are based on the 'transaction' module. But it's still just an
optimization in the sense that the programs run exactly as if using
the transaction.py I described above.
In yet other words: notice that transaction.py doesn't even use the
'thread' module. So if we get the same behavior with pypy-stm's
built-in 'transaction' module, it means that the example you described
is perfectly safe as it is.
(Update: today we have a "pypy-stm" that works exactly like
transaction.py and exhibits multiple-CPU usage. It's just terribly
slow and doesn't free any memory ever :-) But it runs
http://paste.pocoo.org/show/543646/ , which is a simple epoll-based
server creating new transactions in order to do the CPU-intensive
portions of answering the requests. In the code there is no trace of
CAS, locks, 'multiprocessing', etc.)