[Python-ideas] A concurrency survey of sorts

Mike Graham mikegraham at gmail.com
Thu Nov 3 18:26:13 CET 2011


On Wed, Nov 2, 2011 at 3:36 PM, Mike Meyer <mwm at mired.org> wrote:
> 1) How much of the Python standard library is known to be thread safe?
>
> 2) How many packages in PyPI are known to be thread safe?

"Thread safe" isn't nearly as well-defined as many people act, and
certainly doesn't mean it's safe to use something with threads. When
people try to use the very, very, very few things that are thread safe
without their own synchronization, they almost always end up with
buggy code.
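
Here's a minimal sketch of the usual failure mode (the names are just
for illustration): each individual dict operation is atomic under
CPython's GIL, but a read-modify-write built from two of them is not:

    import threading

    counts = {}  # each dict get/set is individually atomic under the GIL

    def tally(key):
        for _ in range(100000):
            # Racy read-modify-write: another thread can run between
            # the get() and the assignment, losing increments.
            counts[key] = counts.get(key, 0) + 1

    threads = [threading.Thread(target=tally, args=("hits",))
               for _ in range(4)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print(counts["hits"])  # almost always less than 400000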

It's also worth noting that many of the most important
concurrency-supporting packages in PyPI don't use multithreading at
all.
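
For instance, a single select() loop can serve many clients in one
thread with no locks at all; event-driven frameworks like Twisted are
(much richer) elaborations of this pattern. A minimal sketch (the port
number is arbitrary):

    import select
    import socket

    # A single-threaded echo server: one process, no threads,
    # many simultaneous clients.
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    server.bind(("localhost", 9000))
    server.listen(5)

    sockets = [server]
    while True:
        readable, _, _ = select.select(sockets, [], [])
        for s in readable:
            if s is server:
                conn, _ = server.accept()   # new client
                sockets.append(conn)
            else:
                data = s.recv(4096)
                if data:
                    s.sendall(data)         # echo it back
                else:
                    sockets.remove(s)       # client disconnected
                    s.close()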

> 3) Can you suggest another approach to getting safe high-performance
> shared data in concurrent operation? I've already considered:
>
>  a) I proposed making actions that mutate data require locked objects,
> because I've seen that work in other languages. I recognize that
> doesn't mean it will work in Python, but it's more than I can say
> about the alternatives I knew about then.

I don't see how this is feasible or makes Python a better language.
This would add complication that benefits few people, would slow down
the normal cases, and wouldn't solve the data-sharing problem for
important cases that aren't just sharing memory between threads.

>  b) Bertrand Meyer's SCOOP system, designed for Eiffel. It has two
> major strikes against it: 1) it is based on type attributes on
> *variables*, and I couldn't figure out how to translate that to a
> language where variables aren't typed. 2) I don't know that there's
> a working implementation.

I don't mean to be rude, but I don't understand how this is an idea at all.

We already have a lot of tools for sharing data predictably among
threads, concurrent tasks, processes, and machines: Queue.Queue,
thread locks, callbacks, MPI, message queues, and databases, to name a
few. Each of these has disadvantages, and most of them have
advantages.
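
For threads in particular, the standard queue pattern confines all the
shared state to the queue itself. A minimal sketch (the squaring is a
stand-in for real work, and the import dance covers both Python 2 and
3):

    try:
        import queue            # Python 3
    except ImportError:
        import Queue as queue   # Python 2
    import threading

    q = queue.Queue()  # the queue does the locking; workers share nothing else

    def worker():
        while True:
            item = q.get()
            if item is None:    # sentinel: time to shut down
                break
            print(item * item)
            q.task_done()

    threads = [threading.Thread(target=worker) for _ in range(4)]
    for t in threads:
        t.start()
    for n in range(10):
        q.put(n)
    q.join()                    # wait until every item is processed
    for t in threads:
        q.put(None)
    for t in threads:
        t.join()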

> 4) Can you suggest a minor change that would move things toward safer
> concurrent code with high-performance shared data? I can see two
> possibilities:
>
>  a) Audit any parts of the standard library that aren't already known
> to be thread safe, and flag those that aren't. Fixing them may have
> to wait on a better mechanism than POSIX locks.

I am not convinced that adding this at the language level would be a
net good at all. Flagging things as "thread unsafe" is silly, as
practically everything is thread unsafe. Flagging things as "thread
safe" is seldom useful, because you should still be handling
synchronization in your own code. Creating locks on everything in the
stdlib would make Python bigger, more complex, and slower, and it
still wouldn't solve concurrency problems for users; indeed, it could
make them less apparent.
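
To make that concrete, here's a hypothetical LockedCounter whose every
method takes a lock. Each call is "thread safe" by any reasonable
definition, yet code built on top of it is still racy:

    import threading

    class LockedCounter(object):
        # Hypothetical class: every method takes its own lock, so
        # each individual call is "thread safe".
        def __init__(self):
            self._lock = threading.Lock()
            self._value = 0

        def value(self):
            with self._lock:
                return self._value

        def increment(self):
            with self._lock:
                self._value += 1

    LIMIT = 5
    c = LockedCounter()

    def bounded_increment():
        # Still racy: another thread can increment between the check
        # and the act, so the counter can end up above LIMIT. The
        # caller needs its own lock around the *pair* of calls.
        if c.value() < LIMIT:
            c.increment()

    threads = [threading.Thread(target=bounded_increment)
               for _ in range(20)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print(c.value())  # usually LIMIT, occasionally more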

And none of this goes to address concurrency that isn't based on
multithreading, which is important and in many, many applications
preferable.

>  b) Add a high-level, high-performance shared object facility to the
> multiprocessing package.

The multiprocessing module already provides explicit means for passing
and sharing data: queues, pipes, shared ctypes values, and Manager
proxies. Trying to encapsulate the shared state as an ordinary-looking
Python object would be even more troublesome.
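
For example, a shared ctypes value, where both the sharing and the
locking are explicit (a minimal sketch):

    import multiprocessing

    def work(total):
        for _ in range(1000):
            with total.get_lock():   # explicit lock around the update
                total.value += 1

    if __name__ == "__main__":
        total = multiprocessing.Value("i", 0)   # a shared C int
        procs = [multiprocessing.Process(target=work, args=(total,))
                 for _ in range(4)]
        for p in procs:
            p.start()
        for p in procs:
            p.join()
        print(total.value)          # prints 4000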

Mike


