The Future of Python Threading

Ben Sizer kylotan at gmail.com
Sat Aug 11 23:53:31 CEST 2007


On Aug 10, 5:13 pm, "Chris Mellon" <arka... at gmail.com> wrote:
> On 8/10/07, Ben Sizer <kylo... at gmail.com> wrote:
>
> > On 10 Aug, 15:38, Ben Finney <bignose+hates-s... at benfinney.id.au>
> > wrote:
> > > Last I checked, multiple processes can run concurrently on multi-core
> > > systems. That's a well-established way of structuring a program.
>
> > It is, however, almost always more complex and slower-performing.
>
> > Plus, it's underdocumented. Most academic study of concurrent
> > programming, while referring to the separately executing units as
> > 'processes', almost always assumes a shared memory space and the
> > associated primitives that go along with that.
>
> This is simply not true. Firstly, there's a well-defined difference
> between a 'process' and a 'thread', and that is that processes have
> private memory spaces. Nobody says "process" when they mean threads of
> execution within a shared memory space and if they do they're wrong.

I'm afraid that a lot of what students will be taught does exactly
this, because the typical study of concurrency is in relation to
contention for shared resources, whether that be memory, a file, a
peripheral, a queue, etc. One example I have close to hand is
'Principles of Concurrent and Distributed Programming', which never
mentions the term 'thread'. It does have many examples of several
processes accessing shared objects, which is the focus of most
discussions of concurrent programming.
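For comparison, the classic shared-object case that such textbooks dwell on looks roughly like this with Python's threading module. This is a minimal sketch: the counter and the worker function are hypothetical stand-ins for any contended resource, and the lock is needed because `counter += 1` is a read-modify-write that is not atomic even under CPython's GIL.

```python
import threading

# Hypothetical shared resource: a counter that several workers update.
counter = 0
counter_lock = threading.Lock()

def worker(increments):
    """Increment the shared counter, holding the lock for each update."""
    global counter
    for _ in range(increments):
        with counter_lock:  # mutual exclusion around the read-modify-write
            counter += 1

threads = [threading.Thread(target=worker, args=(10000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter)  # 40000
```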

The idea that processes have memory space completely isolated from
other processes is both relatively recent and not universal across all
platforms. It also requires you to start treating memory as
arbitrarily different from other resources, which are typically
shared.

> And no, "most" academic study isn't limited to shared memory spaces.
> In fact, almost every improvement in concurrency has been moving
> *away* from simple shared memory - the closest thing to it is
> transactional memory, which is like shared memory but with
> transactional semantics instead of simple sharing.

I think I wasn't sufficiently clear; research may well be moving in
that direction, but you can bet that the typical student with their
computer science or software engineering degree will have been taught
far more about how to use synchronisation primitives within a program
than how to communicate between arbitrary processes.

> There's nothing "undocumented" about IPC. It's been around as a
> technique for decades. Message passing is as old as the hills.

I didn't say undocumented, I said underdocumented. The typical
programmer these days arrives knowing at least how to use a mutex or
semaphore, and will probably look for that capability in any language
they use. They won't be thinking about creating an arbitrary message
passing system and separating their project out into separate
programs, even if UNIX programmers have been doing exactly that since
1969. There is a multitude of ways to fit IPC into a system, but only
a few approaches to threading, and those few coincide quite closely
with how low-level OS functionality handles processes, so threading is
what you tend to get taught. That's why it's useful for Python to have
good support for it.
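To illustrate the extra design work that message passing between separate programs involves, here is a minimal sketch using the standard `subprocess` module (Python 3.7+ for `capture_output`). The child program and its newline-delimited JSON protocol are hypothetical, invented for this example; even in this tiny case you have to decide on a message format and how replies are matched to requests.

```python
import json
import subprocess
import sys

# Hypothetical child program: reads one JSON message per line from stdin,
# squares the "value" field, and writes one JSON reply per line to stdout.
CHILD_SOURCE = r"""
import json, sys
for line in sys.stdin:
    msg = json.loads(line)
    reply = {"id": msg["id"], "result": msg["value"] ** 2}
    sys.stdout.write(json.dumps(reply) + "\n")
"""

def run_jobs(values):
    """Send each value to a separate process as a message; collect replies."""
    requests = "".join(
        json.dumps({"id": i, "value": v}) + "\n" for i, v in enumerate(values)
    )
    completed = subprocess.run(
        [sys.executable, "-c", CHILD_SOURCE],
        input=requests,
        capture_output=True,
        text=True,
        check=True,
    )
    replies = [json.loads(line) for line in completed.stdout.splitlines()]
    # Match replies to requests by id rather than relying on output order.
    return [r["result"] for r in sorted(replies, key=lambda r: r["id"])]

print(run_jobs([1, 2, 3]))  # [1, 4, 9]
```

The `id` field is one arbitrary design choice among many; a real system would also have to handle errors, partial messages, and a child that never answers.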

> > Hardly. Sure, so you don't have to worry about contention over objects
> > in memory, but it's still completely asynchronous, and there will
> > still be a large degree of waiting for the other processes to respond,
> > and you have to develop the protocols to communicate. Apart from
> > convenient serialisation, Python doesn't exactly make IPC easy, unlike
> > Java's RMI for example.
>
> There's nothing that Python does to make IPC hard, either. There's
> nothing in the standard library yet, but you may be interested in Pyro
> (http://pyro.sf.net) or Parallel Python
> (http://www.parallelpython.com/). It's not Erlang, but it's not hard
> either. At least, it's not any harder than using threads and locks.
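For plain request/response calls, the standard library has long included an XML-RPC client and server (`xmlrpclib` and `SimpleXMLRPCServer` in Python 2, renamed `xmlrpc.client` and `xmlrpc.server` in Python 3). A minimal sketch with the Python 3 names follows, assuming a hypothetical `add` procedure. It shows how little code a remote call takes, and also that it is only a call and a reply, with no shared resource involved.

```python
import threading
from xmlrpc.client import ServerProxy
from xmlrpc.server import SimpleXMLRPCServer

def add(a, b):
    """Hypothetical remote procedure exposed by the server."""
    return a + b

# Port 0 lets the OS choose a free port; logRequests=False keeps output quiet.
server = SimpleXMLRPCServer(("127.0.0.1", 0), logRequests=False)
server.register_function(add, "add")
host, port = server.server_address

# The server runs in a background thread only to keep the sketch in one
# file; normally it would live in a separate process or on another machine.
server_thread = threading.Thread(target=server.serve_forever, daemon=True)
server_thread.start()

with ServerProxy(f"http://{host}:{port}/") as proxy:
    result = proxy.add(2, 3)

server.shutdown()
server.server_close()
print(result)  # 5
```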

Although Pyro is good at what it does, simple RPC alone doesn't solve
most of the problems that typical threading usage does. IPC is useful
for submitting jobs in the background, but it doesn't fit so well
with situations where two parallel loops both act on a shared
resource. Say you have a main thread and a network-reading thread.
Given a shared queue for the data, you can make this safe by adding
just 5 lines of code: 2 locks, 2 unlocks, and a call to start the
networking thread. Implementing that using RPC will be more complex,
or less efficient, or probably both.
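The main-thread/network-thread case can be sketched like this. It is a minimal sketch that assumes a simulated reader in place of a real socket loop (`fake_network_reader` and the packet list are hypothetical). The `with` statement folds each lock/unlock pair into one line, so the synchronisation amounts to two `with incoming_lock:` blocks plus the `start()` call.

```python
import collections
import threading
import time

# Shared queue for data arriving from the (simulated) network.
incoming = collections.deque()
incoming_lock = threading.Lock()

def fake_network_reader(packets):
    """Hypothetical stand-in for a socket-reading loop."""
    for packet in packets:
        time.sleep(0.01)           # pretend to wait on the network
        with incoming_lock:        # lock on entry, unlock on exit
            incoming.append(packet)

packets = [b"hello", b"world", b"bye"]
reader = threading.Thread(target=fake_network_reader, args=(packets,))
reader.start()                     # the call that starts the networking thread

received = []
while len(received) < len(packets):    # the main thread's own loop
    with incoming_lock:            # lock on entry, unlock on exit
        while incoming:
            received.append(incoming.popleft())
    time.sleep(0.005)              # the main thread would do other work here

reader.join()
print(received)  # [b'hello', b'world', b'bye']
```

In current Python, `queue.Queue` performs this locking internally, so the explicit lock can be dropped if you use it instead of a bare deque.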

--
Ben Sizer