Help me use my Dual Core CPU!

Michael Sparks sparks.m at gmail.com
Tue Sep 19 01:28:16 CEST 2006


Paul Rubin wrote:
> "Ramon Diaz-Uriarte" <rdiaz02 at gmail.com> writes:
> > You might also want to check
> > http://www.lindaspaces.com/products/NWS_overview.html
> > by the guys who "invented" Linda.
>
> Cool, I guess.
>
> > (The Oz language/Mozart system is a good example of a different and
> > very neat approach to concurrency; somewhat similar Python solutions
> > can be found at Kamaelia and Candygram. Links and other stuff at:
>
> I looked at these.  Oz/Mozart is a whole nother language, worth
> examining for its ideas, but the implementation is quite slow.
> Kamaelia doesn't attempt concurrency at all.  Its main idea is to use
> generators to simulate microthreads.

Regarding Kamaelia, that's not been the case for over a year now.

We've had threaded components as well as generator based ones since
around last July, though their API only stabilised properly about 4
months back. If you use C extensions that release the GIL, and your OS
puts threads on different CPUs, then you have genuine concurrency.
(Those are admittedly some big caveats, but not uncommon ones in Python.)
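
To illustrate the GIL point (this is plain standard-library Python, not
Kamaelia code, and the function names are made up for the example): hashlib's
C implementation releases the GIL while hashing large buffers, so threads
doing this kind of work can run on separate CPUs if the OS schedules them
that way. A minimal sketch:

```python
import hashlib
import threading

def digest_worker(data, results, index):
    # hashlib releases the GIL while hashing large buffers, so several
    # of these calls may genuinely run at the same time on different CPUs.
    results[index] = hashlib.sha256(data).hexdigest()

def hash_in_threads(blobs):
    """Hash every blob in its own thread; results keep the input order."""
    results = [None] * len(blobs)
    threads = [threading.Thread(target=digest_worker, args=(blob, results, i))
               for i, blob in enumerate(blobs)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results

# Four 4 MB buffers, large enough for the GIL release to matter.
blobs = [bytes([i]) * (4 * 1024 * 1024) for i in range(4)]
digests = hash_in_threads(blobs)
```

Whether this actually occupies more than one CPU still depends on the OS
scheduler, which is the second caveat above.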

Also, integrating things as a subprocess is as simple as instantiating a
component that maps the subprocess's stdin/stdout onto the
inbox/outbox model of Kamaelia and then just using it. Something
concrete this is useful for:
    mencoder_options = ("-ovc lavc -oac mp3lame -ffourcc DX50 -lavcopts "
                        "acodec=mp3:vbitrate=200:abitrate=128 "
                        "-vf scale=320:-2 -")
    ...#  assume 'encodingfile' is defined above
    Pipeline( DVB_TuneToChannel(channel="BBC ONE",fromDemuxer="MUX1"),
              UnixProcess("mencoder -o "+encodingfile+" "+mencoder_options)
            ).run()

On a dual CPU machine that code does indeed use both CPUs (as you'd
want and expect).
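
For anyone who wants to see the underlying idea without Kamaelia, here is a
rough sketch using only the standard subprocess module. It is not how
UnixProcess is implemented; pipe_through_process and the stand-in child
command are hypothetical names for illustration. The child is a separate OS
process, so the OS is free to run it on the other CPU.

```python
import subprocess
import sys

def pipe_through_process(command, chunks):
    """Feed chunks to a child's stdin and return what it wrote to stdout."""
    proc = subprocess.Popen(command,
                            stdin=subprocess.PIPE,
                            stdout=subprocess.PIPE)
    # communicate() writes the input, closes stdin and reads stdout
    # without deadlocking on full pipe buffers.
    output, _ = proc.communicate(b"".join(chunks))
    return output

# Stand-in for something like mencoder: a second Python interpreter
# that upper-cases whatever arrives on its stdin.
child = [sys.executable, "-c",
         "import sys; sys.stdout.write(sys.stdin.read().upper())"]
result = pipe_through_process(child, [b"hello ", b"world"])
```

A real component would stream data in and out incrementally rather than
collecting it all in one call; this only shows the stdin/stdout plumbing.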

Also, whilst we haven't had the chance to implement OS level process
based components, that doesn't mean we're not interested in them. It's
just that 2 people have to focus on something, so we've been focussed
on building things using the system rather than fleshing out the
concurrency. To say we don't attempt it implies that we don't want to
go down the route of adding in genuine concurrency. (Which is really
why I'm replying - that's not the case - I do want to go down these
routes, and it's more man-hours than desire that are the issue.)

Personally, I'm very much in the camp that says "shared data is
invariably a bad idea unless you really know what you're doing"
(largely because it's the most common source of bugs when people are
trying to do more than one thing at a time). People also generally
appear to find writing threadsafe code very hard. (Not everyone, just
the people who aren't at the top end of the bell curve for writing
code that does more than one thing at a time.)

This is why Kamaelia is message based (ie it's a conscious choice in
its favour), except for certain types of data (where we have a linda-esque
type system for more systemic information). The reason for this is to
stop the average programmer from shooting himself in the foot (with
a 6 CPU-barrelled shotgun :-).
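
To give a flavour of what "message based" means here, a toy sketch follows.
This is NOT Kamaelia's actual API: producer, doubler and run are invented for
the example, and plain deques stand in for inboxes and outboxes. Each
component is a generator that only touches its own boxes, and a simple
round-robin loop gives each one a turn.

```python
from collections import deque

def producer(outbox, items):
    # Puts one message in its outbox per turn, then finishes.
    for item in items:
        outbox.append(item)
        yield

def doubler(inbox, outbox):
    # Drains its inbox each turn; it never sees the producer's state.
    while True:
        while inbox:
            outbox.append(inbox.popleft() * 2)
        yield

def run(components, steps):
    # Toy round-robin scheduler: give every live component one turn per step.
    for _ in range(steps):
        for component in list(components):
            try:
                next(component)
            except StopIteration:
                components.remove(component)

link = deque()      # the producer's outbox is the doubler's inbox
results = deque()   # the doubler's outbox
run([producer(link, [1, 2, 3]), doubler(link, results)], steps=5)
```

The components share no variables; the only thing connecting them is the
link they hand messages across.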

In terms of how this is *implemented* however, we have zero copying of
data (except to/from threads at the moment) and so data is shared
directly, but at a point where the user of the system thinks it's natural
to hand off to someone else. We find this approach tends to
encourage arbitration of access to shared resources, which IMO is a
good (defensive) approach to avoiding the problems people have with
shared resources.
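
A small sketch of what zero-copy handoff looks like in plain Python (again an
illustration with made-up names, not Kamaelia internals): putting an object
on a queue enqueues a reference, so the receiver gets the very same object,
and the convention is simply that the sender stops touching it afterwards.

```python
import queue

def hand_off(outbox, message):
    # Only a reference is enqueued; the payload itself is not copied.
    outbox.put(message)

outbox = queue.Queue()
frame = bytearray(1024 * 1024)   # pretend this is a large video frame
hand_off(outbox, frame)

received = outbox.get()
same_object = received is frame  # identity check: no copy was made
```

Nothing enforces the "sender lets go" rule here; it is a convention that the
message-passing style makes natural to follow.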

But if it turns out our approach sucks for the average programmer, then
that's a bug, so we'd have to work to fix it. And if new approaches are
better, we'd welcome implementations since not all problems are screws
and not all tools are hammers :-) (as a result I'd also welcome people
saying what sucks and why, but preferably based on the system as it is
today, not as it was :)

Have fun :)


Michael.



