Python-list Digest, Vol 61, Issue 368

Fri Oct 24 17:30:38 EDT 2008

> From: "Andy O'Meara" <andy55 at gmail.com>

> Unfortunately, a shared address region doesn't work when you have
> large and opaque objects (e.g. a rendered CoreVideo movie in the
> QuickTime API or 300 megs of audio data that just went through a
> DSP).  Then you've got the hit of serialization if you've got
> intricate data structures (that would normally need to be
> serialized, such as a hashtable or something).  Also, if I may speak
> for commercial developers out there who are just looking to get the
> job done without new code, it's usually preferable to just have a
> single high level sync object (for when the job is complete) than to

Just to chime in as a CPython-based ISV from the scientific
visualization realm: we face the same problems and limitations due to
the lack of truly concurrent threading in CPython (or at least multiple
independent interpreters).  A typical use case
might be a 1-3 GB dataset (molecular dynamics trajectory and derived
state) subjected to asynchronous random read/write by N threads each
running on one of N cores in parallel.

We get by only by jettisoning CPython almost entirely and working in C
for all tasks other than the most basic operations: thread creation,
workload scheduling, mutexes, and thread deletion.

The biggest problem is not with the most compute-intensive tasks (where
use of C is justified), but with those relatively short-running yet
algorithmically complex tasks which could be done much more easily from
Python than from C (e.g. data organization, U.I. survey/present tasks,
rarely used transformations, ad hoc scripting experiments, etc.).

Cheers,
Warren