That's true. IPC through sockets or (somewhat faster) shared memory, with at least a cPickle round trip for every object, is usually the most that such approaches can offer.
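To make the pickling cost concrete, here is a minimal sketch of passing an object over a socket. The helper names send_obj and recv_obj and the 4-byte length prefix are my own choices for illustration, not part of any library. On Python 3 the plain pickle module already uses the C implementation that cPickle provided on Python 2.

```python
import pickle
import socket
import struct


def send_obj(sock, obj):
    """Pickle obj and send it with a 4-byte big-endian length prefix."""
    payload = pickle.dumps(obj, protocol=pickle.HIGHEST_PROTOCOL)
    sock.sendall(struct.pack("!I", len(payload)) + payload)


def _recv_exact(sock, n):
    """Read exactly n bytes; recv() may return fewer than requested."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise EOFError("socket closed before the full message arrived")
        buf.extend(chunk)
    return bytes(buf)


def recv_obj(sock):
    """Receive one length-prefixed pickle and rebuild the object."""
    (length,) = struct.unpack("!I", _recv_exact(sock, 4))
    return pickle.loads(_recv_exact(sock, length))


if __name__ == "__main__":
    # Both ends live in one process here, just to keep the demo short;
    # in real use each end would belong to a different process.
    left, right = socket.socketpair()
    send_obj(left, {"answer": 42, "items": [1, 2, 3]})
    print(recv_obj(right))
    left.close()
    right.close()
```

The receiver always gets a copy, never the original object, so every exchange pays for serialization on one side and reconstruction on the other. Only unpickle data from processes you trust.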

For tasks that really require threading, one can consider IronPython.
The most advanced technique I've seen for CPython is posh:

I'd say Py3K should just do the locking for dicts / collections, obmalloc and refcounting (or drop the refcount mechanism) and take care of the other minor things needed to enable free threading. Or at least enable careful sharing of Python objects between multiple separate interpreter instances within one process.
.NET and Java have shown that the speed cost of this technique is not so extreme. I guess less than 10%.
And Python is a VHLL with less focus on speed anyway.
Also see discussions in .


