2.6, 3.0, and truly independent interpreters

Michael Sparks ms at cerenity.org
Sat Oct 25 07:35:58 EDT 2008


Hi Andy,


Andy wrote:

> However, we require true thread/interpreter
> independence so python 2 has been frustrating at time, to say the
> least.  Please don't start with "but really, python supports multiple
> interpreters" because I've been there many many times with people.
> And, yes, I'm aware of the multiprocessing module added in 2.6, but
> that stuff isn't lightweight and isn't suitable at all for many
> environments (including ours).

This is a rather conflicting set of statements. Whilst you appear to be
extremely clear on what you want here, and on why multiprocessing and
associated techniques are not appropriate, the combination still sounds
contradictory, and I'm guessing I'm not the only person who finds it a
little odd.

Based on the size of the thread, having read it all, I'm guessing also
that you're not going to get an immediate solution, but a workaround.
However, also based on reading it, I think it's a use case that would be
generally useful when embedding python.

So, I'll take a stab at what I think you're after.

The scenario as I understand it is this:
    * You have an application written in C, C++ or similar.
    * You've been providing users the ability to script it or customise it
      in some fashion using scripts.

Based on the conversation:
    * This worked well, and you really liked the results, but...
    * You only had one interpreter embedded in the system
    * You were allowing users to use multiple scripts

Suddenly you go from: single script, single memory space.
To: multiple scripts, unconstrained shared memory space.

That then causes pain for you and your users, so you decided to look for
this scenario:
    * A mechanism that allows each script to think it's the only script
      running on the python interpreter.
    * But to still have only one embedded instance of the interpreter.
    * With the primary motivation to eliminate the unconstrained shared
      memory causing breakage to your software.

Now, whilst the multiprocessing module gives you that last point - it does
eliminate the unconstrained shared memory - it's (for whatever reason) too
heavyweight for you, due to its use of multiple processes. At a guess, the
reason for this is that you allow the user to run lots of these little
scripts.

Essentially what this means is that you want "green processes".

One way of achieving that may be to find a means of forcing threads in
python to ONLY be allowed access to (and only update) thread-local values,
rather than defaulting to shared values.

The reason I say that, is because the closest you get to green processes in
python at the moment is /inside/ a python generator. It's nowhere near the
level you want, but it's what made me think of the idea of green processes.

Specifically, if you have the canonical example of a python generator:

def fib():
    # a and b live in the generator's own frame - nothing is shared
    a, b = 1, 1
    while 1:
        a, b = b, a + b
        yield a

Then no matter how many instances of that I create, the values are local
to each generator instance, and can't impact each other. Now clearly this
isn't what you want, but on some level it's *similar*.
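
For example, two instances advance completely independently:

>>> f1 = fib()
>>> f2 = fib()
>>> [next(f1) for _ in range(5)]
[1, 2, 3, 5, 8]
>>> next(f2)    # f2's a and b are untouched by f1's progress
1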

You want to be able to do:
    run(this_script)

and have (this_script) use only a local environment while it runs.
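
As an aside, a very rough approximation of that is to give each script a
fresh globals dict in its own thread. This is only a sketch (run_script is
a made-up name, and it gives no real isolation - mutable objects passed
in, sys.modules and C extension state all remain shared):

import threading

def _execute(source, namespace):
    exec(source, namespace)

def run_script(source):
    # Fresh module-level namespace per script, so top-level names in
    # one script can't trample on another's.
    namespace = {"__name__": "__user_script__"}
    thread = threading.Thread(target=_execute, args=(source, namespace))
    thread.start()
    return thread

run_script("x = 1; print(x)")
run_script("x = 99; print(x)")   # each script sees only its own x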

Now, if you could change the threading API, such that there was a means of
forcing all value lookups to look in thread local store before looking
outside the thread local store [1], then this would give you a much greater
level of safety.

[1] I don't know if there is or isn't - I've not been sufficiently
    interested to look...

I suspect that this would also be a very nice easy win for many
multi-threaded applications as well, reducing accidental data sharing.
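
To make the idea concrete, here's a sketch of what such a "local first"
namespace could look like. LocalFirstNamespace and publish() are entirely
my invention, not an existing API:

import threading

class LocalFirstNamespace(object):
    """Reads consult this thread's private bindings first, then fall
    back to shared ones; plain writes stay thread-private, so sharing
    has to be an explicit act."""
    def __init__(self):
        object.__setattr__(self, "_shared", {})
        object.__setattr__(self, "_locals", threading.local())

    def __getattr__(self, name):          # only called on lookup misses
        try:
            return getattr(self._locals, name)
        except AttributeError:
            try:
                return self._shared[name]
            except KeyError:
                raise AttributeError(name)

    def __setattr__(self, name, value):   # all writes are thread-private
        setattr(self._locals, name, value)

    def publish(self, name, value):       # explicit opt-in to sharing
        self._shared[name] = value

A thread then does ns.x = 5 and gets its own x; only an explicit
ns.publish("x", 5) makes a value visible to every thread.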

Indeed, reversing things such that rather than doing this:
   myLocal = threading.local()
   myLocal.X = 5

Allowing a thread to force the default to be the other way round:
   systemGlobals = threading.globals()   # hypothetical API
   systemGlobals.X = 5

Would make a big difference. Furthermore, it would also mean that the
following:
   import MyModule
   from MyOtherModule import whizzy_thing

would bind those names into the thread's local namespace by default.

I don't know if such a change would be sufficient to stop the python
interpreter going bang for extension modules though :-)

I suspect also that this change, whilst potentially fraught with
difficulties, would be incredibly useful in python implementations
that are GIL-free (such as Jython or IronPython).

Now, this for me is entirely theoretical because I don't know much about
python's threading implementation (because I've never needed to), but it
does seem to me to be the easier win than looking for truly independent
interpreters...

It would also be more generally useful, since it would make accidental
sharing of data (which is where threads really hurt people most) much
harder.

Since it was raised in the thread, I'd like to say "use Kamaelia", but your
use case is slightly different as I understand it. You want to take existing
stuff that won't be written in any particular way, and encourage it to be
safely reusable in a shared environment. We do do that to an extent, but I'm
guessing not quite as unconstrained as you. (We specifically require that
things be used in a lightly constrained manner.)

I suspect though that this hypothetical ability to switch a thread to
search thread locals first (or to only have thread locals) would itself
prove incredibly useful as time goes on.

Kamaelia implements the kind of model advocated by the paper referenced
earlier in the thread:
   http://www.eecs.berkeley.edu/Pubs/TechRpts/2006/EECS-2006-1.pdf 

As you'll see from this recent Pycon UK presentation:
   http://tinyurl.com/KamaeliaPyconUK

It goes a stage further though, by actively providing metaphors based
around components built using inboxes/outboxes, designed *specifically*
to encourage safe concurrency. (Heritage-wise, Kamaelia owes more to
Occam & CSP than anything else.)
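
For flavour, a component looks roughly like this - written from memory,
so treat the details as approximate rather than authoritative:

from Axon.Component import component

class Doubler(component):
    # All communication happens via the component's inboxes and
    # outboxes; there's no shared state with other components.
    def main(self):
        while 1:
            while self.dataReady("inbox"):
                self.send(self.recv("inbox") * 2, "outbox")
            yield 1   # hand control back to the scheduler

Components are then wired together (e.g. with
Kamaelia.Chassis.Pipeline.Pipeline) and run under a shared scheduler.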

After all, we've found that concurrency using generators is good most of
the time - a generator is probably the most fundamental unit of
concurrency you can get, followed by true coroutines (greenlets). Next
up are threads (you can put generators into threads, but not vice versa),
and then processes (you can put threads in processes, but not vice
versa).
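
To illustrate the bottom of that hierarchy, a toy round-robin scheduler
for generators (purely illustrative - this is not Kamaelia's scheduler)
fits in a few lines, and you can of course run one of these inside each
thread:

def scheduler(tasks):
    # Naive round-robin: advance each generator one step per pass,
    # dropping any that have finished.
    tasks = list(tasks)
    while tasks:
        for task in list(tasks):
            try:
                next(task)
            except StopIteration:
                tasks.remove(task)

def counter(name, n):
    for i in range(n):
        print("%s %d" % (name, i))
        yield   # hand control back to the scheduler

scheduler([counter("A", 3), counter("B", 2)])
# Interleaved output: A 0, B 0, A 1, B 1, A 2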

Finishing on a random note:

The interesting thing from my perspective is that you essentially want
something halfway between threads and processes, which I called green
processes for want of a decent phrase. Now, that's akin to sandboxing, but
I suspect leaky sandboxing might be sufficient for you (i.e. a sandbox
where you have to try hard to break out of the box, as opposed to it being
trivial). I'd be pretty certain that something like green processes, or
"thread local only", would be useful in the future.

After all, that along with decent sandboxing would be the sort of thing
necessary to allow python to be embedded in a browser. (If flash used
multiple processes, it'd kill most people's systems after all, and if they
don't have something like green processes, flash would make web pages even
worse...)

Indeed, thread local only plus globals accessed via STM [2] would be
incredibly handy. (I say that because generator globals, and globals
accessed via a CAT (which is a Kamaelia-specific thing, but similar
conceptually), work extremely well.)

[2] even something as lightweight as http://www.kamaelia.org/STM
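
To sketch what I mean by lightweight STM - this is a toy illustration of
the retry-on-conflict idea, not the actual Kamaelia STM API:

import threading

class ConcurrentUpdate(Exception):
    pass

class Store(object):
    # Toy optimistic store: readers take a (value, version) snapshot;
    # a commit succeeds only if the version read is still current,
    # otherwise the caller must re-read and retry.
    def __init__(self):
        self._lock = threading.Lock()
        self._data = {}   # name -> (value, version)

    def checkout(self, name, default=None):
        with self._lock:
            return self._data.get(name, (default, 0))

    def commit(self, name, value, version):
        with self._lock:
            _, current = self._data.get(name, (None, 0))
            if current != version:
                raise ConcurrentUpdate(name)
            self._data[name] = (value, version + 1)

store = Store()
while True:
    value, version = store.checkout("counter", 0)
    try:
        store.commit("counter", value + 1, version)
        break
    except ConcurrentUpdate:
        pass   # lost the race; re-read and retry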

If a "search thread local" approach or "thread local only" approach
sounds reasonable, then it may be a "leaky sandbox" approach is perhaps
worth investigating. After all, a leaky sandbox may be doable.

Tuppence-worthy-ly-yours,


Michael.
--
http://www.kamaelia.org/GetKamaelia



