2.6, 3.0, and truly independent interpreters
Glenn Linderman
glenn at nevcal.com
Fri Oct 24 16:59:26 EDT 2008
On approximately 10/24/2008 1:09 PM, came the following characters from
the keyboard of Rhamphoryncus:
> On Oct 24, 1:02 pm, Glenn Linderman <v+pyt... at g.nevcal.com> wrote:
>
>> On approximately 10/24/2008 8:42 AM, came the following characters from
>> the keyboard of Andy O'Meara:
>>
>>
>>> Glenn, great post and points!
>>>
>> Thanks. I need to admit here that while I've got a fair bit of
>> professional programming experience, I'm quite new to Python -- I've not
>> learned its internals, nor even the full extent of its rich library. So
>> I have some questions that are partly about the goals of the
>> applications being discussed, partly about how Python is constructed,
>> and partly about how the library is constructed. I'm hoping to get a
>> better understanding of all of these; perhaps once a better
>> understanding is achieved, limitations will be understood, and maybe
>> solutions be achievable.
>>
>> Let me define some speculative Python interpreters; I think the first is
>> today's Python:
>>
>> PyA: Has a GIL. PyA threads can run within a process; but are
>> effectively serialized to the places where the GIL is obtained/released.
>> Needs the GIL because that solves lots of problems with non-reentrant
>> code (an example of non-reentrant code, is code that uses global (C
>> global, or C static) variables – note that I'm not talking about Python
>> vars declared global... they are only module global). In this model,
>> non-reentrant code could include pieces of the interpreter, and/or
>> extension modules.
>>
>> PyB: No GIL. PyB threads acquire/release a lock around each reference to
>> a global variable (like "with" feature). Requires massive recoding of
>> all code that contains global variables. Reduces performance
>> significantly by the increased cost of obtaining and releasing locks.
>>
>> PyC: No locks. Instead, recoding is done to eliminate global variables
>> (interpreter requires a state structure to be passed in). Extension
>> modules that use globals are prohibited... this eliminates large
>> portions of the library, or requires massive recoding. PyC threads do
>> not share data between threads except by explicit interfaces.
>>
>> PyD: (A hybrid of PyA & PyC). The interpreter is recoded to eliminate
>> global variables, and each interpreter instance is provided a state
>> structure. There is still a GIL, however, because globals are
>> potentially still used by some modules. Code is added to detect use of
>> global variables by a module, or some contract is written whereby a
>> module can be declared to be reentrant and global-free. PyA threads will
>> obtain the GIL as they would today. PyC threads would be available to be
>> created. PyC instances refuse to call non-reentrant modules, but also
>> need not obtain the GIL... PyC threads would have limited module support
>> initially, but over time, most modules can be migrated to be reentrant
>> and global-free, so they can be used by PyC instances. Most 3rd-party
>> libraries today are starting to care about reentrancy anyway, because of
>> the popularity of threads.
>>
>
> PyE: objects are reclassified as shareable or non-shareable, many
> types are now only allowed to be shareable. A module and its classes
> become shareable with the use of a __future__ import, and their
> shareddict uses a read-write lock for scalability. Most other
> shareable objects are immutable. Each thread is run in its own
> private monitor, and thus protected from the normal threading memory
> module nasties. Alas, this gives you all the semantics, but you still
> need scalable garbage collection.. and CPython's refcounting needs the
> GIL.
>
Hmm. So I think your PyE is an attempt to be more
explicit about what I said above in PyC: PyC threads do not share data
between threads except by explicit interfaces. I consider your
definitions of shared data types somewhat orthogonal to the types of
threads, in that both PyA and PyC threads could use these new shared
data items.

I think/hope that you meant that "many types are now only allowed to be
non-shareable"? At least, I think that should be the default; they
should be within the context of a single, independent interpreter
instance, so other interpreters don't even know they exist, much less
how to share them. If so, then I understand most of the rest of your
paragraph, and it could be a way of providing shared objects, perhaps.

I don't understand the comment that CPython's refcounting needs the
GIL... yes, it needs the GIL if multiple threads see the object, but not
for private objects... only one thread uses the private objects... so
today's refcounting should suffice... with each interpreter doing its
own refcounting and collecting its own garbage.

Shared objects would have to do refcounting in a protected way, under
some lock. One "easy" solution would be to have just two types of
objects: non-shared private objects in a thread, and global shared
objects; access to global shared objects would require grabbing the GIL,
and then accessing the object, and releasing the GIL. An interface
could allow for grabbing and releasing the GIL around a block of accesses
to shared objects (with GIL:). This could reduce the number of GIL
acquires. Then the reference counting for those objects would also be
done under the GIL, and the garbage collecting? By another PyA thread,
perhaps, that grabs the GIL by default? Or a PyC one that explicitly
grabs the GIL and does a step of global garbage collection?
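
A rough sketch of the "two types of objects" idea, written with today's threading module rather than any new interpreter feature. The names GIL, shared_counts, worker and run are made up for illustration, and the lock here is only a stand-in for the real GIL: each thread works on private data without locking, then takes the single global lock once around a whole block of shared accesses.

```python
import threading

# Hypothetical stand-in for the GIL: one process-wide lock that guards
# every global shared object. Private objects never touch it.
GIL = threading.Lock()

shared_counts = {}  # global shared object: only accessed under GIL


def worker(words):
    # Private to this thread, so no locking is needed here.
    private_counts = {}
    for w in words:
        private_counts[w] = private_counts.get(w, 0) + 1

    # One acquire around a block of shared accesses, instead of one
    # acquire per access.
    with GIL:
        for w, n in private_counts.items():
            shared_counts[w] = shared_counts.get(w, 0) + n


def run(batches):
    with GIL:
        shared_counts.clear()
    threads = [threading.Thread(target=worker, args=(b,)) for b in batches]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    with GIL:
        return dict(shared_counts)
```

Each worker acquires the lock once, no matter how many words it counted, which is the reduction in GIL acquires I had in mind.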

A more complex, more parallel solution would allow for independent
groups of shared objects. Of course, once there is more than one lock
involved, there is more potential for deadlock, but it also provides for
more parallelism. So a shared object might inherit from a "concurrency
group" which would have a lock that could be acquired (with conc_group:)
for access to those data items. Again, the reference counting would be
done under that lock for that group of objects, and garbage collecting
those objects would potentially require that lock as well...
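
Here is a minimal sketch of what a concurrency group might look like in today's Python. ConcurrencyGroup and demo are hypothetical names, and for simplicity the shared objects are stored as members of the group rather than inheriting from it. Each group owns one lock, and "with group:" gives access to that group's objects only.

```python
import threading


class ConcurrencyGroup:
    """Hypothetical group of shared objects guarded by a single lock."""

    def __init__(self, name):
        self.name = name
        self._lock = threading.RLock()
        self._members = {}

    def __enter__(self):
        # Acquire this group's lock and hand back its shared objects.
        self._lock.acquire()
        return self._members

    def __exit__(self, exc_type, exc, tb):
        self._lock.release()
        return False


def demo(n_threads=4, per_thread=1000):
    # Two independent groups: threads touching one never wait on the other.
    accounts = ConcurrencyGroup("accounts")
    audit = ConcurrencyGroup("audit")

    def work():
        for _ in range(per_thread):
            with accounts as objs:
                objs["balance"] = objs.get("balance", 0) + 1
        with audit as objs:
            objs.setdefault("entries", []).append("done")

    threads = [threading.Thread(target=work) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    with accounts as objs:
        balance = objs["balance"]
    with audit as objs:
        entries = len(objs["entries"])
    return balance, entries
```

The sketch never holds two group locks at once. Code that nests "with" blocks over several groups would need a fixed acquisition order to avoid the deadlock potential mentioned above.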

The solution with multiple concurrency groups allows for such groups to
contain a single shared object, or many (probably related) shared
objects. So the application gets a choice of the granularity of sharing
and locking, and can choose the number of locks to optimize performance
and achieve correctness. This sort of shared data among threads,
though, suffers in the limit from all the problems described in the
Berkeley paper. More reliable programs might be achieved by using
straight PyC threads, and some very limited "data ports" that can be
combined using a higher-order flow control concept, as outlined in the
paper.
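
To make the "data ports" idea concrete, here is a small sketch that uses queue.Queue objects as the ports and a simple linear pipeline as the flow control. This is my own illustration, not the construct from the paper; stage, pipeline and _DONE are made-up names. Ordinary Python threads still share the objects they pass by reference, so the sketch shows only the communication discipline, not real isolation between interpreters.

```python
import queue
import threading

_DONE = object()  # sentinel marking the end of the stream


def stage(func, inbox, outbox):
    """Apply func to each item read from inbox; write results to outbox."""
    while True:
        item = inbox.get()
        if item is _DONE:
            outbox.put(_DONE)  # pass the end marker downstream
            return
        outbox.put(func(item))


def pipeline(items, funcs):
    # One port before each stage, plus one after the last stage.
    ports = [queue.Queue() for _ in range(len(funcs) + 1)]
    threads = [
        threading.Thread(target=stage, args=(f, ports[i], ports[i + 1]))
        for i, f in enumerate(funcs)
    ]
    for t in threads:
        t.start()

    for item in items:
        ports[0].put(item)
    ports[0].put(_DONE)

    results = []
    while True:
        out = ports[-1].get()
        if out is _DONE:
            break
        results.append(out)

    for t in threads:
        t.join()
    return results
```

The stages communicate only through their ports; no stage reads or writes another stage's variables.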

While Python might be extended with these flow control concepts, they
could be added gradually over time, and in the embedded case, could be
implemented in some other language.
--
Glenn
------------------------------------------------------------------------
. _|_|_| _|
. _| _| _|_| _|_|_| _|_|_|
. _| _|_| _| _|_|_|_| _| _| _| _|
. _| _| _| _| _| _| _| _|
. _|_|_| _| _|_|_| _| _| _| _|
------------------------------------------------------------------------
Obstacles are those frightful things you see when you take your eyes off
of the goal. --Henry Ford