Why GIL? (was Re: what's the point of rpython?)

Carl Banks pavlovevidence at gmail.com
Fri Jan 23 01:09:40 EST 2009


On Jan 22, 9:38 pm, Rhamphoryncus <rha... at gmail.com> wrote:
> On Jan 22, 9:38 pm, Carl Banks <pavlovevide... at gmail.com> wrote:
>
> > On Jan 22, 6:00 am, a... at pythoncraft.com (Aahz) wrote:
>
> > > In article <7xd4ele060.... at ruckus.brouhaha.com>,
> > > Paul Rubin  <http://phr...@NOSPAM.invalid> wrote:
>
> > > >alex23 <wuwe... at gmail.com> writes:
>
> > > >> Here's an article by Guido talking about the last attempt to remove
> > > >> the GIL and the performance issues that arose:
>
> > > >> "I'd welcome a set of patches into Py3k *only if* the performance for
> > > >> a single-threaded program (and for a multi-threaded but I/O-bound
> > > >> program) *does not decrease*."
>
> > > >The performance decrease is an artifact of CPython's rather primitive
> > > >storage management (reference counts in every object).  This is
> > > >pervasive and can't really be removed.  But a new implementation
> > > >(e.g. PyPy) can and should have a real garbage collector that doesn't
> > > >suffer from such effects.
>
> > > CPython's "primitive" storage management has a lot to do with the
> > > simplicity of interfacing CPython with external libraries.  Any solution
> > > that proposes to get rid of the GIL needs to address that.
>
> > I was recently on a long road trip, not as the driver, and with
> > nothing better to do I thought quite a bit about this.
>
> > I concluded that, aside from one major trap, it wouldn't really be
> > more difficult to interface Python to external libraries, just
> > differently difficult.  Here is briefly what I came up with:
>
> > 1. Change the singular Python type into three metatypes:
> > immutable_type, mutable_type, and mutable_dict_type.  (In the latter
> > case, the object itself is immutable but the dict can be modified.
> > This, of course, would be the default metaclass in Python.)  Only
> > instances of mutable_type would require a mutex when accessed.
>
> > 2. The API wouldn't have to change much.  All regular API functions
> > would assume that objects are unlocked (if mutable) and in a
> > consistent state, and would lock any mutable objects they need to
> > access.  There would also be a low-level API that assumes the objects
> > are locked (if mutable) and does not require objects to be
> > consistent.  I imagine most extensions would call the standard API
> > most of the time.
>
> > 3. If you are going to use the low-level API on a mutable object, or
> > are going to access the object structure directly, you need to acquire
> > the object's mutex. Macros such as Py_LOCK(), Py_LOCK2(), Py_UNLOCK()
> > would be provided.
>
> > 4. Objects would have to define a method, to be called by the GC, that
> > marks every object it references.  This would be a lot like the
> > current tp_traverse, except it would have to be defined for any object
> > that references another object, not just objects that can participate
> > in cycles.  (A conservative garbage collector wouldn't suffice for
> > Python because Python quite often allocates blocks but sets the pointer
> > to an offset within the block.  In fact, that's true of almost any
> > Python-defined type.)  Unfortunately, references on the stack would
> > need to be registered as well, so "PyObject* p;" might have to be
> > replaced with something like "Py_DECLARE_REF(PyObject,p);" which
> > magically registers it.  Ugly.
>
> > 5. Py_INCREF and Py_DECREF are gone.
>
> > 6. GIL is gone.
>
> > So, you gain the complexity of a two-level API, having to lock mutable
> > objects sometimes, and defining more visitor methods than before, but
> > you don't have to keep INCREFs and DECREFs straight, which is no small
> > thing.
>
> > The major trap is the possibility of deadlock.  To help minimize the
> > risk there would be macros to lock multiple objects at once:
> > Py_LOCK2(a,b) guarantees that if another thread is calling
> > Py_LOCK2(b,a) at the same time, the two won't deadlock.  What's
> > disappointing is that the deadlock possibility is always with you,
> > much like the reference counts are.
>
> IMO, locking of the object is a secondary problem.  Python-safethread
> provides one solution, but it's not the only conceivable one.  For the
> sake of discussion it's easier to assume somebody else is solving it
> for you.

That assumption might be good for the sake of the discussion *you*
want to have, but not for the discussion I was having, which was to
address Aahz's claim that CPython's storage management makes extension
writing simple, by presenting a vision of what Python might be like if
it had a mark-and-sweep collector.  The details of the GC are a small
part of that and wouldn't affect my main point even if they were quite
different from what I described.  Also, extension writers would have
to worry about locking issues here, so it's not acceptable to assume
somebody else will solve that problem.
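The locking side of the proposal (points 1 and 3, plus the Py_LOCK2 deadlock trap) can be modeled in a few lines of Python. This is a sketch under stated assumptions, not anything from CPython: `MutableObject`, `ImmutableObject` and `py_lock_many` are hypothetical names, and acquiring the mutexes in one global order (here ascending `id()`) is one standard way a macro like Py_LOCK2(a,b) could guarantee that a concurrent Py_LOCK2(b,a) does not deadlock. The post doesn't say how it would be done.

```python
import threading
from contextlib import ExitStack, contextmanager


class ImmutableObject:
    """Hypothetical instance of immutable_type: never mutated, so no mutex."""

    def __init__(self, value):
        self.value = value


class MutableObject:
    """Hypothetical instance of mutable_type: carries its own mutex."""

    def __init__(self, value):
        self.value = value
        self.mutex = threading.Lock()


@contextmanager
def py_lock_many(*objs):
    """Hypothetical stand-in for Py_LOCK2, generalized to any number of objects.

    Immutable objects are skipped and duplicates are locked only once.
    Mutable objects are locked in one global order (ascending id()), so
    py_lock_many(a, b) in one thread and py_lock_many(b, a) in another can
    never each hold one mutex while waiting for the other.
    """
    mutable = {id(o): o for o in objs if isinstance(o, MutableObject)}
    with ExitStack() as stack:
        for key in sorted(mutable):
            stack.enter_context(mutable[key].mutex)
        yield  # all mutexes held here; released in reverse order on exit


def transfer(src, dst, amount):
    # Touching object state directly (point 3) requires holding both mutexes.
    with py_lock_many(src, dst):
        src.value -= amount
        dst.value += amount


def run(src, dst, n):
    for _ in range(n):
        transfer(src, dst, 1)


a, b = MutableObject(1000), MutableObject(1000)
t1 = threading.Thread(target=run, args=(a, b, 10000))
t2 = threading.Thread(target=run, args=(b, a, 10000))  # opposite argument order
t1.start()
t2.start()
t1.join()
t2.join()
print(a.value, b.value)  # 1000 1000
```

Ordered acquisition only protects calls that request all their locks at once; code that takes one lock and later asks for another can still deadlock, which is the lingering risk the post describes.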


> Instead, focus on just the garbage collection.
[snip rest of threadjack]

You can ignore most of what I was talking about and focus on
technicalities of garbage collection if you want to.  I will not be
joining you in that discussion, however.


Carl Banks
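Point 4 of the proposal (a per-type marking method plus registered stack references, replacing Py_INCREF/Py_DECREF) can be illustrated with a toy mark-and-sweep collector. Everything here is hypothetical: `Heap`, `Node`, `visit` and the `roots` list (standing in for whatever something like Py_DECLARE_REF would register) are illustration names, not CPython API, and "freeing" an object just means dropping it from the heap's list.

```python
class Node:
    """Hypothetical heap object that references other objects."""

    def __init__(self, name):
        self.name = name
        self.children = []

    def visit(self, mark):
        # Any object holding references must report each one to the GC.
        for child in self.children:
            mark(child)


class Heap:
    """Toy mark-and-sweep heap: no reference counts, explicit roots."""

    def __init__(self):
        self.objects = []  # every object allocated through this heap
        self.roots = []    # registered references (stand-in for Py_DECLARE_REF)

    def alloc(self, obj):
        self.objects.append(obj)
        return obj

    def collect(self):
        """Mark everything reachable from the roots, then sweep the rest.

        Returns the number of objects freed.
        """
        marked = set()
        pending = list(self.roots)
        while pending:
            obj = pending.pop()
            if id(obj) in marked:
                continue
            marked.add(id(obj))
            obj.visit(pending.append)  # mark phase: follow every reference
        survivors = [o for o in self.objects if id(o) in marked]
        freed = len(self.objects) - len(survivors)
        self.objects = survivors  # sweep phase: drop unmarked objects
        return freed


heap = Heap()
a, b, c = (heap.alloc(Node(n)) for n in "abc")
a.children.append(b)
b.children.append(a)  # a reference cycle, unreachable from any root
heap.roots.append(c)
freed = heap.collect()
print(freed)  # 2: a and b are swept even though they reference each other
```

Because liveness is traced from the roots rather than counted, the a/b cycle is reclaimed without any INCREF/DECREF bookkeeping; the price, as the post says, is that every referencing type needs a marking method and every stack reference has to be registered.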



More information about the Python-list mailing list