another thread on Python threading

Josiah Carlson josiah.carlson at sbcglobal.net
Sun Jun 3 21:10:05 EDT 2007


cgwalters at gmail.com wrote:
> I've recently been working on an application[1] which does quite a bit
> of searching through large data structures and string matching, and I
> was thinking that it would help to put some of this CPU-intensive work
> in another thread, but of course this won't work because of Python's
> GIL.

If you are doing string searching, implement the algorithm in C, and 
call out to that C code (remembering to release the GIL around the call).
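
A rough sketch of what that can look like from the Python side, using 
ctypes (the library and function names here are made up for the 
example; the relevant point is that ctypes releases the GIL around the 
foreign call, so the C search can overlap with other Python threads):

import ctypes

# Hypothetical example: "libsearch.so" and count_matches() stand in
# for whatever C routine you write.  Functions called through
# ctypes.CDLL run with the GIL released.
libsearch = ctypes.CDLL("./libsearch.so")
libsearch.count_matches.argtypes = [ctypes.c_char_p, ctypes.c_char_p]
libsearch.count_matches.restype = ctypes.c_int

def search_count(haystack, needle):
    # Other Python threads keep running while the C call executes.
    return libsearch.count_matches(haystack, needle)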

> There's a lot of past discussion on this, and I want to bring it up
> again because with the work on Python 3000, I think it is worth trying
> to take a look at what can be done to address portions of the problem
> through language changes.

Not going to happen.  All Python 3000 PEPs had a due date at least a 
month ago (possibly even two), so you are too late to get *any* 
substantial change in.

> I remember reading (though I can't find it now) one person's attempt
> at true multithreaded programming involved adding a mutex to all
> object access.  The obvious question though is - why don't other true
> multithreaded languages like Java need to lock an object when making
> changes?

From what I understand, the Java runtime uses fine-grained locking on 
all objects.  You just don't notice it because you don't need to write 
the acquire()/release() calls.  It is done for you (in a similar 
fashion to Python's GIL acquisition/release when switching threads). 
They also have a nice little decorator-like thingy (I'm not a Java guy, 
so I don't know the name exactly) called 'synchronized', which locks and 
unlocks the object when accessing it through a method.
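
The closest Python analogue is wrapping each method in a per-object 
lock.  A quick sketch of the idea (not what Java actually generates, 
just the shape of it):

import threading

def synchronized(method):
    # 'synchronized'-style decorator sketch: take the object's own
    # lock around every call to the wrapped method.
    def wrapper(self, *args, **kwargs):
        self._lock.acquire()
        try:
            return method(self, *args, **kwargs)
        finally:
            self._lock.release()
    return wrapper

class Counter(object):
    def __init__(self):
        self._lock = threading.RLock()
        self.value = 0

    @synchronized
    def increment(self):
        self.value += 1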


  - Josiah



> == Why hasn't __slots__ been successful? ==
> 
> I very rarely see Python code use __slots__.  I think there are
> several reasons for this.  The first is that a lot of programs don't
> need to optimize on this level.  The second is that it's annoying to
> use, because it means you have to type your member variables *another*
> time (in addition to __init__ for example), which feels very un-
> Pythonic.
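
For reference, the repetition being described looks roughly like this:

class Point(object):
    # The attribute names get spelled out once in __slots__ ...
    __slots__ = ('x', 'y')

    def __init__(self, x, y):
        # ... and then typed again when they are assigned here.
        self.x = x
        self.y = y

p = Point(1, 2)
p.z = 3  # AttributeError: 'Point' object has no attribute 'z'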
> 
> == Defining object attributes ==
> 
> In my Python code, one restriction I try to follow is to set all the
> attributes I use for an object in __init__.   You could do this as
> class member variables, but often I want to set them in __init__
> anyway from constructor arguments, so "defining" them in __init__
> means I only type them once, not twice.
> 
> One random idea for Python 3000 is to make the equivalent of
> __slots__ the default, *but* instead gather the set of attributes
> from all member variables set in __init__.  For example, if I write:
> 
> import time
> 
> class Foo(object):
>   def __init__(self, bar=None):
>     self.__baz = 20
>     if bar:
>       self.__bar = bar
>     else:
>       self.__bar = time.time()
> 
> f = Foo()
> f.otherattr = 40  # this would be an error!  Can't add random
> # attributes not defined in __init__
> 
> I would argue that the current Python default of supporting adding
> random attributes is almost never what you really want.  If you *do*
> want to set random attributes, you almost certainly want to be using a
> dictionary or a subclass of one, not an object.  What's nice about the
> current Python is that you don't need to redundantly type things, and
> we should preserve that while still allowing more efficient
> implementation strategies.
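
Something close to this is already possible, if clumsy, by overriding 
__setattr__.  A rough sketch (the Strict base class and _seal() are 
names made up for the example):

class Strict(object):
    # After _seal() is called, refuse to create new attribute names.
    def __init__(self):
        self.__dict__['_sealed'] = False

    def _seal(self):
        self.__dict__['_sealed'] = True

    def __setattr__(self, name, value):
        if self.__dict__.get('_sealed') and name not in self.__dict__:
            raise AttributeError("can't add new attribute %r" % name)
        self.__dict__[name] = value

class Foo(Strict):
    def __init__(self, bar=None):
        Strict.__init__(self)
        self.baz = 20
        self.bar = bar
        self._seal()

f = Foo()
f.otherattr = 40  # raises AttributeError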
> 
> = Limited threading =
> 
> Now, I realize there are a ton of other things the GIL protects other
> than object dictionaries; with true threading you would have to touch
> the importer, the garbage collector, verify all the C extension
> modules, etc.  Obviously non-trivial.  What if, as an initial push
> towards real threading, Python had support for "restricted threads"?
> Essentially, restricted threads would be limited to a subset of the
> standard library that had been verified for thread safety, would not
> be able to import new modules, etc.
> 
> Something like this:
> 
> def datasearcher(list, queue):
>   for item in list:
>     if item.startswith('foo'):
>       queue.put(item)
>   queue.done()
> 
> vals = ['foo', 'bar']
> queue = queue.Queue()
> threading.start_restricted_thread(datasearcher, vals, queue)
> def print_item(item):
>   print item
> queue.set_callback(print_item)
> 
> I know I'm making up some API above, but the point here is that "datasearcher"
> could pretty easily run in a true thread and touch very little of the
> interpreter; only support for atomic reference counting and a
> concurrent garbage collector would be needed.
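
For comparison, the shape of that example already works with the 
stdlib threading and Queue modules (pulling results from the queue and 
using a None sentinel in place of the made-up done()/set_callback()); 
it just won't give real parallelism for CPU-bound work, since the GIL 
stays held while the Python code runs:

import threading
import Queue  # Python 2 name of the stdlib queue module

def datasearcher(items, results):
    for item in items:
        if item.startswith('foo'):
            results.put(item)
    results.put(None)  # sentinel marking the end of the results

vals = ['foo', 'bar']
results = Queue.Queue()
worker = threading.Thread(target=datasearcher, args=(vals, results))
worker.start()

item = results.get()
while item is not None:
    print item  # Python 2 print statement, as in the original example
    item = results.get()
worker.join()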
> 
> Thoughts?
> 
> [1] http://submind.verbum.org/hotwire/wiki
> 


