About threads in python

sturlamolden sturlamolden at yahoo.no
Fri Apr 22 15:19:56 EDT 2011


On Apr 21, 3:19 pm, dutche <dut... at gmail.com> wrote:

> My question is about the efficiency of threads in python, does anybody
> has something to share?

Never mind all the FUD about the GIL. Most of it is ill-informed
and plain wrong.

The GIL prevents you from doing one thing, which is parallel
compute-bound code in plain Python. But that is close to
idiotic use of Python threads anyway. I would seriously
question the competance of anyone attempting this, regardless
of the GIL. It is optimising computational code in the wrong
end.

To optimise computational code, notice that Python itself
gives you a 200x performance penalty. That is much more
important than not using all 4 cores on a quadcore processor.
In this case, start by identifying bottlenecks using the
profiler. Then apply C libraries or these or rewrite to Cython.
If that is not sufficient, you can start to think about using
more hardware (e.g. multithreading in C or Cython). This advice
only applies to computational code though. Most usecases for
Python will be i/o bound (file i/o, GUI, database, webserver,
internet client), for which the GIL is not an issue at all.

Python threads will almost always do what you expect. Try
your code first -- if they don't scale, then ask this question.
Usually the problem will be in your own code, and have nothing
to do with the GIL. This is almost certainly the case if an
i/o bound server do not scale, as i/o bound Python code are
(almost) never affected by the GIL.

In cases where a multi-threaded i/o solution do not scale,
you likely want to use asynchronous design instead, as the problem
can be the use of threads per se. See if Twisted fits your need.
Scalability problems for i/o bound server might also be external to
Python. For example it could be a database server and not the use
of Python threads. For example switching from Sqlite to
Microsoft SQL server will have impact on the way your program
behaves under concurrent load: Sqlite is is faster per query,
but has a global lock. If you want concurrent access to a
database, a global lock in the database is a much more important
issue than a global lock in the Python interpreter. If you want
a fast response, the sluggishness of the database can be more
important than the synchronization of the Python code.

In the rare event that the GIL is an issue, there are still
things you can do:

You can always use processes instead of threads (i.e.
multiprocessing.Process instead of threading.Thread). Since
the API is similar, writing threaded code is never a waste of
effort.

There are also Python implementations that don't have a GIL
(PyPy, Jython, IronPython). You can just swap interpreter
and see if it scales better.

Testing with another interpreter or multiprocessing is a good
litmus test to see if the problem is in your own code.

Cython and Pyrex are compilers for a special CPython extension
module language. They give you full control over the GIL, as well
as the speed of C when you need it. They can make Python threads
perform as good as threads in C for computational code -- I have
e.g. compared with OpenMP to confirm for myself, and there is
really no difference.

You thus have multiple options if the GIL gets in your way,
without major rewrite, including:

- Interpreter without a GIL (PyPy, Jython, IronPython)
- multiprocessing.Process instead of threading.Thread
- Cython or Pyrex

Things that require a little bit more effort include:

- Use a multi-threaded C library for your task.
- ctypes.CDLL
- OpenMP in C/C++, call with ctypes or Cython
- Outproc COM+ server + pywin32
- Fortran + f2py
- rewrite to use os.fork (except Windows)


IMHO:

The biggest problems with the GIL is not the GIL, but
bullshit FUD and C libraries that don't release the GIL
as often as they should. NumPy ans SciPy is notorius cases
of the latter, and there are similar cases in the standard
library as well.

If I should give an advice it would be to just try Python threads
and see for your self. Usually they will do what you expect.
You only have a problem if they do not. In that case there
are plenty of things that can be done, most of them with very
little effort.


Sturla



More information about the Python-list mailing list