[Python-Dev] Thread-safe file objects, the return

Antoine Pitrou solipsis at pitrou.net
Mon Mar 31 22:09:29 CEST 2008


Hello,

It seems this subject has had quite a bit of history. Tim Peters demonstrated
the problem in 2003 in this message:
http://mail.python.org/pipermail/python-dev/2003-June/036537.html

In short, Python file objects release the GIL before calling any C stdlib
function on their embedded FILE pointer. Unfortunately, if another thread
calls fclose on the FILE pointer concurrently, the contents pointed to can
become garbage and the interpreter process crashes. Just by using the same
file object in two threads running pure Python code, you can crash the
interpreter.

(Another, easier-to-solve problem is that the FILE pointer stored in the
file object could become NULL by the time it is used by another thread.
If that were the only problem, you could just store the FILE pointer in a
local variable before releasing the GIL, et voilà.)

There was some discussion at the time about the possible resolution. I've
tried to fix the problem, and I've come to what I think is a satisfactory
solution, which I can sum up in the following bullet points:
 * Each file object gets a dedicated counter, which is incremented before
the object releases the GIL and decremented after the GIL is taken again; thus
this counter keeps track of how many running "unlocked" sections of code are
using that particular file object. (Please note the counter doesn't need its
own lock, since it is only modified in GIL-protected sections.)
 * In the close() method, if the aforementioned counter is greater than 0,
we refuse to call fclose and instead raise an IOError.
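
To make the two bullet points concrete, here is a rough pure-Python model of
the idea. It is only a sketch and not the code from the patch: the class name
GuardedFile and its attributes are made up for illustration, and since the
GIL cannot be released from Python code, a threading.Lock stands in for it.

```python
import threading


class GuardedFile:
    """Toy model of the counter scheme (hypothetical, not the real patch)."""

    def __init__(self, raw):
        self._raw = raw               # stands in for the embedded FILE pointer
        self._gil = threading.Lock()  # assumption: a plain lock models the GIL
        self._unlocked_count = 0      # number of running "unlocked" sections

    def read(self, size=-1):
        with self._gil:
            raw = self._raw           # copy the "pointer" to a local first
            if raw is None:
                raise ValueError("I/O operation on closed file")
            self._unlocked_count += 1     # modified only while "GIL" is held
        try:
            # The "GIL" is released here; other threads may run meanwhile.
            return raw.read(size)
        finally:
            with self._gil:
                self._unlocked_count -= 1

    def close(self):
        with self._gil:
            if self._unlocked_count > 0:
                # Another thread is inside an unlocked section: refuse.
                raise IOError("close() called during a concurrent "
                              "operation on the same file object")
            raw, self._raw = self._raw, None
        if raw is not None:
            raw.close()               # models the actual fclose call
```

In this model, a close() attempted while another thread is blocked inside
read() raises IOError and leaves the underlying object open, so close() can
simply be called again once the read has returned.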

This may seem like a worrying semantic change, but I don't think it is, for the
following reasons:
 1) if we closed the FILE pointer anyway, the interpreter would likely crash
because another thread would be using garbage data (that's what we are trying
to fix after all!)
 2) if close() raises an IOError, it can be called again later, or at worst
fclose will be called when the file object is garbage collected
 3) close() can already raise an IOError if fclose fails for whatever reason
(although admittedly that is probably very rare)
 4) it doesn't seem wrong to notify the programmer that their code is very
unsafe

The patch is attached at http://bugs.python.org/issue815646 . It addresses
(or at least I hope it does) all potential problems with pure Python code,
threads, and the file object. It doesn't try to fix C extensions using the
PyFile_AsFile API and doing their own dirty things with the FILE pointer. It
could be a second step if the approach is accepted, but as noted in the 2003
discussions it would probably involve a new API. Whether we want to introduce
such an API in Python 2.x while Python 3.0 has a different IO model anyway
is left open to discussion :)

Regards

Antoine.



