[Python-bugs-list] [ python-Bugs-595601 ] file (& socket) I/O are not thread safe

noreply@sourceforge.net noreply@sourceforge.net
Tue, 20 Aug 2002 12:43:03 -0700


Bugs item #595601, was opened at 2002-08-15 12:34
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=595601&group_id=5470

Category: Python Interpreter Core
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Jeremy Hylton (jhylton)
Assigned to: Jeremy Hylton (jhylton)
Summary: file (& socket) I/O are not thread safe

Initial Comment:
We recently found an assertion failure in the universal
newline support when running a multithreaded program
where two threads used the same Python file object. 
The assert(stream != NULL) test in 
Py_UniversalNewlineFread() fails once in a blue moon,
where stream is the stdio FILE * that the fileobject
wraps.  Further analysis suggests that there is a race
condition between checking FILE * and using FILE * that
exists in at least Python 2.1 and up.

I'll actually describe the problem as it exists in
Python 2.2, because it is simpler to avoid the
universal newline code.  That code isn't the source of
the problem, although it's assert() uncovers it in a
clear way.

In file_read() (rev 2.141.6.5), the first thing it does
is check if f_fp (the FILE *) is NULL.  If so it raises
an IOError -- operation on closed file object.  Later,
file_read() enters a for loop that calls fread() until
enough bytes have been read.

	for (;;) {
		Py_BEGIN_ALLOW_THREADS
		errno = 0;
		chunksize = fread(BUF(v) + bytesread, 1,
				  buffersize - bytesread, f->f_fp);
		Py_END_ALLOW_THREADS
		if (chunksize == 0) {
			if (!ferror(f->f_fp))
				break;
			PyErr_SetFromErrno(PyExc_IOError);
			clearerr(f->f_fp);
			Py_DECREF(v);
			return NULL;
		}

The problem is that fread() is called after the global
interpreter lock is released.  Since the lock is
released, another Python thread could run and modify
the file object, changing the value of f->f_fp.  Under
the current interpreter lock scheme, it isn't safe to
use f->f_fp without holding the interpreter lock.

The current file_read() code can fail in a variety of
ways.  It's possible for a second thread to close the
file, which will set f->f_fp to NULL.  Who knows what
fread() will do when NULL is passed.

The universal newline code is squirrels the FILE * in a
local variable, which is worse.  If it happens that
another thread closes the file, at best the local
points to a closed FILE *.  But that memory could get
recycled and then there's no way to know what it points to.

socket I/O has a similar problem with unsafe sharing of
the file descriptor.  However, this problem seems less
severe in general, because we'd just be passing a bogus
file descriptor to a system call.  We don't have to
worry about whether stdio will dump core when passed a
bogus pointer.  There is a chance the a socket will be
closed and its file descriptor used for a different
socket.  So a call to recv() with one socket ends up
using a different socket.  That will be a nightmare to
debug, but it won't cause a segfault.  (And, in
general, files and sockets shouldn't be shared between
application threads unless the application is going to
make sure its safe.)

The solution to this problem is to use a
per-file-object lock to guard access to f->f_fp.  No
thread should read or right f->f_fp without holding the
lock.  To make sure that other threads get a chance to
run when there is contention for the file, the
file-object lock should never be held when the GIL is held.


----------------------------------------------------------------------

>Comment By: Guido van Rossum (gvanrossum)
Date: 2002-08-20 15:43

Message:
Logged In: YES 
user_id=6380

Wow. This is harder than I thought! :-(

We should be able to solve whatever problem there is with
'print'. The PyFile_AsFile() API may have to become
obsolete, *or* we may have to add a separate call to access
the lock. I'll have to think about it more.

----------------------------------------------------------------------

Comment By: Barry A. Warsaw (bwarsaw)
Date: 2002-08-16 18:43

Message:
Logged In: YES 
user_id=12800

Jack, your last paragraph sounds like a good idea <waiting
for Tim to shoot holes in it>.  Isn't the whole point to
avoid core dumps, not to "make it work right" whatever that
means?  We should take the simplest approach to avoiding
crashes.

----------------------------------------------------------------------

Comment By: Jack Jansen (jackjansen)
Date: 2002-08-16 18:21

Message:
Logged In: YES 
user_id=45365

Uhm... Isn't it good enough if you don't hold the GIL while you acquire the file lock? When I looked at it I thought it would be (but then, I didn't look very hard:-).

And there may be another way around this problem too. If you're really only worried about closing (Python at the moment couldn't care less that when two threads write to the same file at the same time the output may be intermingled or other fun things may happen) then don't lock the file, but simply set a flag "keep your hands off the FILE*, I'm using it!" and in that case delay the close. You may want to protect the flag and the delayed close with a lock, I guess.

Or you could define this as a programmer error and raise an exception if one thread tries to close a file another thread is using. (The application will know much better how to handle the contention with locking or some other mechanism).

----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2002-08-16 11:26

Message:
Logged In: YES 
user_id=31435

To cheer you up, it's probably worse than that <wink>.  
While you're holding a file lock, in theory you dare not 
execute any code that may call back into Python, because 
the called code may try to do an operation on the file 
you've got locked, and then the whole system freezes. That 
is, independent of the GIL, it's not thoroughly safe to call 
PyObject_Print with a file lock held, because an object's 
__str__ may try to "do something" with the Python-level file 
object too.  The more depressing news is that a reentrant 
file lock would solve that one.

----------------------------------------------------------------------

Comment By: Jeremy Hylton (jhylton)
Date: 2002-08-16 11:04

Message:
Logged In: YES 
user_id=31392

I think I'm less worried about PyFile_AsFile() than I am
about PyFile_WriteObject().  The latter is invoked by the
print statement.  It extracts the FILE* using
PyFile_AsFile() and proceeds to pass it to PyObject_Print().
 PyObject_Print() MUST hold the GIL, but since it has the
raw FILE* it MUST hold the file lock.  But you can't hold
both locks :-(.


----------------------------------------------------------------------

Comment By: Jack Jansen (jackjansen)
Date: 2002-08-15 16:49

Message:
Logged In: YES 
user_id=45365

I'm not sure your solution of locking in the file object is feasible. Every call of PyFile_AsFile() is dangerous, if the  code proceeds to release the GIL before using the FILE* it's in danger. And this code could be in extension modules too (I know there's quite a few in my own img package).

When I did the universal newlines patch I basically ignored the problem (thinking that no-one in their right mind would close a file in one thread while it's still in use in another: how wrong can one be:-).

The only real solution may be to get rid of PyFile_AsFile altogether, and replace it with PyFile_AsFileLocked() and PyFile_Unlock() and make these calls somehow work even when the GIL is held (possibly by temporarily releasing the GIL while acquiring the file lock).

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=595601&group_id=5470