Signal-resistant code (was: Two random and nearly unrelated ideas)
On Wed, Sep 04, 2002 at 11:04:27AM -0700, Brett Cannon wrote:
[Oren Tirosh]
<snip>
Not before all Python I/O calls are converted to be EINTR-safe.
What is EINTR-safe?
When an I/O operation is interrupted by an unmasked signal it returns with errno==EINTR. The state of the file is not affected and repeating the operation should recover and continue with no loss of data.
Here is an EINTR-safe version of read:
    #include <errno.h>
    #include <unistd.h>
    ssize_t safe_read(int fd, void *buf, size_t count)
    {
        ssize_t result;
        /* Repeat the read for as long as it is interrupted by a signal. */
        do {
            result = read(fd, buf, count);
        } while (result == -1 && errno == EINTR);
        return result;
    }
When exposing the C I/O calls to Python you can either:
1. Use EINTR-safe I/O and hide this from the user.
2. Pass on EINTR to the user.
Python currently does #2 with a big caveat - the internal buffering of functions like file.read or file.readline is messed up and cannot be cleanly restarted. This makes signals unusable for delivery of asynchronous events in the background without affecting the state of the main program.
Oren
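For comparison, the same retry loop can be sketched at the Python level. This is only an illustration: the names `retry_on_eintr` and the Python `safe_read` are made up here and are not part of any patch discussed in the thread. Note also that modern Python (3.5 and later, per PEP 475) already retries `os.read` internally when the signal handler does not raise, so an explicit wrapper like this mainly matters on older versions.

```python
import errno
import os


def retry_on_eintr(func, *args):
    """Call func(*args), repeating the call while it fails with EINTR."""
    while True:
        try:
            return func(*args)
        except OSError as exc:
            # Any error other than EINTR is a real failure: pass it on.
            if exc.errno != errno.EINTR:
                raise


def safe_read(fd, count):
    """Rough Python counterpart of the C safe_read() shown above."""
    return retry_on_eintr(os.read, fd, count)
```

The wrapper hides EINTR from the caller (option 1 in the list above); code that wants option 2 simply calls `os.read` directly.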
What is EINTR-safe?
When an I/O operation is interrupted by an unmasked signal it returns with errno==EINTR. The state of the file is not affected and repeating the operation should recover and continue with no loss of data.
What if the operation is a select() call? Is restarting the right thing? How to take into account the consumed portion of the timeout, if given?
Here is an EINTR-safe version of read:
    ssize_t safe_read(int fd, void *buf, size_t count)
    {
        ssize_t result;
        do {
            result = read(fd, buf, count);
        } while (result == -1 && errno == EINTR);
        return result;
    }
When exposing the C I/O calls to Python you can either:
1. Use EINTR-safe I/O and hide this from the user.
2. Pass on EINTR to the user.
Python currently does #2 with a big caveat - the internal buffering of functions like file.read or file.readline is messed up and cannot be cleanly restarted. This makes signals unusable for delivery of asynchronous events in the background without affecting the state of the main program.
Can you point to a place in the code where this is happening? Or is this a stdio problem? I believe that calls like fgets() and getchar() don't lose data, but maybe I misunderstand your observation.
As I said before, I'm very skeptical that making the I/O ops EINTR-safe would be enough to allow the use of signals as suggested by Skip, but that might still be useful for other purposes, *if* we can decide when to honor EINTR and when not.
--Guido van Rossum (home page: http://www.python.org/~guido/)
On Wed, Sep 04, 2002 at 03:16:25PM -0400, Guido van Rossum wrote:
When an I/O operation is interrupted by an unmasked signal it returns with errno==EINTR. The state of the file is not affected and repeating the operation should recover and continue with no loss of data.
What if the operation is a select() call? Is restarting the right thing? How to take into account the consumed portion of the timeout, if given?
Some versions of select update the timeout structure to the remainder if they are interrupted by a signal. It's probably not a good idea to rely on this so gettimeofday could be used to calculate the remainder.
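A minimal sketch of the "calculate the remainder" idea, in Python. It is an assumption-laden illustration rather than anything proposed in the thread: the function name `select_with_deadline` is hypothetical, and it uses `time.monotonic()` in place of gettimeofday so that wall-clock adjustments do not distort the remaining time. As far as I know, Python 3.5 and later already do this recomputation inside `select.select` itself.

```python
import select
import time


def select_with_deadline(rlist, wlist, xlist, timeout):
    """select() that restarts after EINTR with only the time still left."""
    deadline = time.monotonic() + timeout
    while True:
        # Never pass a negative timeout: clamp the remainder at zero.
        remaining = max(0.0, deadline - time.monotonic())
        try:
            return select.select(rlist, wlist, xlist, remaining)
        except InterruptedError:
            # Interrupted by a signal: loop and wait for what is left.
            continue
```

The alternative mentioned elsewhere in the thread, treating the timeout as a hint and simply returning early, needs no loop at all.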
Or is this a stdio problem? I believe that calls like fgets() and getchar() don't lose data, but maybe I misunderstand your observation.
This is not the point - even if Python I/O calls were fully restartable would you actually expect people to check for EINTR and restart for *every* I/O operation in the program just in case some module happens to use signals?
Instead of
    for line in file:
        do_something_with(line)
we would need to write
    while 1:
        try:
            line = file.next()
        except IOError, exc:
            if exc.errno == errno.EINTR:
                continue
            else:
                raise
        except StopIteration:
            break
        do_something_with(line)
As I said before, I'm very skeptical that making the I/O ops EINTR-safe would be enough to allow the use of signals as suggested by Skip
If it's good enough for other purposes it should be good enough for Skip's proposal, too.
... but that might still be useful for other purposes, *if* we can decide when to honor EINTR and when not.
Only low-level functions like os.read and os.write that map directly to stdio functions should ever return EINTR.
To make Python signal-safe all other calls that can return EINTR should have a retry loop. On EINTR they should check if there are things to do and if so grab the GIL, make pending calls, release the GIL and retry the operation (unless an exception has been raised by the signal handler, of course).
This way I could finally write a Python daemon that reloads its configuration files on getting the customary SIGHUP :-)
Oren
What if the operation is a select() call? Is restarting the right thing? How to take into account the consumed portion of the timeout, if given?
Some versions of select update the timeout structure to the remainder if they are interrupted by a signal. It's probably not a good idea to rely on this so gettimeofday could be used to calculate the remainder.
I like Neil's suggestion: simply return. The timeout is a hint.
Or is this a stdio problem? I believe that calls like fgets() and getchar() don't lose data, but maybe I misunderstand your observation.
This is not the point - even if Python I/O calls were fully restartable would you actually expect people to check for EINTR and restart for *every* I/O operation in the program just in case some module happens to use signals?
Instead of
    for line in file:
        do_something_with(line)
we would need to write
    while 1:
        try:
            line = file.next()
        except IOError, exc:
            if exc.errno == errno.EINTR:
                continue
            else:
                raise
        except StopIteration:
            break
        do_something_with(line)
OK, but you're changing your tune here. I agree that this is bad, but I still don't believe (or understand) your previous remark about readline losing track of buffering. But let's forget about this, I trust that you really meant what you showed here.
As I said before, I'm very skeptical that making the I/O ops EINTR-safe would be enough to allow the use of signals as suggested by Skip
If it's good enough for other purposes it should be good enough for Skip's proposal, too.
Well, it has to be *perfect* for Skip's proposal, since it means we'd be generating signals probably at a rate of 100 per second.
... but that might still be useful for other purposes, *if* we can decide when to honor EINTR and when not.
Only low-level functions like os.read and os.write that map directly to stdio functions should ever return EINTR.
Um, os.read/write are the ones that *don't* map to stdio. Maybe you meant "that map directly to file descriptors"? But I doubt this would be acceptable -- if we were generating 100 signals per second, os.read/write become much harder to use if they could raise EINTR (currently they only raise EINTR if the app uses signal handlers, which isn't that common).
To make Python signal-safe all other calls that can return EINTR should have a retry loop. On EINTR they should check if there are things to do and if so grab the GIL, make pending calls, release the GIL and retry the operation (unless an exception has been raised by the signal handler, of course).
This way I could finally write a Python daemon that reloads its configuration files on getting the customary SIGHUP :-)
If you really want that, maybe you could see if you can produce a working design and patch? Even if it's not perfect enough to use signals to replace the ticker, people who like to use signals would probably be happy. --Guido van Rossum (home page: http://www.python.org/~guido/)
On Wednesday, Sep 4, 2002, at 11:51 US/Pacific, Oren Tirosh wrote:
When an I/O operation is interrupted by an unmasked signal it returns with errno==EINTR. The state of the file is not affected and repeating the operation should recover and continue with no loss of data.
I'm not sure about modern unixen (it's been a long time since I was interested in such lowlevel details) but historically this has been one complete mess. Aside from some unix variations that basically didn't do restart at all there have always been problems with signal restart semantics. For sockets and various devices (raw ttys, I think) you could definitely lose data.
Hmm, and when I think of it I don't think it's even possible to restart safely. What if I do a read() on a socket, and I request more bytes than the available physical memory (but less than VM, of course)? The kernel simply doesn't have anywhere to store the bytes other than my buffer, and if it has to return EINTR then >POOF< these bytes are gone forever.
[Jack]
Hmm, and when I think of it I don't think it's even possible to restart safely. What if I do a read() on a socket, and I request more bytes than the available physical memory (but less than VM, of course)? The kernel simply doesn't have anywhere to store the bytes other than my buffer, and if it has to return EINTR then >POOF< these bytes are gone forever.
I think that if any bytes have already been copied into your buffer, you don't get an EINTR, you get a short read. --Guido van Rossum (home page: http://www.python.org/~guido/)
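If an interrupted read does surface as a short read rather than as EINTR, the caller recovers by looping until it has the amount it asked for. A small Python sketch of that loop follows; the name `read_exactly` is made up for illustration, and the sketch assumes a blocking file descriptor.

```python
import os


def read_exactly(fd, count):
    """Read count bytes from fd, looping over short reads; stop early at EOF."""
    chunks = []
    remaining = count
    while remaining > 0:
        chunk = os.read(fd, remaining)
        if not chunk:
            # An empty result means end of file: return what we have.
            break
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)
```

With this pattern no data is lost on a short read: the bytes already delivered stay in `chunks` and the next `os.read` continues where the previous one stopped.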
[Guido van Rossum]
[Jack]
Hmm, and when I think of it I don't think it's even possible to restart safely. What if I do a read() on a socket, and I request more bytes than the available physical memory (but less than VM, of course)? The kernel simply doesn't have anywhere to store the bytes other than my buffer, and if it has to return EINTR then >POOF< these bytes are gone forever.
I think that if any bytes have already been copied into your buffer, you don't get an EINTR, you get a short read.
I'm not fully familiar with all the details of this problem, but it has surely been in the air for quite a long time now (I might have first heard of it while Taylor UUCP was being developed). It might be dependent on the underlying system. If I'm not mistaken, it was Ian Taylor who introduced the following Autoconf macro:
 - Macro: AC_SYS_RESTARTABLE_SYSCALLS
     If the system automatically restarts a system call that is interrupted by a signal, define `HAVE_RESTARTABLE_SYSCALLS'.
In GNU file utilities (now merged within the new GNU coreutils), Jim Meyering uses restart wrappers for many I/O functions, so the idea of wrappers has been maturing for a while, and is used in basic, heavily used programs. However, I did not look at such wrappers recently.
Python could perhaps wrap calls when these are restartable, or transmit the error upwards for systems where calls are not restartable.
-- François Pinard http://www.iro.umontreal.ca/~pinard
pinard@iro.umontreal.ca:
 - Macro: AC_SYS_RESTARTABLE_SYSCALLS
     If the system automatically restarts a system call that is interrupted by a signal, define `HAVE_RESTARTABLE_SYSCALLS'.
Python might probably wrap calls when these are restartable, or transmit the error upwards for systems where calls are not restartable.
I think that macro means that you *don't* have to use a wrapper to restart syscalls, because it happens automatically. So if it's not defined it means you have to restart them manually, not that they can't be restarted at all.
Greg Ewing, Computer Science Dept, +--------------------------------------+
University of Canterbury,          | A citizen of NewZealandCorp, a       |
Christchurch, New Zealand          | wholly-owned subsidiary of USA Inc.  |
greg@cosc.canterbury.ac.nz         +--------------------------------------+
On Wed, Sep 04, 2002 at 05:21:44PM -0400, François Pinard wrote:
I'm not fully familiar with all the details of this problem, it surely has been in the air for quite a long time now (I might have first heard of it while Taylor UUCP was being developed). It might be dependent on the underlying system. If I'm not mistaken, this is Ian Taylor who introduced the following Autoconf macro:
 - Macro: AC_SYS_RESTARTABLE_SYSCALLS
     If the system automatically restarts a system call that is interrupted by a signal, define `HAVE_RESTARTABLE_SYSCALLS'.
The name of this macro is misleading. It doesn't check whether system calls are restartABLE but whether they are restartED automatically by libc. It forks a subprocess that sends a signal to the parent. The parent waits for the child and checks if the wait() was interrupted.
If this macro is defined you will never get EINTR so there's no need to worry about this. If it isn't defined you need to restart system calls yourself.
If a platform really has interruptible I/O calls that cannot be continued or restarted without data loss there is no way to use signal handlers on that system. I doubt that such totally broken platforms are common these days.
In GNU file utilities (now merged within the new GNU coreutils), Jim Meyering uses restart wrappers for many I/O functions, so the idea of wrappers has been maturing for a while, and is used in basic, heavily used programs.
I'll check the sources. Oren
 - Macro: AC_SYS_RESTARTABLE_SYSCALLS
     If the system automatically restarts a system call that is interrupted by a signal, define `HAVE_RESTARTABLE_SYSCALLS'.
The name of this macro is misleading. It doesn't check whether system calls are restartABLE but whether they are restartED automatically by libc. It forks a subprocess that sends a signal to the parent. The parent waits for the child and checks if the wait() was interrupted.
If this macro is defined you will never get EINTR so there's no need to worry about this. If it isn't defined you need to restart system calls yourself.
This was a feature introduced by BSD Unix in a distant past, as a change from v7 Unix (which had only the EINTR behavior). For b/w compatibility, BSD had a system call to disable the restart feature. I'm guessing that over the years the feature has been found less than helpful, so POSIX defaults to off. POSIX sigaction() has a flag SA_RESTART to enable restarting. --Guido van Rossum (home page: http://www.python.org/~guido/)
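Python exposes this restart switch as `signal.siginterrupt()` (Unix only). The sketch below is an illustration with a hypothetical helper name, `install_restarting_handler`. It relies on the documented behaviour that `signal.signal()` resets a signal to interrupting mode, so the restart request has to come after the handler is installed.

```python
import signal


def install_restarting_handler(signum, handler):
    """Install handler and ask the OS to restart interrupted system calls."""
    previous = signal.signal(signum, handler)
    # signal.signal() implicitly makes the signal interrupt system calls,
    # so siginterrupt(..., False) must be called after it, not before.
    signal.siginterrupt(signum, False)
    return previous
```

As noted later in the thread, which system calls actually get restarted varies between platforms, so this should be read as a request to the OS and not as a guarantee that EINTR never appears.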
Oren Tirosh wrote:
If this macro is defined you will never get EINTR so there's no need to worry about this. If it isn't defined you need to restart system calls yourself.
I don't think that is correct. Only certain system calls will be restarted (for BSD 4.2 it's ioctl, read, readv, write, writev, wait, and waitpid). I think the set of system calls restarted varies depending on the OS.
Signals are a gigantic mess. I'm starting to doubt that you realize the extent of the brain damage. While I would be pleased if there was some way Python could hide the mess, I'm not convinced it is possible.
Neil
Signals are a gigantic mess. I'm starting to doubt that you realize the extent of the brain damage. While I would be pleased if there was some way Python could hide the mess, I'm not convinced it is possible.
Thanks for the support Neil. That's exactly how I think about it. --Guido van Rossum (home page: http://www.python.org/~guido/)
On Thu, Sep 05, 2002 at 11:52:28AM -0700, Neil Schemenauer wrote:
Signals are a gigantic mess. I'm starting to doubt that you realize the extent of the brain damage. While I would be pleased if there was some way Python could hide the mess, I'm not convinced it is possible.
Neil
Ah... I can almost hear the pain, frustration and despair in your voice. Obviously Guido and you got burned by this. I know other old-time Unix hackers with the same attitude.
From my experience signals on Linux work just fine - I don't carry any signal scars.
I can show off my Oracle scars, though. They're really gnarly. I can't hear that name mentioned without turning completely irrational about it. Certain embedded software and hardware makers also make me want to scream.
Oren
From my experience signals on Linux work just fine - I don't carry any signal scars.
That just shows you haven't written enough signal code. :-) Seriously, let's please not confuse Linux with portable. The issues here are about the cross-platform viability of your suggested approach. If you've only used signals on Linux, maybe you should withdraw yourself on account of lack of experience with the real issues. --Guido van Rossum (home page: http://www.python.org/~guido/)
On Wed, Sep 04, 2002 at 04:48:11PM -0400, Guido van Rossum wrote:
[Jack]
Hmm, and when I think of it I don't think it's even possible to restart safely. What if I do a read() on a socket, and I request more bytes than the available physical memory (but less than VM, of course)? The kernel simply doesn't have anywhere to store the bytes other than my buffer, and if it has to return EINTR then >POOF< these bytes are gone forever.
I think that if any bytes have already been copied into your buffer, you don't get an EINTR, you get a short read.
Jack Jansen <Jack.Jansen@oratrix.com>:
Aside from some unix variations that basically didn't do restart at all there have always been problems with signal restart semantics. For sockets and various devices (raw ttys, I think) you could definitely lose data.
Sockets? Are you sure? I find it unlikely that such a severe problem could persist in many Unix variants for so long. I've never heard of any mention of such a thing.
Greg Ewing, Computer Science Dept, +--------------------------------------+
University of Canterbury,          | A citizen of NewZealandCorp, a       |
Christchurch, New Zealand          | wholly-owned subsidiary of USA Inc.  |
greg@cosc.canterbury.ac.nz         +--------------------------------------+
participants (6)
- Greg Ewing
- Guido van Rossum
- Jack Jansen
- Neil Schemenauer
- Oren Tirosh
- pinard@iro.umontreal.ca