Hi everyone,

I'm in the process of writing some code to defer signals during critical regions, which has involved a good deal of reading through the CPython implementation to understand its behavior. Something I've found is that a lot of care appears to have gone into where signal handlers can be triggered, but that care is largely undocumented. I've put together a working list of behaviors from staring at the code, but what I'd like to figure out is which of these behaviors the devs intend to be invariants, and which are just accidents of how the code currently works and might change unpredictably.

And if there are things which are intended to be genuine invariants, would it be reasonable to document them formally and make them part of the language, rather than leaving them as knowledge internal to the CPython codebase?

What appears to be true is this:
In particular, the thing whose intentionality I'm not sure about is whether the notion of an interruptible function or instruction is meant to be an actual property of the language and/or of the CPython runtime, or whether it's intended that only the "high-level" rule above be true, and that all signal handlers should be considered fully reentrant at all times. The comments in sysmodule.c about avoiding triggering PyErr_CheckSignals() suggest that there definitely is some thinking about this within the CPython code itself.

The reason it would be useful to document this is so that if I'm trying to write a fairly generic library that handles signals (like the one I'm doing now), I can reason about where I need to be defensive about an instruction being interrupted by yet another signal, and maybe avoid calls to certain functions which are known to be interruptible, much like I would avoid calling malloc() in a C signal handler.
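To make the use case concrete, here is a minimal sketch of the kind of deferral I mean. It is not my actual library: `defer_signals` is a hypothetical name, the example assumes a POSIX platform (for SIGUSR1) and the main thread, and it deliberately ignores the hard parts, such as the fact that `signal.signal` itself is on the list below.

```python
import os
import signal
from contextlib import contextmanager


@contextmanager
def defer_signals(signums):
    """Hypothetical helper: record the given signals while the body runs,
    then replay them to the previously installed Python handlers."""
    pending = []
    previous = {}

    def _record(signum, frame):
        # Keep the temporary handler trivial: just remember the signal.
        pending.append(signum)

    for signum in signums:
        previous[signum] = signal.signal(signum, _record)
    try:
        yield pending
    finally:
        # Restore the old handlers first, then replay what arrived.
        for signum, handler in previous.items():
            signal.signal(signum, handler)
        for signum in pending:
            handler = previous[signum]
            # Sketch only: signals whose previous disposition was
            # SIG_DFL or SIG_IGN (not callable) are simply dropped here.
            if callable(handler):
                handler(signum, None)


# Demo (assumes POSIX, main thread).
received = []
signal.signal(signal.SIGUSR1, lambda signum, frame: received.append("handled"))

with defer_signals([signal.SIGUSR1]):
    os.kill(os.getpid(), signal.SIGUSR1)
    received.append("critical region done")
```

The point of the question is exactly what this sketch glosses over: whether the setup and teardown code can itself be interrupted, and by what.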

In the current implementation, the interruptible functions and instructions are:

Big categories:
  • Any function which calls PyErr_SetFromErrno, if errno == EINTR. (Catalogue needs to be made of these -- it's a much smaller set than the set of all calls to PyErr_SetFromErrno)
  • Basically any open, read, or write method of a raw or buffered file object.
  • Likewise, any open, read, or write method on a socket.
  • In any interactive console readline, or in input().
  • object.__str__, object.__repr__, and PyObject_Print, and anything that falls back to these.
Specific instructions:
  • Multiplication, division, or stringification of long integers.
More specific functions:
  • In `multiprocessing.shared_memory`, SharedMemory.__init__, .close, and .unlink.
  • In `multiprocessing.semaphore`, Semaphore.acquire. (But interestingly, not threading.Semaphore.acquire)
  • In `signal`, pause, signal, sigwaitinfo, sigtimedwait, pthread_kill, and pthread_sigmask.
  • In `fcntl`, fcntl and ioctl.
  • In `traceback`, any of the print methods.
  • In `faulthandler`, dump_traceback.
  • In `select`, all of the methods. (select, epoll, etc.)
  • In `time`, sleep.
  • In `curses`, whenever you look for key input.
  • In `tkinter`, during the main loop of a Tcl/Tk app.
  • During an SSL handshake.
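As a quick illustration of what "interruptible" means for one entry in this list, the following sketch shows a Python-level handler running in the middle of `time.sleep`. It assumes a POSIX platform (SIGALRM and `signal.setitimer`) and the main thread; the `Interrupted` exception and the timings are made up for the demo.

```python
import signal
import time


class Interrupted(Exception):
    """Hypothetical exception raised by the demo handler."""


def _raise(signum, frame):
    raise Interrupted


old_handler = signal.signal(signal.SIGALRM, _raise)
signal.setitimer(signal.ITIMER_REAL, 0.05)  # fire SIGALRM after ~50 ms
start = time.monotonic()
try:
    time.sleep(2.0)  # the handler runs inside this call
    interrupted = False
except Interrupted:
    interrupted = True
finally:
    signal.setitimer(signal.ITIMER_REAL, 0)  # cancel the timer
    signal.signal(signal.SIGALRM, old_handler)
elapsed = time.monotonic() - start
print(interrupted, round(elapsed, 2))
```

If the handler returned normally instead of raising, the sleep would resume for the remaining time, but the handler would still have run partway through the call.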
--

Yonatan Zunger

Distinguished Engineer and Chief Ethics Officer


He / Him

zunger@humu.com

