BTW, I'm sending as andrea@cpushare.com but this email is really meant to be from andrea@suse.de, the lists are set so that I can't post unless I'm subscribed and to avoid unnecessary email load, I'm avoiding to subscribe two of my accounts to the same list. On Mon, Feb 28, 2005 at 05:52:05PM -0500, James Y Knight wrote:
I agree this code is somewhat suboptimal. However, I do not agree that it works only by luck.
The thing is that readable and writeable (in select(2) terms) could be returned from linux 2.6.9 and all previous kernels immediatly without sleeping, even if there was no disconnect event. You only needed the write buffer empty for it to return POLLIN|POLLOUT without sleeping. That's what I mean with "by luck". It wasn't twisted mistake but a linux mistake apparently. That check of "r and w" definitely could be true even if the other side of the pipe didn't disconnect in past linux.
In response to your main question of "why is it checking for readability at all", there is a good answer: to get the disconnect event without trying to write data. Select doesn't have a "err" fdset,
Ok, btw even linux always returns the fd in both readable and writeable when the other side of the pipe disconnects. This because we raise a POLLERR and this is the mask to check when a fd is readable/writeable #define POLLIN_SET (POLLRDNORM | POLLRDBAND | POLLIN | POLLHUP | POLLERR) #define POLLOUT_SET (POLLWRBAND | POLLWRNORM | POLLOUT | POLLERR) So when POLLERR is raised, the fd is returned readable and writeable from select. Current kernel code is this: static unsigned int pipe_poll(struct file *filp, poll_table *wait) { unsigned int mask; struct inode *inode = filp->f_dentry->d_inode; struct pipe_inode_info *info = inode->i_pipe; int nrbufs; poll_wait(filp, PIPE_WAIT(*inode), wait); /* Reading only -- no need for acquiring the semaphore. */ nrbufs = info->nrbufs; mask = 0; if (filp->f_mode & FMODE_READ) { mask = (nrbufs > 0) ? POLLIN | POLLRDNORM : 0; if (!PIPE_WRITERS(*inode) && filp->f_version != PIPE_WCOUNTER(*inode)) mask |= POLLHUP; } if (filp->f_mode & FMODE_WRITE) { mask |= (nrbufs < PIPE_BUFFERS) ? POLLOUT | POLLWRNORM : 0; if (!PIPE_READERS(*inode)) mask |= POLLERR; } return mask; } With this new kernel code, a write pipe will _never_ return readable, unless it's disconnected, and it will match the the current twisted code as far as I can tell. As you can see we never set POLLOUT for a write-pipe (opened in WRONLY mode), and in turn the only way to end up in the POLLIN_SET is that POLLERR is set.
so you have to select for either readability or writability. You can't select for writability when you have no data to write, or else you'll be continuously waking up. On all UNIX OSes that I know of, write pipes show up as readable in select when the other side has closed, for just this reason. I don't know if it's documented in any specs, but as far as I can tell, it's universally true. Breaking this would be a Bad Thing, for I suspect more apps than just Twisted.
Linux never returned any information about the status of the other side of the pipe via the readable information, because readable was always returned true, since POLLIN was set unconditionally by the pipe_poll code. So we're not going to break anything in that sense, it was already broken and infact we're fixing it right now ;). POLLIN was providing absolutely zero information because it was always set unconditionally for both readers and writers. That was wrong and that's why I changed it and this seem to make the doRead code valid for the first time in linux.
Of course, if it were that simple, doRead would just be implemented as "return CONNECTION_LOST". From my testing, that'd even work on BSDs. However, linux adds its own little wrinkle to the problem. On linux, the observed behavior seems to be that only one write can be outstanding at a time -- if there is data in the buffer, writable will be false and readable will be true. Otherwise, the inverse. This is
It's never the inverse since pollin was always forced to be set, both for readers and writers, so a write fd was always returned as readable unconditionally. But you're perfectly right that if there is data in the buffer writeable was false and readable was true. (readable provides no info in linux)
quite silly...but it's what happens. As far as I can tell, at no time are both true, except when the pipe is disconnected. On BSD, a write pipe is simply never readable until the reader is closed, which is a lot more sensible. Also note, that bit of code is unnecessary if you're using pollreactor rather than selectreactor. That is, I think it should be fine with
This is great news. However note that the 2.6.11-rc5 official kernel broke my app with the pollreactor too. So something else must have been broken for the pollreactor too making the same assumption that doRead did, and it's working fine again with my pipe patch infact.
twisted if the pipe is just POLLHUP when it is disconnected rather than POLLHUP|POLLIN|POLLOUT.
It's POLLERR not POLLHUP. POLLHUP is set only when the "writer" side disconnected and you're listening to a reader fd. POLLERR is instead returned when the "reader" disconnected and you're listening to a writer fd. In select terms it means when the reader disconnects the fd will be both readable and writeable. And when the writer disconnects the fd will be reported only as readable. In poll terms it means with linux _only_ POLLERR will be set when the read disconnects, and POLLIN will never ever be set on a WRONLY pipe. So with the new fixes it seems linux does the right thing, and we at leats mimic the select behaviour that twisted expects, I hope we mimic the poll behaviour that twisted expects too. I found the openbsd implementation of pipe_poll here: http://fxr.watson.org/fxr/source/kern/sys_pipe.c?v=OPENBSD#L661 671 if (events & (POLLIN | POLLRDNORM)) { 672 if ((rpipe->pipe_buffer.cnt > 0) || 673 (rpipe->pipe_state & PIPE_EOF)) 674 revents |= events & (POLLIN | POLLRDNORM); 675 } The way I read it, even openbsd will report the fd as readable regardless if the channel is disconnected, if the buffer has something into it. Anyway linux won't do that anymore, a write fd will be now reported as readable only if the reader has disconnected, and so you're safe to assume the reader disconnected if selects returns readable && writeable. So I think we should be fine now and no changes are required in twisted at least for the select reactor side. Please double check the pollreactor side too, we'll never return POLLIN anymore for a WRONLY pipe fd. I'm going to update a semi-productive system running twisted servers using processes too, with the new pipe_poll code too to see what happens (that's the good thing of not being fully productive yet, so I can experiment a bit more ;).