[Twisted-Python] newbie confusion - puzzling reactor response
I'm confused by the response I get to the attached program. In a nutshell, I'm building a reader, attaching it with addReader, and later removing it with removeReader. And I'm getting this:

    time python test_reactor.py
    Traceback (most recent call last):
    Failure: twisted.internet.error.ConnectionFdescWentAway: Uh: Filedescriptor went away.

Which seems to be telling me that I don't know as much yet as I'd hoped. Why would the reactor care about a closed file descriptor that isn't even in its interest set?

--rich

#!/usr/bin/env python
# -*- coding: utf-8 -*-

import os

from zope.interface import implements
from twisted.internet import reactor
from twisted.internet.interfaces import IReadDescriptor


class inputFile(object):
    implements(IReadDescriptor)

    def __init__(self, filename):
        self.filename = filename
        self.filedes = os.open(filename, os.O_RDONLY | os.O_NONBLOCK)
        reactor.addReader(self)

    def fileno(self):
        return self.filedes

    def connectionLost(self, reason):
        raise reason

    def logPrefix(self):
        return 'inputFile'

    def doRead(self):
        reactor.removeReader(self)
        os.close(self.filedes)
        self.filedes = -1
        reactor.stop()


if __name__ == '__main__':
    r = inputFile('/etc/group')
    reactor.addReader(r)
    reactor.run()
Hi:
Aren't you adding two readers? One is added in the __init__ method of
inputFile, the other in the test code.
I'm also a newbie so maybe I'm equally confused...
On Tue, Feb 9, 2010 at 8:47 PM, K. Richard Pixley wrote:
_______________________________________________ Twisted-Python mailing list Twisted-Python@twistedmatrix.com http://twistedmatrix.com/cgi-bin/mailman/listinfo/twisted-python
I don't think so. I believe the reactor is actually added during the import. (I learned this as I discovered that reactors can't be restarted, which means you have to manually create a new one as a fixture for simple unittest work.)

I looked through the code and there's a call in the reactor to fileno immediately after the call to doRead. It seems to be attempting to check for file descriptors which broke during the read, but I think that's a mistake. (Or at least, I'm confused about how else to do it.) Seems to me that the only time my object has control in order to remove itself is during doRead. So I'm thinking that either...

a) there's some other way to close out my object that I just haven't discovered, or

b) the code which checks the file descriptor (which may have been closed) after doRead is doing so mistakenly.

For now, in my real code, I'm just leaving the file descriptor. But I'd like to know how this is intended to be used.

--rich (still a newbie)

Mark Bailey wrote:
Hi Rich:
Try removing the "reactor.addReader(self)" call from "__init__" and see what
happens. That call is made when "r" is created in
r = inputFile('/etc/group')
and immediately after that you are calling
reactor.addReader(r)
So, you are calling reactor.addReader() twice on the same instance.
Mark
On Wed, Feb 10, 2010 at 2:24 PM, K. Richard Pixley wrote:
Doh. You're right about the double registration. Thanks.

But that doesn't change my problem. The reactor still complains about the busted descriptor after removing the reader and resetting my descriptor to -1.

--rich

Mark Bailey wrote:
On 07:24 pm, rich@noir.com wrote:
For now, in my real code, I'm just leaving the file descriptor. But I'd like to know how this is intended to be used.
It isn't `doRead`'s job to close the file descriptor. At most, it's `doRead`'s job to signal that the descriptor is no longer worth keeping open by returning something like an instance of ConnectionDone or ConnectionLost. Then the reactor will call connectionLost on your object and you can close the file descriptor there.

The documentation for how this all works could probably be improved. Once you figure it out, would you mind submitting a patch?

Also, you won't accomplish much by adding a file descriptor for a normal file to the reactor. Select, poll, etc, will always indicate that such descriptors are both readable and writeable.

Jean-Paul
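To make the contract concrete, here is a Twisted-free sketch (stdlib only, written for this summary rather than taken from the thread) of the division of labour Jean-Paul describes: `doRead` only *signals* end-of-file by returning a sentinel, and the event loop, standing in for the reactor, is what calls `connectionLost`, where the descriptor is finally closed. The `toy_reactor` loop and `CONNECTION_DONE` sentinel here are stand-ins, not Twisted APIs.

```python
import os

# Stand-in for twisted.internet.main.CONNECTION_DONE.
CONNECTION_DONE = object()

class InputFile(object):
    def __init__(self, filename):
        self.filedes = os.open(filename, os.O_RDONLY | os.O_NONBLOCK)
        self.chunks = []

    def fileno(self):
        return self.filedes

    def doRead(self):
        data = os.read(self.filedes, 4096)
        if not data:
            return CONNECTION_DONE  # signal EOF; do NOT close here
        self.chunks.append(data)

    def connectionLost(self, reason):
        os.close(self.filedes)  # the right place to release the fd
        self.filedes = -1

def toy_reactor(readers):
    # Simplified event loop: call doRead, then honour its return value,
    # the way the real reactor does after each read.
    while readers:
        for reader in list(readers):
            if reader.doRead() is CONNECTION_DONE:
                readers.remove(reader)         # like removeReader()
                reader.connectionLost(None)    # reactor closes it out

reader = InputFile('/etc/group')
toy_reactor([reader])
print(reader.filedes)  # -1: closed by connectionLost, not by doRead
```

Because the loop, not `doRead`, does the removal and the close, the descriptor is still valid when the loop calls `fileno()` after the read, which is exactly the check that was blowing up in the original program.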
exarkun@twistedmatrix.com wrote:
It isn't `doRead`'s job to close the file descriptor. At most, it's `doRead`'s job to signal that the descriptor is no longer worth keeping open by returning something like an instance of ConnectionDone or ConnectionLost. Then the reactor will call connectionLost on your object and you can close the file descriptor there.
Thank you! That's the info I was looking for.
The documentation for how this all works could probably be improved. Once you figure it out, would you mind submitting a patch?
I don't think I've even seen mention of ConnectionDone in the doc yet. I think there's a design decision here about the doc. It seems to me that the return codes are part of the interface and as such should probably be documented in twisted.internet.interfaces.py whereas there's almost nothing there now. Instead, some of this is in twisted.internet.abstract. Granted, the distinction between an abstract class and a "zope.interface.Interface" is subtle. As a newbie, the interface is the thing I find first and most easily. I'm directed there by the doc and by the reference material. I have to dig around to even notice that the abstract class exists.
Also, you won't accomplish much by adding a file descriptor for a normal file to the reactor. Select, poll, etc, will always indicate that such descriptors are both readable and writeable.

I could swear that wasn't true when I first mucked about with select, but that was a couple decades ago. Thanks for the update.
In any case, my point was more about illustration and testing. --rich
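Jean-Paul's point about regular files is easy to check directly. A stdlib-only demonstration (not part of the thread): with a zero timeout, select() still reports a regular file's descriptor as ready, so polling it through the reactor yields no useful readiness information.

```python
import os
import select

fd = os.open('/etc/group', os.O_RDONLY)
# Zero timeout: a socket with no pending data would come back empty,
# but a regular file is always reported ready to read.
readable, _, _ = select.select([fd], [], [], 0)
assert readable == [fd]
os.close(fd)
```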
On 07:35 pm, rich@noir.com wrote:
exarkun@twistedmatrix.com wrote:
I think there's a design decision here about the doc. It seems to me that the return codes are part of the interface and as such should probably be documented in twisted.internet.interfaces.py whereas there's almost nothing there now. Instead, some of this is in twisted.internet.abstract. Granted, the distinction between an abstract class and a "zope.interface.Interface" is subtle.
As a newbie, the interface is the thing I find first and most easily. I'm directed there by the doc and by the reference material. I have to dig around to even notice that the abstract class exists.
You're absolutely right. These interfaces should be documenting the meaning of the return value for these methods, since it is an integral part of the required interface.
Also, you won't accomplish much by adding a file descriptor for a normal file to the reactor. Select, poll, etc, will always indicate that such descriptors are both readable and writeable.
Er... on second thought... isn't there still a utility in asynchronous file io which yields to the reactor?
It may be always readable/writable, but if I simply read/write, I'll block the process for as long as that takes, block on read, block on write. Whereas if I use async io on the descriptor and go through the reactor, I'm effectively yielding to the reactor and any other actionable descriptors on each loop as well as allowing my reads and writes to happen simultaneously.
Or am I missing something?
There could be utility in such, but Twisted has no support for it, largely because actual support on various platforms is still pretty ragged.

On Linux, you can get the aio_* family of functions, but they're pretty crap. They have tons of limitations (only block-aligned reads allowed, only a certain number of outstanding operations system-wide at a time, etc.), and the failure mode for not complying with these limitations is that the APIs block.

It's a bit better on Windows, so someone could probably fashion an extension to iocpreactor for this. There isn't a lot of developer attention focused on implementing Windows-only extensions right now, though.

Jean-Paul
On Feb 12, 2010, at 3:11 PM, exarkun@twistedmatrix.com wrote:
There could be utility in such, but Twisted has no support for it, largely because actual support on various platforms is still pretty ragged.
On Linux, you can get the aio_* family of functions, but they're pretty crap. They have tons of limitations (only block-aligned reads allowed, only a certain number of outstanding operations (system wide) at a time, etc, and the failure mode for not complying with these limitations is that the APIs block).
It's a bit better on Windows, so someone could probably fashion an extension to iocpreactor for this. There isn't a lot of developer attention focused on implementing Windows-only extensions right now, though.
In my opinion, the right way to go about something like this would be to come up with an API for asynchronous file I/O in Twisted, implement that API using subprocesses or maybe the reactor threadpool, and then attempt to optimize and simplify it using special platform-specific APIs later. (Important note: do not _expose_ the threaded nature of the code to application code at any point: just deliver the data to something in the reactor thread, to dispatch as it sees fit.)

My impression is that OS-level asynchronous file I/O APIs are fairly raw because, unlike network connectivity, you won't get thousands of connections at once. If you only have one disk, you can only really get a benefit from two, maybe three file I/O slave processes, and that's a fairly small amount of resources to manage. Granted, it's tricky to really identify how many "disks" you've got in a system, and the performance characteristics change radically based on what kind of disk technology is involved, but generally speaking a few worker threads and a queue of I/O operations would cover the vast majority of use-cases.
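As an illustration of the shape Glyph is proposing, here is a hypothetical stdlib-only sketch (not an actual Twisted API, and all names here are invented): blocking reads run in a small worker pool, and only the finished results cross back through a queue to the single dispatching thread, so the calling code never sees the threads.

```python
import queue
from concurrent.futures import ThreadPoolExecutor

class AsyncFileReader(object):
    def __init__(self, max_workers=3):
        # "two, maybe three file I/O slaves" per disk, as suggested above
        self._pool = ThreadPoolExecutor(max_workers=max_workers)
        self._completed = queue.Queue()

    def read_file(self, path, callback):
        # The blocking read runs in a worker thread; only the finished
        # (callback, data) pair crosses back to the dispatch thread.
        def work():
            with open(path, 'rb') as f:
                data = f.read()
            self._completed.put((callback, data))
        self._pool.submit(work)

    def dispatch_one(self, timeout=5.0):
        # Called from the event-loop thread: deliver one finished result
        # to its callback, never exposing the worker threads.
        callback, data = self._completed.get(timeout=timeout)
        callback(data)

results = []
reader = AsyncFileReader()
reader.read_file('/etc/group', results.append)
reader.dispatch_one()  # the "reactor" thread picks up the finished read
```

Optimizing this later with platform-specific APIs (aio_*, IOCP) would only change the internals of `read_file`; the callback-facing surface could stay the same, which is the point of Glyph's "optimize later without changing the API" remark.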
Glyph Lefkowitz wrote:
If you only have one disk, you can only really get a benefit from two, maybe three file I/O slave processes, and that's a fairly small amount of resources to manage. Granted, it's tricky to really identify how many "disks" you've got in a system, and the performance characteristics change radically based on what kind of disk technology is involved, but generally speaking a few worker threads and a queue of I/O operations would cover the vast majority of use-cases.

I'm working with parallelized build servers. We often have RAIDed disks, solid-state disks, servers with huge amounts of disk cache specifically so that an entire build happens in memory, etc. File I/O is our bottleneck.
I think you probably are also forgetting about NFS. NFS isn't slower than native disk in terms of throughput, only in terms of latency, which is a fabulous opportunity for asynchronous file I/O. (Granted, NFS seems to have fallen out of fashion recently.)

I think twisted already has everything that's required. It could probably use a slightly more friendly interface paradigm so the user doesn't have to do his own os.open, but really, even that wouldn't save much.

Reactor core was enough to sell me on twisted. That's probably all I'll even be using. And compared to writing my own, that's enough to be useful.

--rich
On Feb 12, 2010, at 9:51 PM, K. Richard Pixley wrote:
I'm working with parallelized build servers. We often have raided disks, solid state disks, servers with huge amounts of disk cache specifically so that an entire build happens in memory, etc. File io is our bottleneck.
Yeah, this level of disk manipulation is past the point where a little bit less blocking on I/O will help... I imagine you've already got some kind of process/thread pooling solution, or at least you'll need one.
I think you probably are also forgetting about NFS. NFS isn't slower than native disk in terms of throughput, only in terms of latency, which is a fabulous opportunity for asyncronous file io. (Granted, NFS seems to have fallen out of fashion recently.)
No, I'm not forgetting about it: I'm just saying that once you've got an API that applications can start using that gives *some* performance benefit (non-blocking disk I/O at the expense of spinning up a few threads / processes behind the scenes), you can always optimize it for other use-cases later, without necessarily changing the API.
I think twisted already has everything that's required. It could probably use a slightly more friendly interface paradigm so the user doesn't have to do his own os.open, but really, even that wouldn't save much.
It would allow us to do it more portably, I think. os.open()'s behavior can vary a lot depending on what you do with it.
Reactor core was enough to sell me on twisted. That's probably all I'll even be using. And compared to writing my own, that's enough to be useful.
Great, glad to hear it!
participants (4)
- exarkun@twistedmatrix.com
- Glyph Lefkowitz
- K. Richard Pixley
- Mark Bailey