[Twisted-Python] CONNECTION_LOST not an integer (docstring error?)

Several places in the documentation refer to the method writeSomeData() returning a negative integer if the connection is lost. (In particular /api/twisted.internet.abstract.FileDescriptor.html#writeSomeData.) However, twisted.internet.main.CONNECTION_LOST is often returned, which is not an integer. The documentation and code should be brought into sync unless there is a more up-to-date reference that I am missing. Thanks, Michael.

On Wed, 1 Oct 2008 12:00:29 -0600, Michael McNeil Forbes <mforbes@physics.ubc.ca> wrote:
Hardly anyone actually needs `writeSomeData`. It's mostly an implementation detail of a certain group of reactors. However, if you'd like, please feel free to open a ticket in the issue tracker and attach a documentation patch. Suggestions or patches on the mailing list will typically be forgotten. Jean-Paul

On Oct 1, 2008, at 12:04 PM, Jean-Paul Calderone wrote:
I have opened a ticket. A question then that is sure to expose my ignorance of twisted... Why does writeSomeData not simply raise CONNECTION_LOST as an exception? Checking return values is quite un-pythonic. Is there a deep reason for this? Michael.

P.S. I came across this because I was trying to use twisted running in a thread to write data resulting from a long computation that I have not yet turned into a producer. The more conventional "write" method was failing if the socket backed up, and provided no simple way of determining if data was being dropped. My solution was to use writeSomeData, and then have the computation decide to throw out some of the data if it is being produced too rapidly, but I need to know how much has been sent so I can decide what to throw away...

On 09:24 pm, mforbes@physics.ubc.ca wrote:
writeSomeData is an internal API that nobody should really use, so it's not factored for convenience; it's also deep down in the guts of the reactor's innermost loop, so it pays a lot of attention to efficiency. Raising exceptions is really, really slow, at least by the standards of that kind of code.
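The cost claim above can be checked with a small timing sketch. This is not a benchmark of Twisted itself: the sentinel object and both write functions are hypothetical stand-ins, and the absolute numbers will vary by interpreter and machine.

```python
import timeit

# Hypothetical sentinel standing in for a "connection lost" marker.
CONNECTION_LOST = object()


def write_returning_sentinel():
    """Signal failure by returning a special value."""
    return CONNECTION_LOST


def write_raising():
    """Signal failure by raising an exception."""
    raise ConnectionError("connection lost")


def check_return():
    # Caller inspects the return value.
    return write_returning_sentinel() is CONNECTION_LOST


def check_exception():
    # Caller catches the exception instead.
    try:
        write_raising()
    except ConnectionError:
        return True
    return False


def measure(number=200_000):
    """Return (seconds for return style, seconds for exception style)."""
    return (
        timeit.timeit(check_return, number=number),
        timeit.timeit(check_exception, number=number),
    )


if __name__ == "__main__":
    by_return, by_raise = measure()
    print(f"return sentinel: {by_return:.3f}s")
    print(f"raise exception: {by_raise:.3f}s")
```

Both styles report the same failure; the sketch only makes the per-call difference measurable. Whether that difference matters inside a real reactor loop is a separate question.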
You do know that Twisted APIs are not thread safe, right? You can't call write() from a thread?

On Wed, 1 Oct 2008 15:24:05 -0600, Michael McNeil Forbes <mforbes@physics.ubc.ca> wrote:
Thanks. :)
The reason is probably that whoever wrote it thought that raising an exception would be enough slower than returning a special value that it would be worth doing something slightly unusual. Compared to returning a value, raising an exception is expensive in CPython, but I doubt there is any surviving benchmark which demonstrates that it makes a difference here. However, it's a very low-level API and since it isn't intended to be used by application code, it doesn't seem like a very high priority to make it less confusing.
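For a caller, the practical consequence of this convention is that the result may be either a byte count or an error object, so both cases have to be checked. A minimal sketch, assuming the sentinel is an exception instance that is returned rather than raised; the class and function names below are hypothetical stand-ins, not Twisted's code:

```python
class ConnectionLost(Exception):
    """Hypothetical stand-in for a connection-lost error type."""


# One shared instance, returned (never raised) by the low-level writer.
CONNECTION_LOST = ConnectionLost("Connection lost")


def write_some_data(sock_send, data):
    """Try to send data; return the number of bytes written,
    or CONNECTION_LOST if the underlying send fails."""
    try:
        return sock_send(data)
    except OSError:
        return CONNECTION_LOST


def flush(sock_send, buffer):
    """Write what we can; return the unsent remainder, or the sentinel."""
    result = write_some_data(sock_send, buffer)
    # Test for the exception first: comparing an exception object
    # with an integer would raise TypeError on Python 3.
    if isinstance(result, Exception) or result < 0:
        return result
    return buffer[result:]
```

With a send function that accepts three bytes, `flush(send, b"hello")` leaves `b"lo"` to be written later; with one that raises `OSError`, it returns the `CONNECTION_LOST` object itself.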
This seems more interesting than the main point of your email. :) You should probably just do the work to turn the computation into a producer. This is because:

* writeSomeData is low level and crufty, as you've discovered
* Not all reactors have this API, so your program will break based on the reactor selected to run it
* producers give you all the same information

Since you mentioned threads, I'll also point out that Twisted APIs are not safe to use from non-reactor threads without reactor.callFromThread. If you aren't using callFromThread, that would explain bad behavior from the write method.

Jean-Paul
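The idea behind callFromThread can be illustrated with a toy model that uses only the standard library. ToyLoop and its methods are hypothetical and much simpler than a real reactor; in Twisted itself the corresponding call from the worker thread would be `reactor.callFromThread(transport.write, data)`.

```python
import queue
import threading


class ToyLoop:
    """Toy event loop that runs callables handed over by other threads."""

    def __init__(self):
        self._calls = queue.Queue()
        self.thread_id = None

    def call_from_thread(self, func, *args):
        """Safe from any thread: only enqueues, never touches loop state."""
        self._calls.put((func, args))

    def stop(self):
        """Ask the loop to exit once earlier calls have run."""
        self._calls.put(None)

    def run(self):
        """Execute queued calls one at a time in the calling thread."""
        self.thread_id = threading.get_ident()
        while True:
            item = self._calls.get()
            if item is None:
                break
            func, args = item
            func(*args)


def demo():
    loop = ToyLoop()
    written = []  # (thread id, data) pairs recorded by write()

    def write(data):
        # Stands in for transport.write; must only run in the loop thread.
        written.append((threading.get_ident(), data))

    def compute():
        # The long-running computation lives in its own thread.
        for i in range(3):
            loop.call_from_thread(write, f"result {i}")
        loop.stop()

    worker = threading.Thread(target=compute)
    worker.start()
    loop.run()
    worker.join()
    return loop.thread_id, written
```

Every `write` call ends up executing in the loop's thread even though it was requested from the worker, which is the property the warning above is about.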

By the way, is there any convention by which "high level" API routines are distinguished from "low level" routines? It is really not that clear from the docs... (writeSomeData comes first in the listing etc.)
I know that and am using callFromThread: The reason I am having bad behavior is because the socket is backing up. Once I put in the logic to throw out data, it works fine... but the lack of reactor support concerns me somewhat. I don't want the computation to stop, but think I could put an intermediate "push producer" that stops by discarding the data.

I am threading because the computation is the main task, and the network stuff is simply a window into the state of the computation. I want the computation to run continually, and it needs to do so in a blocking sort of way, periodically checking and pushing the partial results (like once every 30 minutes or so). I want the calculation code to function on its own, with or without the twisted components and I don't see a simple way of doing this if twisted is in complete control without some type of threading.

My current design is to write my calculation as if twisted did not exist, and just push the current results onto a Queue periodically. Then I run twisted in a separate thread with clients that periodically check the queue, pop stuff off and dispatch it to any clients who happen to connect and want the data. Is there a simpler way to do this all with twisted without using threads? (The main goal is having an extremely minimal set of hooks for the people writing the computational code, and having the computational code run as fast as possible.)

Thanks for all of the suggestions and prompt comments! (And for one very "twisted" library ;-)

Michael.
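The queue-based design described above might look roughly like the following sketch. It assumes a single computation thread calling `offer` and a bounded `queue.Queue`; the function names and the drop-oldest policy are illustrative choices, not something specified in the thread.

```python
import queue


def offer(results, item):
    """Put item on a bounded queue without ever blocking the computation.

    When the queue is full, the oldest entry is discarded to make room.
    Returns the number of entries that were dropped.
    """
    dropped = 0
    while True:
        try:
            results.put_nowait(item)
            return dropped
        except queue.Full:
            try:
                results.get_nowait()
                dropped += 1
            except queue.Empty:
                # The consumer drained the queue in the meantime; retry.
                pass


def drain(results):
    """Network side: take everything currently queued, oldest first."""
    items = []
    while True:
        try:
            items.append(results.get_nowait())
        except queue.Empty:
            return items
```

With `queue.Queue(maxsize=2)`, offering 1, 2 and 3 in turn drops the first item, and a later `drain` returns `[2, 3]`. The computation code needs only the one `offer` call as its hook.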

On 1 Oct, 10:37 pm, mforbes@physics.ubc.ca wrote:
On Oct 1, 2008, at 3:44 PM, Jean-Paul Calderone wrote:
writeSomeData only comes first if you're looking at the documentation for "Connection", which is itself kind of an internal thing. The internals definitely aren't as cleanly separated from the public / application interface as they should be, but a good rule of thumb is that high level stuff is what's covered in the tutorial documentation. For example, none of <http://twistedmatrix.com/projects/core/documentation/howto/producers.html>, <http://twistedmatrix.com/projects/core/documentation/howto/servers.html>, or <http://twistedmatrix.com/projects/core/documentation/howto/clients.html> covers writeSomeData. Another rule of thumb is that the high-level stuff is on explicitly specified interfaces, such as <http://twistedmatrix.com/documents/8.1.0/api/twisted.internet.interfaces.ITr...>. Of course there are some low-level interfaces too, for example <http://twistedmatrix.com/documents/8.1.0/api/twisted.internet.interfaces.IWr...> but their documentation is typically not as expository. But the ultimate rule is that high-level interfaces are convenient, and the low-level ones (as you have noticed about writeSomeData) are not :).
What do you mean by "backing up"? That doesn't really make sense. When a transport "backs up" in Twisted, it just allocates a larger and larger buffer to store the data that is queued to be sent. And what is the "lack of reactor support" you're talking about? It seems to me that the reactor supports everything you need.
I don't want the computation to stop, but think I could put an intermediate "push producer" that stops by discarding the data.
I don't *really* understand what you're trying to accomplish here; it sounds to me like you actually want a pull producer that just always sends the latest state in resumeProducing(), assuming it's always changing. Of course the "every 30 minutes" timeframe confuses that somewhat. However, using some kind of producer is definitely the way to go.
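A rough sketch of the "latest state" pull producer suggested above, written without importing Twisted so it can run on its own. LatestStateProducer and FakeTransport are hypothetical; in real Twisted code the producer would be declared as an IPullProducer and registered with `transport.registerProducer(producer, False)`. How a real transport behaves when resumeProducing() writes nothing is not modeled here.

```python
class LatestStateProducer:
    """Duck-typed pull producer that sends only the newest state."""

    def __init__(self, consumer, encode=repr):
        self.consumer = consumer
        self.encode = encode
        self._latest = None
        self._pending = False

    def update(self, state):
        """Record a new state, silently replacing any unsent older one."""
        self._latest = state
        self._pending = True

    def resumeProducing(self):
        """Called by the consumer when it is ready for more data."""
        if self._pending:
            self._pending = False
            text = self.encode(self._latest)
            self.consumer.write(text.encode("utf-8"))

    def stopProducing(self):
        """Called when the consumer goes away; forget any unsent state."""
        self._pending = False
        self._latest = None


class FakeTransport:
    """Minimal stand-in for a transport, used only for this demonstration."""

    def __init__(self):
        self.sent = []
        self.producer = None

    def registerProducer(self, producer, streaming):
        self.producer = producer

    def write(self, data):
        self.sent.append(data)
```

Because `update` overwrites rather than queues, intermediate states are discarded whenever the consumer is slower than the computation, which matches the "throw away data when the socket backs up" requirement without any explicit buffer bookkeeping.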
You could use subprocesses. Valentino Volonghi is working on a convenient process-pool for use with Twisted: <https://launchpad.net/ampoule/>. I don't think that would actually help you that much though, since you'd still need to have application-specific buffering logic (it's a slightly unusual requirement to throw away intermediary state depending on the buffer saturation).

On Oct 3, 2008, at 5:25 PM, glyph@divmod.com wrote:
(On the client side, I want the user to have full access to a python interpreter with readline functionality etc. and the ability to plot things with matplotlib using Tkinter. As far as I can see, I can't get readline functionality with a manhole or similar interface controlled by twisted, so again I have to run twisted in a separate thread.)
Thanks for the suggestions and help. Michael.

On 06:00 pm, mforbes@physics.ubc.ca wrote:
The documentation and code should be brought into sync unless there is a more up-to-date reference that I am missing.
Hi Michael, Do you think you can file a ticket in our tracker (perhaps attaching a patch?) at <http://twistedmatrix.com/> for this issue? Issues reported on the mailing list are likely to get lost over time.
