[Twisted-Python] Patch for twisted.protocols.nntp

Since "suck" doesn't work for me for a variety of reasons, I decided to replace it with a solution based on twisted.protocols.nntp. After a few hours of hacking, I now have a program which nicely saturates my downlink bandwidth. ;-)
To make a long story short, the attached patch implements the changes and fixes I needed to actually get there.
The "Allow the article text to be a callable or deferred" change implements the common situation where I ask server B whether it would like to be fed article X before actually pulling that article from server A, and/or where the pull is still in progress.
There is one somewhat incompatible change here, in that I return the GROUP results (article count, high and low numbers) as integers, not as text. In practice they're going to be int()ized anyway, so this should not be a problem.
# twisted/protocols/nntp.py
# Fixes for news gateways / 'suck'-style operation / INN as server:
# - The client uses \n and does NOT escape start-of-line dots.
#   The server uses \r\n and escapes dots ONCE, not twice (ouch).
# - POST temporarily blocks streaming. Make sure this is observed,
#   pass a Deferred out for clients to restart themselves with
# - Add a command to allow MODE READER
# - Allow bare reply numbers without text
# - Allow the article text to be a callable or a deferred
# - use CHECK/TAKETHIS if there's a message ID
# - return group article numbers (GROUP reply) as numbers
# - Clean up article linefeed handling
# twisted/test/test_nntp.py
# Make sure that dot escapes are passed cleanly.
# Make sure that no empty lines are added at the end.
# Use the unittest object for checking.
# Use client-side line endings for the client,
# assume that the server side is transparent.
# Make sure that the test doesn't just peter out halfway through.
# Remove the commented-out iterate() calls.
# loopback() already does the work for us.
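The dot-escaping fix in the changelog is easiest to see on the receiving side. Below is a minimal, stdlib-only sketch, not code from the patch or from twisted.protocols.nntp (`unstuff_lines` is a hypothetical name), assuming the sender terminates lines with CRLF and dot-stuffs each line exactly once, as RFC 3977 requires:

```python
def unstuff_lines(lines):
    """Collect the body of one multi-line NNTP block.

    `lines` yields lines with their CRLF already stripped, the way a
    line-based receiver delivers them. The block ends at a line holding
    a single ".". Any other line that starts with "." had exactly one
    dot prepended by the sender, so exactly one dot is removed here.
    """
    body = []
    for line in lines:
        if line == ".":
            return body
        if line.startswith("."):
            line = line[1:]  # undo the sender's single dot-stuffing
        body.append(line)
    raise ValueError("input ended before the terminating '.' line")


body = unstuff_lines(["Subject: hi", "", "..hidden", "...", "text", "."])
```

Undoing the escape twice would, for example, turn an article line consisting of `..` into `.`.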
-- Matthias Urlichs | {M:U} IT Design @ m-u-it.de | smurf@smurf.noris.de Disclaimer: The quote was selected randomly. Really. | http://smurf.noris.de -- Standards are different for all things, so the standard set by man is by no means the only 'certain' standard. If you mistake what is relative for something certain, you have strayed far from the ultimate truth. -- Chuang Tzu

On Sun, Jun 22, 2003 at 05:14:24PM +0200, Matthias Urlichs wrote:
Since "suck" doesn't work for me for a variety of reasons, I decided to replace it with a solution based on twisted.protocols.nntp. After a few hours of hacking, I now have a program which nicely saturates my downlink bandwidth. ;-)
Cool!
To make a long story short, the attached patch implements the changes and fixes I needed to actually get there.
The "Allow the article text to be a callable or deferred" change implements the common situation where I ask server B whether it would like to be fed article X before actually pulling that article from server A, and/or where the pull is still in progress.
Hmm, this is the only part of the patch I am unsure about. The API seems a little too tuned to your use case. I think the way to go for this would be to have a Producer passed in and make the NNTP protocol a Consumer for that (in turn acting as a Producer for its transport object). Would you be willing to make this change? (If you need an example of how this might work, check out smtp.py.)
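The Producer/Consumer chain suggested here can be sketched without Twisted itself. The method names below (`registerProducer`, `unregisterProducer`, `write`, `resumeProducing`, `stopProducing`) mirror Twisted's IConsumer/IPullProducer interfaces, but both classes are hypothetical plain-Python stand-ins, and the `pump()` loop plays the role that the transport asking for more data would play in real code:

```python
import io


class FileArticleProducer:
    """Hypothetical pull producer: hands out one chunk per resumeProducing()."""

    def __init__(self, fileobj, consumer, chunk_size=8192):
        self.fileobj = fileobj
        self.consumer = consumer
        self.chunk_size = chunk_size
        consumer.registerProducer(self, False)  # False: pull (non-streaming)

    def resumeProducing(self):
        chunk = self.fileobj.read(self.chunk_size)
        if chunk:
            self.consumer.write(chunk)
        else:
            self.consumer.unregisterProducer()  # end of article

    def stopProducing(self):
        self.fileobj.close()


class ArticleConsumer:
    """Stand-in for the NNTP protocol acting as the producer's consumer."""

    def __init__(self, transport):
        self.transport = transport
        self.producer = None

    def registerProducer(self, producer, streaming):
        self.producer = producer

    def unregisterProducer(self):
        self.producer = None

    def write(self, data):
        self.transport.write(data)  # pass data on to the transport

    def pump(self):
        # Pull chunks until the producer unregisters itself at EOF.
        while self.producer is not None:
            self.producer.resumeProducing()


out = io.BytesIO()
consumer = ArticleConsumer(out)
FileArticleProducer(io.BytesIO(b"Subject: test\r\n\r\nbody\r\n"), consumer, chunk_size=4)
consumer.pump()
```

Because a pull producer only emits data when asked, the consumer never needs to hold more than one chunk at a time, which is the memory behaviour the producer approach is meant to buy.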
There is one somewhat incompatible change here, in that I return the GROUP results (article count, high and low numbers) as integers, not as text. In practice they're going to be int()ized anyway, so this should not be a problem.
I think this is fine.
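A sketch of the two reply-parsing changes (bare reply numbers, and integer GROUP results), assuming the RFC 3977 layout `211 count low high group` for a successful GROUP reply. `parse_reply` and `parse_group_reply` are hypothetical helper names, not functions from twisted.protocols.nntp:

```python
def parse_reply(line):
    """Split an NNTP status line into (code, text).

    Accepts bare codes such as "235" as well as codes followed by text.
    """
    code, _, text = line.strip().partition(" ")
    if len(code) != 3 or not code.isdigit():
        raise ValueError("malformed status line: %r" % line)
    return int(code), text


def parse_group_reply(line):
    """Return (count, low, high, group) from a 211 GROUP reply."""
    code, text = parse_reply(line)
    if code != 211:
        raise ValueError("not a successful GROUP reply: %r" % line)
    count, low, high, group = text.split()[:4]
    return int(count), int(low), int(high), group
```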
# twisted/protocols/nntp.py
# Fixes for news gateways / 'suck'-style operation / INN as server:
# - The client uses \n and does NOT escape start-of-line dots.
#   The server uses \r\n and escapes dots ONCE, not twice (ouch).
# - POST temporarily blocks streaming. Make sure this is observed,
#   pass a Deferred out for clients to restart themselves with
# - Add a command to allow MODE READER
# - Allow bare reply numbers without text
# - Allow the article text to be a callable or a deferred
# - use CHECK/TAKETHIS if there's a message ID
# - return group article numbers (GROUP reply) as numbers
# - Clean up article linefeed handling
# twisted/test/test_nntp.py
# Make sure that dot escapes are passed cleanly.
# Make sure that no empty lines are added at the end.
# Use the unittest object for checking.
# Use client-side line endings for the client,
# assume that the server side is transparent.
# Make sure that the test doesn't just peter out halfway through.
# Remove the commented-out iterate() calls.
# loopback() already does the work for us.
Thanks for these test fixes/cleanups, too.
Jp
-- In the days when Sussman was a novice Minsky once came to him as he sat hacking at the PDP-6. "What are you doing?" asked Minsky. "I am training a randomly wired neural net to play Tic-Tac-Toe." "Why is the net wired randomly?" asked Minsky. "I do not want it to have any preconceptions of how to play." Minsky shut his eyes. "Why do you close your eyes?" Sussman asked his teacher. "So the room will be empty." At that moment, Sussman was enlightened.

Hi, Jp Calderone wrote:
Hmm, this is the only part of the patch I am unsure about. The API seems a little too tuned to your use case. I think the way to go for this would be to have a Producer passed in and make the NNTP protocol a Consumer for that (in turn acting as a Producer for its transport object). Would you be willing to make this change? (If you need an example of how this might work, check out smtp.py.)
Hmm. Looking at the SMTP case, it needs refactoring in that the producer isn't decoupled from the file object it ends up reading, so I can't plug a different kind of producer in there.
Looking further, there's a slightly broken data-from-file producer in doc/historic/2003/pycon/twisted-internet/twisted-internet.py ...
Anyway, I need NNTP to be a streaming protocol. With a Deferred, this is easy -- each Deferred gets a callback which writes TAKETHIS plus the message to the server (and queues the result code recognition), so these can complete in any order, which is very nice.
A producer interface would instead have an arbitrary number of producers, all of which are active, but as soon as the first one returns anything I have to temporarily deactivate all the others. That sounds dangerous -- what if a producer can't be paused? Then I need to buffer their data, too, which is no different from the current case where the article data also end up in memory.
-- Matthias Urlichs | {M:U} IT Design @ m-u-it.de | smurf@smurf.noris.de Disclaimer: The quote was selected randomly. Really. | http://smurf.noris.de -- Man: I know how to please a woman. Woman: Then please leave me alone.

On Mon, Jun 23, 2003 at 12:51:40PM +0200, Matthias Urlichs wrote:
Hi, Jp Calderone wrote:
Hmm, this is the only part of the patch I am unsure about. The API seems a little too tuned to your use case. I think the way to go for this would be to have a Producer passed in and make the NNTP protocol a Consumer for that (in turn acting as a Producer for its transport object). Would you be willing to make this change? (If you need an example of how this might work, check out smtp.py.)
Hmm. Looking at the SMTP case, it needs refactoring in that the producer isn't decoupled from the file object it ends up reading, so I can't plug a different kind of producer in there.
Yeah, it isn't perfect.
Looking further, there's a slightly broken data-from-file producer in doc/historic/2003/pycon/twisted-internet/twisted-internet.py ...
Anyway, I need NNTP to be a streaming protocol. With a Deferred, this is easy -- each Deferred gets a callback which writes TAKETHIS plus the message to the server (and queues the result code recognition), so these can complete in any order, which is very nice.
They can complete in any order because the code responsible for the Deferred's callback must buffer the message in memory.
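The streaming scheme under discussion can be sketched as below. It assumes the CHECK/TAKETHIS streaming commands (later standardized in RFC 4644, where reply 239 means "transferred" and 439 "rejected") and assumes the server answers commands in the order they were sent. `StreamingFeeder` and its methods are hypothetical; `article_ready` stands in for the Deferred callback and, as noted above, it holds the whole article in memory:

```python
from collections import deque


class StreamingFeeder:
    """Hypothetical streaming TAKETHIS sender driven by per-article callbacks."""

    def __init__(self, write):
        self.write = write          # e.g. a transport's write method
        self.pending = deque()      # message-IDs still awaiting a reply
        self.results = {}           # message-ID -> True (239) / False (other)

    def article_ready(self, message_id, wire_body):
        """Callback for one article; may fire in any order.

        wire_body must already use CRLF line endings and be dot-stuffed.
        """
        self.write(b"TAKETHIS " + message_id + b"\r\n" + wire_body + b".\r\n")
        self.pending.append(message_id)  # replies come back in send order

    def reply_received(self, line):
        """Match one TAKETHIS reply line to the oldest outstanding article."""
        message_id = self.pending.popleft()
        self.results[message_id] = int(line[:3]) == 239


sent = []
feeder = StreamingFeeder(sent.append)
# Article 2 happens to finish downloading before article 1.
feeder.article_ready(b"<2@example>", b"Subject: two\r\n\r\nbody\r\n")
feeder.article_ready(b"<1@example>", b"Subject: one\r\n\r\nbody\r\n")
feeder.reply_received(b"239 <2@example>")
feeder.reply_received(b"439 <1@example>")
```

The only ordering constraint is in matching replies, which a FIFO of message-IDs handles; the articles themselves go out whenever they become ready.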
A producer interface would instead have an arbitrary number of producers, all of which are active, but as soon as the first one returns anything I have to temporarily deactivate all the others. That sounds dangerous -- what if a producer can't be paused? Then I need to buffer their data, too, which is no different from the current case where the article data also end up in memory.
At the very least, the object that is currently a string should become a file object (this should have minimal impact on your code -- if buffering in memory is acceptable to you, you only have to build a StringIO object from your string), and the NNTP protocol class should consume it in a manner similar to that of smtp.py.
It -does- seem that this is becoming a common task (smtp.py and pop3.py do it, imap4.py needs to, and now nntp.py), so maybe I'll write a utility mixin to help out, and I'll see if I can generalize it away from just file-like objects too.
This gives clients at least the opportunity to avoid loading entire messages into memory, which is what I am looking for. Am I right in thinking this leaves you with all the functionality you want?
Jp
-- "Minerals are inexhaustible and will never be depleted. A stream of investment creates additions to proved reserves, a very large in-ground inventory, constantly renewed as it is extracted... How much was in the ground at the start and how much will be left at the end are unknown and irrelevant." -- Morry Adelman, World Renowned Economist
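One way a helper for this could generalize beyond file-like objects is to normalize every article source into an iterator of byte chunks before the protocol consumes it. This is only a sketch under that assumption; `iter_chunks` is a hypothetical name, not an existing Twisted utility:

```python
import io


def iter_chunks(source, chunk_size=8192):
    """Yield bytes chunks from a file-like object or from any iterable.

    Objects with a read() method are read chunk_size bytes at a time;
    anything else is assumed to be an iterable of bytes chunks already
    (a list, a generator, ...).
    """
    if hasattr(source, "read"):
        while True:
            chunk = source.read(chunk_size)
            if not chunk:
                return
            yield chunk
    else:
        yield from source


file_chunks = list(iter_chunks(io.BytesIO(b"abcdef"), chunk_size=4))
list_chunks = list(iter_chunks([b"abc", b"def"]))
```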

Hi, Jp Calderone wrote:
They can complete in any order because the code responsible for the Deferred's callback must buffer the message in memory.
Nope. Converting the code to assemble the articles in file-like objects (or whatever) instead wouldn't make much of a difference to my current code -- they would still be able to complete in any order. Part of the niceness of streaming CHECK and TAKETHIS messages to an NNTP server is that there is no ordering required; you can't do that with SMTP.
On the other hand, thinking about this I just had an idea of how to manage the transition from a bunch of parallel Deferreds to a serialized producer/consumer scheme (addCallback() a code snippet which enqueues the Deferred onto a queue, and take it from there one by one). It's still going to start out as a Deferred, though; "a producer which needs to do something special when it first delivers data and which might trigger an error instead" is complicated code which lumps at least three concepts into one piece of code, while using a Deferred in the first step leads to nicely separated stages with distinct code -- always a win in my book.
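The transition from many Deferreds that fire in any order to one serialized producer/consumer transfer might look roughly like this. It is a stdlib-only sketch: `SerializedSender` is hypothetical, `on_ready` is what would be passed to addCallback(), `send_one` is a hypothetical function that starts one transfer, and whatever runs that transfer is assumed to call `done()` when it finishes:

```python
from collections import deque


class SerializedSender:
    """Hypothetical: turn out-of-order completions into one-at-a-time sends."""

    def __init__(self, send_one):
        self.send_one = send_one   # starts a single (possibly slow) transfer
        self.queue = deque()       # articles ready but not yet started
        self.busy = False          # True while a transfer is in flight

    def on_ready(self, article):
        """Deferred callback: enqueue the article, start it if idle."""
        self.queue.append(article)
        self._next()
        return article  # pass the result along the callback chain

    def done(self):
        """Called when the current transfer has finished."""
        self.busy = False
        self._next()

    def _next(self):
        if not self.busy and self.queue:
            self.busy = True
            self.send_one(self.queue.popleft())


sent = []
sender = SerializedSender(sent.append)
sender.on_ready("art-3")  # nothing in flight: starts at once
sender.on_ready("art-1")  # art-3 still in flight: queued
sender.done()             # art-3 finished: art-1 starts
```

Returning the article from `on_ready` keeps it usable by any later callbacks, so the queueing step stays a separate stage, in line with the "nicely separated stages" argument.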
At the very least, the object that is currently a string should become a file object (This should have minimal impact on your code -- if buffering in memory is acceptable to you, you only have to build a StringIO object from your string)
That would be no problem. Whether the Deferred returns an in-memory string or a file object doesn't make a whole lot of difference, esp. as a simple 'if not hasattr(f,"read"): f = StringIO.StringIO(f)' is sufficient for turning one into the other.
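For reference, a Python 3 version of that one-liner is sketched below (`StringIO.StringIO` exists only in Python 2). `as_file` is a hypothetical name, and encoding `str` input as UTF-8 is an assumption; raw article bytes are usually safer to pass through untouched:

```python
import io


def as_file(article):
    """Return a file-like object for an article given as bytes, str or a file."""
    if hasattr(article, "read"):
        return article  # already file-like: use as-is
    if isinstance(article, str):
        article = article.encode("utf-8")  # assumed encoding
    return io.BytesIO(article)
```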
It -does- seem that this is becoming a common task (smtp.py and pop3.py do it, imap4.py needs to, and now nntp.py), so maybe I'll write a utility mixin to help out, and I'll see if I can generalize it away from just file-like objects too.
That would be nice to have. I'd write it as an adapter or similar, i.e. a class which turns an LF-based, dot-ignorant, possibly-without-final-LF producer into one which has CRLFs, dot-escapes, and a guaranteed CRLF at the end. That way it'd be usable for things which aren't a streaming protocol; direct back-end access to the NNTP storage comes to mind, since it stores articles in the latter format.
-- Matthias Urlichs | {M:U} IT Design @ m-u-it.de | smurf@smurf.noris.de Disclaimer: The quote was selected randomly. Really. | http://smurf.noris.de -- There will always be some delightful mysteries in your life.
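A minimal sketch of the adapter described above, working on a whole bytes article rather than a producer for brevity: LF line endings become CRLF, lines starting with "." get one extra ".", and a final CRLF is guaranteed even if the input lacked a trailing newline. `to_wire_format` is a hypothetical name, and leaving the `.\r\n` terminator to the caller is an assumption (a storage back end may not want it):

```python
def to_wire_format(article):
    """Turn an LF-based, dot-unaware article into CRLF, dot-stuffed form.

    Every line gets CRLF, any line starting with "." gets one extra ".",
    and the result always ends with CRLF. No extra empty line is added
    when the input already ends with a newline. The ".\r\n" terminator
    is not appended.
    """
    if not article:
        return b""
    if article.endswith(b"\n"):
        article = article[:-1]  # avoid an empty line at the end
    out = []
    for line in article.split(b"\n"):
        if line.endswith(b"\r"):
            line = line[:-1]  # tolerate input that already uses CRLF
        if line.startswith(b"."):
            line = b"." + line  # dot-stuff exactly once
        out.append(line + b"\r\n")
    return b"".join(out)
```

Sending the result followed by `.\r\n` gives a block from which a receiver that strips one leading dot per line recovers the original text.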

On Sun, Jun 22, 2003 at 05:14:24PM +0200, Matthias Urlichs wrote:
Since "suck" doesn't work for me for a variety of reasons, I decided to replace it with a solution based on twisted.protocols.nntp. After a few hours of hacking, I now have a program which nicely saturates my downlink bandwidth. ;-)
To make a long story short, the attached patch implements the changes and fixes I needed to actually get there.
I think this patch is against a version of nntp.py that I don't have. Could you make sure it is against current CVS, or at least a previously released version? (I'm quite certain we've never had 1.0.5smurf-6.)
Jp
-- http://catandgirl.com/view.cgi?44

Hi, Jp Calderone wrote:
I think this patch is against a version of nntp.py that I don't have. Could you make sure it is against current CVS, or at least a previously released version? (I'm quite certain we've never had 1.0.5smurf-6.)
Sorry about that. I'll merge up to current CVS and then send you a complete patch directly.
-- Matthias Urlichs | {M:U} IT Design @ m-u-it.de | smurf@smurf.noris.de Disclaimer: The quote was selected randomly. Really. | http://smurf.noris.de -- If the shoe fits, it's ugly.
participants (2)
- Jp Calderone
- Matthias Urlichs