[Twisted-Python] POP3 improvements
![](https://secure.gravatar.com/avatar/6cd37343ce1da139b1d2ff5ee717f5d5.jpg?s=120&d=mm&r=g)
Over the weekend I was working on adding POP3 support to the messaging library I'm working on. I noticed that while downloading large messages from a POP3 server on my local network my test app would use 100% CPU, and would stop responding to events until it finished downloading the message. Looking at the twisted.protocols.pop3.POP3Client code I saw that it was handling downloaded messages by concatenating a string, adding one line at a time. This seemed like an inefficient way to do it, so I modified the code to use a file-like object instead. The result is much faster, and gives you the option of writing a downloading message to a file as it comes in rather than keeping it all in memory. The modified POP3Client (attached) still has some work to be done, but I was hoping to get some comments from the list first. Currently it works like this: When you send a command that returns a multi-line response (LIST or RETR), you pass an optional file-like object that the response should be written to. If no file argument is supplied a StringIO is used. When the server is finished sending the response, handle_COMMANDNAME is called, passing back the file object containing the downloaded response. At this point I'm still not correcting byte-stuffed lines in downloaded messages, so I need to do that. Also, if it would be OK with everyone, I think it might be nice to make POP3Client a little higher-level, so that LIST, for example, would return an actual list instead of a file (or, in the current implementation, a string) that the implementer has to parse. And it might be good to offer support for more POP3 commands like TOP. If somebody could take a look at my code and tell me what I'd need to do to make it meet the Twisted standards I'd appreciate it. (There are also some improvements that could probably be made to the POP3 server code, but I thought those could wait 'till later). Abe
![](https://secure.gravatar.com/avatar/433365de0f787faa3ed3e6dd1da5884f.jpg?s=120&d=mm&r=g)
On 30 Dec 2002 11:27:23 -0500 Abe Fettig <abe@fettig.net> wrote:
The modified POP3Client (attached) still has some work to be done, but I was hoping to get some comments from the list first.
Could you please send patches to the list as diffs, so we can see exactly what you changed?
![](https://secure.gravatar.com/avatar/433365de0f787faa3ed3e6dd1da5884f.jpg?s=120&d=mm&r=g)
On 30 Dec 2002 11:27:23 -0500 Abe Fettig <abe@fettig.net> wrote:
Currently it works like this: When you send a command that returns a multi-line response (LIST or RETR), you pass an optional file-like object that the response should be written to. If no file argument is supplied a StringIO is used. When the server is finished sending the response, handle_COMMANDNAME is called, passing back the file object containing the downloaded response.
Instead of calling handle_COMMANDNAME, why not return a Deferred of the success?
At this point I'm still not correcting byte-stuffed lines in downloaded messages, so I need to do that. Also, if it would be OK with everyone, I think it might be nice to make POP3Client a little higher-level, so that LIST, for example, would return an actual list instead of a file(or, in the current implementation, a string) that the implementer has to parse. And it might be good to offer support for more POP3 commands like TOP.
Or perhaps the Deferred should return that. -- Itamar Shtull-Trauring http://itamarst.org/ Available for Python, Twisted, Zope and Java consulting ***> http://VoteNoWar.org -- vote/donate/volunteer <***
![](https://secure.gravatar.com/avatar/6cd37343ce1da139b1d2ff5ee717f5d5.jpg?s=120&d=mm&r=g)
(diff attached to make it easier to see what I changed) On Mon, 2002-12-30 at 11:48, Itamar Shtull-Trauring wrote:
Currently it works like this: When you send a command that returns a multi-line response (LIST or RETR), you pass an optional file-like object that the response should be written to. If no file argument is supplied a StringIO is used. When the server is finished sending the response, handle_COMMANDNAME is called, passing back the file object containing the downloaded response.
Instead of calling handle_COMMANDNAME, why not return a Deferred of the success?
Based on other Twisted protocol handlers I thought that the preferred way to handle events in protocols was to create methods that get called in response to events. That way you create a class that inherits from POP3Client, and override the methods for events you want to handle. Being able use classes and inheritance this way is one of the things I really like about Twisted. Also, that's the way it was in the original POP3 module.
At this point I'm still not correcting byte-stuffed lines in downloaded messages, so I need to do that. Also, if it would be OK with everyone, I think it might be nice to make POP3Client a little higher-level, so that LIST, for example, would return an actual list instead of a file(or, in the current implementation, a string) that the implementer has to parse. And it might be good to offer support for more POP3 commands like TOP.
Or perhaps the Deferred should return that.
![](https://secure.gravatar.com/avatar/b3407ff6ccd34c6e7c7a9fdcfba67a45.jpg?s=120&d=mm&r=g)
On Mon, Dec 30, 2002 at 01:09:52PM -0500, Abe Fettig wrote:
(diff attached to make it easier to see what I changed)
On Mon, 2002-12-30 at 11:48, Itamar Shtull-Trauring wrote:
Currently it works like this: When you send a command that returns a multi-line response (LIST or RETR), you pass an optional file-like object that the response should be written to. If no file argument is supplied a StringIO is used. When the server is finished sending the response, handle_COMMANDNAME is called, passing back the file object containing the downloaded response.
Instead of calling handle_COMMANDNAME, why not return a Deferred of the success?
Based on other Twisted protocol handlers I thought that the preferred way to handle events in protocols was to create methods that get called in response to events. That way you create a class that inherits from POP3Client, and override the methods for events you want to handle. Being able use classes and inheritance this way is one of the things I really like about Twisted.
It depends; if the event is in response to a request, e.g. LIST, it makes sense to return a Deferred that will receive a list of available messages. FTPClient works this way, and is very convenient -- an FTPClient user can queue commands (like LIST, RETR, etc), and as the server responds, the Deferreds will fire... and there's no confusion if you issue multiple LIST commands in different directories, whereas a handle_LIST callback doesn't have the same advantage. In your case, I'd recommend Deferreds. If you're writing a server that waits for incoming events, then a handle_COMMANDNAME scheme would make good sense.
Also, that's the way it was in the original POP3 module.
Well, if you're breaking backwards-compatibility (are you?), that doesn't really matter ;) -Andrew.
![](https://secure.gravatar.com/avatar/3a7e70f3ef2ad1539da42afc85c8d09d.jpg?s=120&d=mm&r=g)
On Tue, Dec 31, 2002 at 12:57:06PM +1100, Andrew Bennetts wrote:
On Mon, Dec 30, 2002 at 01:09:52PM -0500, Abe Fettig wrote:
On Mon, 2002-12-30 at 11:48, Itamar Shtull-Trauring wrote:
supplied a StringIO is used. When the server is finished sending the response, handle_COMMANDNAME is called, passing back the file object containing the downloaded response.
Instead of calling handle_COMMANDNAME, why not return a Deferred of the success?
Based on other Twisted protocol handlers I thought that the preferred way to handle events in protocols was to create methods that get called in response to events. That way you create a class that inherits from POP3Client, and override the methods for events you want to handle. Being able use classes and inheritance this way is one of the things I really like about Twisted.
It depends; if the event is in response to a request, e.g. LIST, it makes sense to return a Deferred that will receive a list of available messages. FTPClient works this way, and is very convenient -- an FTPClient user can queue commands (like LIST, RETR, etc), and as the server responds, the Deferreds will fire... and there's no confusion if you issue multiple LIST commands in different directories, whereas a handle_LIST callback doesn't have the same advantage.
In your case, I'd recommend Deferreds.
If you're writing a server that waits for incoming events, then a handle_COMMANDNAME scheme would make good sense.
Nah, handle_FOO is useful outside of just servers (see IRCClient, which I'll mention again in a second). It may be good to separate it into two layers, the higher-level one returning Deferreds, which are triggered from handle_FOO methods. This is especially useful if you may have to wait for many low-level protocol messages before you have the full response, but the handle_FOO idiom is nice, in general, as a way of breaking up your protocol parsing into separate methods. The one other place where I thought this might be useful is in IRCClient. IRC commands often result in several messages in response -- something like list, or whois, for example. If we wanted to abstract this interface, we could have a "list" method that returned a Deferred that didn't trigger until *all* the list lines have been received. So, as an _implementation_ strategy, the "handle_FOO" methods are great when the mapping of message-level request to message-level response is 1:many, and for interfaces, Deferreds are great. -- Twisted | Christopher Armstrong: International Man of Twistery Radix | Release Manager, Twisted Project ---------+ http://twistedmatrix.com/users/radix.twistd/
![](https://secure.gravatar.com/avatar/b3407ff6ccd34c6e7c7a9fdcfba67a45.jpg?s=120&d=mm&r=g)
On Tue, Dec 31, 2002 at 12:47:24AM -0500, Christopher Armstrong wrote:
The one other place where I thought this might be useful is in IRCClient. IRC commands often result in several messages in response -- something like list, or whois, for example. If we wanted to abstract this interface, we could have a "list" method that returned a Deferred that didn't trigger until *all* the list lines have been received.
Actually, I just realised, I fibbed about FTPClient... it does something a little unusual, and I'm interested in opinions on this. Say you call FTPClient.list(). You don't get a Deferred, you actually have to pass in a Protocol *instance*, and as the data for FTPClient.list is received, it gets passed straight to that Protocol, via dataReceived. In fact, the FTPClient internally makes a Factory for that connection, which (essentially) uses that Protocol, but that's an implementation detail. This is a bit weird, though, because using an FTPClient involves creating Protocols without associated Factories, and nowhere else in Twisted does that. [The reason why I got confused about Deferreds in the interface, is that I use Deferreds internally in FTPClient to manage all this.] So this is actually another interface, possibly specific to FTP, where you pass a Protocol instance -- this allows for progressive processing of data, rather than all-at-once, and this feels "more Twisted" to me :) So, to parse the results of an FTP listing, you pass in a FTPFileListProtocol... this is useful, but there probably should be a higher-level interface to this that returns a Deferred, for when you don't need to do custom parsing and don't care about progressive processing (i.e. most of the time).
So, as an _implementation_ strategy, the "handle_FOO" methods are great when the mapping of message-level request to message-level response is 1:many, and for interfaces, Deferreds are great.
Agreed. In FTP, there is, basically, only one response per request, so I don't have need for handle_FOO in the FTPClient. In POP3, I think the situation is similar (but simpler than FTP). For IRC, handle_FOO makes good sense. -Andrew.
![](https://secure.gravatar.com/avatar/3a7e70f3ef2ad1539da42afc85c8d09d.jpg?s=120&d=mm&r=g)
On Tue, Dec 31, 2002 at 05:16:39PM +1100, Andrew Bennetts wrote:
Actually, I just realised, I fibbed about FTPClient... it does something a little unusual, and I'm interested in opinions on this.
Say you call FTPClient.list(). You don't get a Deferred, you actually have to pass in a Protocol *instance*, and as the data for FTPClient.list is received, it gets passed straight to that Protocol, via dataReceived. In fact, the FTPClient internally makes a Factory for that connection, which (essentially) uses that Protocol, but that's an implementation detail. This is a bit weird, though, because using an FTPClient involves creating Protocols without associated Factories, and nowhere else in Twisted does that.
[The reason why I got confused about Deferreds in the interface, is that I use Deferreds internally in FTPClient to manage all this.]
So this is actually another interface, possibly specific to FTP, where you pass a Protocol instance -- this allows for progressive processing of data, rather than all-at-once, and this feels "more Twisted" to me :)
So, to parse the results of an FTP listing, you pass in a FTPFileListProtocol... this is useful, but there probably should be a higher-level interface to this that returns a Deferred, for when you don't need to do custom parsing and don't care about progressive processing (i.e. most of the time).
Euughhh.. that sounds really horrible at first glance. Are there any cases you can think of that actually make use of passing in different Protocol classes? Also, how much less flexible would it be to simply set the protocols as attributes on the FTPClient if you want to override? (similar to Factory.protocol). The FTPClient would then just use 'self.whateverProtocol'. This agrees much more with what a lot of Twisted already does, I think. -- Twisted | Christopher Armstrong: International Man of Twistery Radix | Release Manager, Twisted Project ---------+ http://twistedmatrix.com/users/radix.twistd/
![](https://secure.gravatar.com/avatar/b3407ff6ccd34c6e7c7a9fdcfba67a45.jpg?s=120&d=mm&r=g)
On Tue, Dec 31, 2002 at 01:24:40AM -0500, Christopher Armstrong wrote:
Euughhh.. that sounds really horrible at first glance. Are there any cases you can think of that actually make use of passing in different Protocol classes? Also, how much less flexible would it be to simply set the protocols as attributes on the FTPClient if you want to override? (similar to Factory.protocol). The FTPClient would then just use 'self.whateverProtocol'. This agrees much more with what a lot of Twisted already does, I think.
I do need to use different protocols, or at least I did at a previous job... I had a program that downloaded TIFF files and text files. TIFF files needed to be written to disk and processed, text files were written to disk for posterity, but were processed as they came in. Also, an FTPClient can have multiple commands queued (It seems reasonable to be able to do f.cd('foo/bar'); f.retr('file1'); f.retr('file2')), so an instance attribute isn't a sane option. Any other ideas, though? I would like to make it cleaner. -Andrew.
![](https://secure.gravatar.com/avatar/b932b1e5a3e8299878e579f51f49b84a.jpg?s=120&d=mm&r=g)
On Tuesday, Dec 31, 2002, at 01:40 America/New_York, Andrew Bennetts wrote:
I do need to use different protocols, or at least I did at a previous job... I had a program that downloaded TIFF files and text files. TIFF files needed to be written to disk and processed, text files were written to disk for posterity, but were processed as they came in.
Also, an FTPClient can have multiple commands queued (It seems reasonable to be able to do f.cd('foo/bar'); f.retr('file1'); f.retr('file2')), so an instance attribute isn't a sane option.
Any other ideas, though? I would like to make it cleaner.
Crazy idea, but maybe some extension to or subclass of Deferred is in order, where you would receive chunks until it's done?
![](https://secure.gravatar.com/avatar/152986af8e990c9c8b61115f298b9cb2.jpg?s=120&d=mm&r=g)
On Tue, Dec 31, 2002 at 05:40:43PM +1100, Andrew Bennetts wrote:
On Tue, Dec 31, 2002 at 01:24:40AM -0500, Christopher Armstrong wrote:
Euughhh.. that sounds really horrible at first glance. Are there any cases you can think of that actually make use of passing in different Protocol classes? Also, how much less flexible would it be to simply set the protocols as attributes on the FTPClient if you want to override? (similar to Factory.protocol). The FTPClient would then just use 'self.whateverProtocol'. This agrees much more with what a lot of Twisted already does, I think.
I do need to use different protocols, or at least I did at a previous job... I had a program that downloaded TIFF files and text files. TIFF files needed to be written to disk and processed, text files were written to disk for posterity, but were processed as they came in.
IIRC, the TIFF files in question were very, very large, while the text files were quite small. TIFF files you'd want to spool to disk as quickly as possible, but text files you could build them up in a buffer as they came in. I imagine the HTTP client would face similar issues - how does it go downloading large files?
Also, an FTPClient can have multiple commands queued (It seems reasonable to be able to do f.cd('foo/bar'); f.retr('file1'); f.retr('file2')), so an instance attribute isn't a sane option.
Any other ideas, though? I would like to make it cleaner.
I think it would be nice if Twisted had a DownloadingFile interface or something - make a new DownloadingFile instance, then pass it to FTP's RETR, HTTP's GET, an IRC DCC transfer, a twisted.conch scp session, or whatever. Then you can get the DownloadingFile instance to trigger deferreds when it's complete, when it's progress is updated, that sort of thing. It would make it easy to generate stats like download speed, estimated time, and so forth. Make a subclass DownloadingTIFFFile that saves the file to disk and processes it, make a subclass DownloadingTextFile that processes the file as it comes in, then saves it to disk. Screwtape, ...yes, another crazy idea I don't particulary want to implement myself. :) -- ___________ ____________________________ | Screwtape | Reply-To: munged on Usenet |________ ______ ____ __ _ _ _ | | "The cruel angel's thesis bleeds" |
![](https://secure.gravatar.com/avatar/78e124bb69bcaac7a75ecfe640d02261.jpg?s=120&d=mm&r=g)
On Tue, 31 Dec 2002, screwtape@froup.com wrote:
I imagine the HTTP client would face similar issues - how does it go downloading large files?
One subclass saves files to filesystem (and calls a callback when done), one subclass adds them up in a buffer and calls a callback when done. Other suclasses are possible. For more details, see twisted.web.client
![](https://secure.gravatar.com/avatar/0b90087ed4aef703541f1cafdb4b49a1.jpg?s=120&d=mm&r=g)
On Tue, Dec 31, 2002 at 06:08:22PM +1100, screwtape@froup.com wrote:
I think it would be nice if Twisted had a DownloadingFile interface or something - make a new DownloadingFile instance, then pass it to FTP's RETR, HTTP's GET, an IRC DCC transfer, a twisted.conch scp session, or whatever. Then you can get the DownloadingFile instance to trigger deferreds when it's complete, when it's progress is updated, that sort of thing. It would make it easy to generate stats like download speed, estimated time, and so forth.
How does that differ from the normal file interface? f.write(somestuff) f.write(somestuff) f.write(somestuff) f.close()
![](https://secure.gravatar.com/avatar/6cd37343ce1da139b1d2ff5ee717f5d5.jpg?s=120&d=mm&r=g)
On Tue, 2002-12-31 at 07:53, Tommi Virtanen wrote:
On Tue, Dec 31, 2002 at 06:08:22PM +1100, screwtape@froup.com wrote:
I think it would be nice if Twisted had a DownloadingFile interface or something - make a new DownloadingFile instance, then pass it to FTP's RETR, HTTP's GET, an IRC DCC transfer, a twisted.conch scp session, or whatever. Then you can get the DownloadingFile instance to trigger deferreds when it's complete, when it's progress is updated, that sort of thing. It would make it easy to generate stats like download speed, estimated time, and so forth.
How does that differ from the normal file interface?
f.write(somestuff) f.write(somestuff) f.write(somestuff) f.close()
This would be my question too. Is there really a need to use a new interface, or can I just stick with file-like objects? Abe
![](https://secure.gravatar.com/avatar/0b90087ed4aef703541f1cafdb4b49a1.jpg?s=120&d=mm&r=g)
On Tue, Dec 31, 2002 at 09:28:01AM -0500, Abe Fettig wrote:
How does that differ from the normal file interface?
f.write(somestuff) f.write(somestuff) f.write(somestuff) f.close()
This would be my question too. Is there really a need to use a new interface, or can I just stick with file-like objects?
Stick with them until someone proves you can't. -- :(){ :|:&};:
![](https://secure.gravatar.com/avatar/3477a9de290ec6d77129af504faa1c0b.jpg?s=120&d=mm&r=g)
On Tue, 31 Dec 2002, Tommi Virtanen <tv@twistedmatrix.com> wrote:
How does that differ from the normal file interface?
Stick with them until someone proves you can't.
\begin{proof} What if there's an error? Like, the connection gets lost prematurely (in the middle of a line, or something), or the connection never starts? \end{proof}
![](https://secure.gravatar.com/avatar/6cd37343ce1da139b1d2ff5ee717f5d5.jpg?s=120&d=mm&r=g)
On Wed, 2003-01-01 at 01:22, Moshe Zadka wrote:
On Tue, 31 Dec 2002, Tommi Virtanen <tv@twistedmatrix.com> wrote:
How does that differ from the normal file interface?
Stick with them until someone proves you can't.
\begin{proof} What if there's an error? Like, the connection gets lost prematurely (in the middle of a line, or something), or the connection never starts? \end{proof}
I'm still pretty newbie-ish so sorry if this is a stupid question. But wouldn't lost connections, etc get handled on the Factory level? Abe
![](https://secure.gravatar.com/avatar/152986af8e990c9c8b61115f298b9cb2.jpg?s=120&d=mm&r=g)
On Tue, Dec 31, 2002 at 02:53:43PM +0200, Tommi Virtanen wrote:
On Tue, Dec 31, 2002 at 06:08:22PM +1100, screwtape@froup.com wrote:
I think it would be nice if Twisted had a DownloadingFile interface or something - make a new DownloadingFile instance, then pass it to FTP's RETR, HTTP's GET, an IRC DCC transfer, a twisted.conch scp session, or whatever. Then you can get the DownloadingFile instance to trigger deferreds when it's complete, when it's progress is updated, that sort of thing. It would make it easy to generate stats like download speed, estimated time, and so forth.
How does that differ from the normal file interface?
f.write(somestuff) f.write(somestuff) f.write(somestuff) f.close()
Because I still think of a file as a type, not an object. *sigh* You're right, file objects would be good. At any rate using Protocols would seem like overkill. -- ___________ ____________________________ | Screwtape | Reply-To: munged on Usenet |________ ______ ____ __ _ _ _ | | "Universe-flavored. What you mortals call `butterscotch'." -- T&R, 1999-09-17 |
![](https://secure.gravatar.com/avatar/6cd37343ce1da139b1d2ff5ee717f5d5.jpg?s=120&d=mm&r=g)
On Tue, 2002-12-31 at 00:47, Christopher Armstrong wrote:
On Tue, Dec 31, 2002 at 12:57:06PM +1100, Andrew Bennetts wrote:
It depends; if the event is in response to a request, e.g. LIST, it makes sense to return a Deferred that will receive a list of available messages. FTPClient works this way, and is very convenient -- an FTPClient user can queue commands (like LIST, RETR, etc), and as the server responds, the Deferreds will fire... and there's no confusion if you issue multiple LIST commands in different directories, whereas a handle_LIST callback doesn't have the same advantage.
In your case, I'd recommend Deferreds.
So, as an _implementation_ strategy, the "handle_FOO" methods are great when the mapping of message-level request to message-level response is 1:many, and for interfaces, Deferreds are great.
Makes sense to me. In this case I'll use Deferreds. Abe
![](https://secure.gravatar.com/avatar/b3407ff6ccd34c6e7c7a9fdcfba67a45.jpg?s=120&d=mm&r=g)
On Mon, Dec 30, 2002 at 11:27:23AM -0500, Abe Fettig wrote:
Over the weekend I was working on adding POP3 support to the messaging library I'm working on. I noticed that while downloading large messages from a POP3 server on my local network my test app would use 100% CPU, and would stop responding to events until it finished downloading the message.
Looking at the twisted.protocols.pop3.POP3Client code I saw that it was handling downloaded messages by concatenating a string, adding one line at a time. This seemed like an inefficient way to do it, so I modified the code to use a file-like object instead. The result is much faster, and gives you the option of writing a downloading message to a file as it comes in rather than keeping it all in memory.
Hang on... POP3Client *doesn't* do any concatenation. Looking at it, it merely fires handle_COMMAND, possibly followed by multiple handle_COMMAND_continue and a handle_COMMAND_end, and at no time is more than one line passed around. Perhaps you mean basic.LineReceiver does concatenation? How long are the lines you are receiving? If basic.LineReceiver is the culprit, I suspect it would make more sense to fix it there than in POP3Client. -Andrew.
participants (9)
-
Abe Fettig
-
Andrew Bennetts
-
Bob Ippolito
-
Christopher Armstrong
-
Itamar Shtull-Trauring
-
Moshe Zadka
-
Moshe Zadka
-
screwtape@froup.com
-
Tommi Virtanen