[Twisted-Python] Considering Twisted for OfflineIMAP
Hello,

I am the author of OfflineIMAP, a multi-threaded, multi-account, bi-directional synchronization tool (http://quux.org/devel/offlineimap) written in Python.

I am running up against several limitations of the current way I'm doing things. They include:

1. Lack of thread groups in Python, meaning that if there was a network error synchronizing account A, I'd also have to kill off the entire application and the synchronizing threads for account B -- I couldn't just kill off account A.

2. Various bugs in Python's threading implementation

3. imaplib.py being a stinking festering heap of crap leading to unmaintainable code in everything that touches it

My first plan was to just rewrite imaplib, which wouldn't be too hard. But someone on the OfflineIMAP list pointed me to Twisted, which I observe already contains an IMAP library that looks nice, plus a different paradigm that may spare me from the whole thread groups question altogether.

My questions, then, are these:

1. How does Twisted interact with things like threads and connection pooling? I currently let the user control the maximum number of connections that are open to a given server, and whenever the code needs to access an IMAP server, it grabs a connection from the appropriate pool and uses it (using threading primitives to block if no connection is available.) Can Twisted handle multiple connections to multiple servers doing multiple things at once? Can it do connection pools? If it needs threading to do these things, does it play nice with it?

3. I notice the "Deferred" object and the benefits it provides when a server response is expected to take some time. I assume that some sort of internal threading is taking place here? What if the server *request* takes some time -- say, uploading a 2MB e-mail over a dialup link. Are we still OK with handling that in a non-blocking fashion?

4. The howto "book" alludes to pending improvements on the mail infrastructure. Anything I should be aware of here?

5. Is there any sort of unified mail API in Twisted (like JavaMail) that would present me with a single API to both IMAP and Maildir repositories, or is that something I need to do on my own? (I've already done it, so it's no big deal to do that again)

6. OfflineIMAP supports several different user interfaces (two written with Tkinter, one using Curses, and three plain console ones.) I would like to reimplement the Tkinter ones with wxPython, and am glad to see that Twisted works well with this. One concern, though, is that the users can supply a list of UIs to try in the config file: for instance, Tk, Curses, Noninteractive. The system will try each one in turn until it reaches one that works. (For instance, Tk may not work if the user is not in X, and Curses may not work if stdout is not a tty). It looks like this may be problematic with Twisted since the UI selection always seems to be known in advance with the import commands related to the reactor. Any thoughts here?

Thanks!

-- John Goerzen
On Sat, Jul 26, 2003 at 08:48:15AM -0500, John Goerzen wrote:
Ooh, nifty :) [...]
Twisted uses non-blocking sockets, so it can handle an arbitrary number of connections (both client and server) without any threads. Running multiple clients (or servers) at a time is as easy as calling reactor.connectTCP more than once.

[What happened to question 2?]
No threading. Deferreds aren't magical at all. A Deferred is simply an object a function can return to say "I don't have a result ready yet; as soon as I've got it, I'll let you know." The caller can then register callbacks and errbacks that will be called when the result does become available. How the Deferred eventually gets called is up to the code that originally created it; it might be the result of a query sent to the network being answered (e.g. twisted.web.client.getPage), a spawned process terminating (e.g. twisted.internet.utils.getProcessOutput), a function call in another thread completing (e.g. twisted.internet.threads.deferToThread), and so on.
Absolutely! Twisted handles large transfers via a consumer/producer interface, though, rather than with Deferreds (which are one-shot results, rather than streaming).
4. The howto "book" alludes to pending improvements on the mail infrastructure. Anything I should be aware of here?
Jp has been doing a lot of great work on this recently -- you should definitely be working with CVS rather than 1.0.6 if you can. I don't know much about the details, though.
I've no idea -- Jp?
I suspect this can be dealt with, but I'm not sure off the top of my head how... hopefully someone else can help you here. -Andrew.
Andrew Bennetts <andrew-twisted@puzzling.org> writes:
Ooh, nifty :)
Thanks :-)
Well, yes and no, I think. Let me elaborate.

OfflineIMAP always behaves as an IMAP client, so I'll be using twisted.protocols.imap4.IMAP4Client. This class is written to use one and exactly one connection to the server (it must, since it takes a single SSL contextFactory as an arg to __init__).

OfflineIMAP would like to do, say, 4 operations at once. For the sake of example, let's say these four are:

1. Getting a list of messages in folder A
2. Downloading a message from folder B
3. Uploading a message to folder B
4. Deleting a message in folder C

Now, the way I'd handle this pre-twisted is that my own IMAP folder class would basically do this:

    imap = pool.acquireconnection()
    try:
        # do the stuff here
        # imap.select() to pick the relevant folder
        # imap.download() or whatnot
    finally:
        pool.releaseconnection(imap)

This was running with multiple threads, so even though the stuff in the "try" was blocking, it didn't matter.

With Twisted, I could clone this setup somewhat by making sure the last callback and errback in my deferreds from IMAP4Client release the connection, but that's ugly and error-prone. I'd imagine that downloading a message would basically go like this:

    d = pool.acquireconnection()
    d.addCallback(select)
    d.addCallback(download)
    d.addCallback(releaseconnection)
    d.addCallback(returnresults)

But this doesn't take into account errors, and it's rather annoying to boot.

So my question is: just how might I accomplish this in a nice way with Twisted?
[What happened to question 2?]
I found the answer to it before I posted the message, so I deleted it but forgot to renumber.
(e.g. twisted.internet.utils.getProcessOutput), a function call in another thread completing (e.g. twisted.internet.threads.deferToThread), and so on.
Got it.
Do the online API docs refer to the CVS version or 1.0.6? It's 1.0.6 that I've got right now.
I suspect this can be dealt with, but I'm not sure off the top of my head how... hopefully someone else can help you here.
One thought I had was to wait to import the reactor until I know what kind of UI I want. That works OK as long as I continue using ConfigParser for my configuration. If at some point I switch to using Twisted's config storage (I know it's there but I haven't looked at it yet), I suspect that could introduce a chicken-and-egg problem. -- John
On Sat, Jul 26, 2003 at 11:40:20AM -0500, John Goerzen wrote:
No, Twisted is not a monolithic import; that would be insane. You can import twisted.python.usage before importing the reactor.

--
 Twisted | Christopher Armstrong: International Man of Twistery
   Radix | Release Manager, Twisted Project
---------+ http://twistedmatrix.com/users/radix.twistd/
On Sat, 26 Jul 2003 11:40:20 -0500 John Goerzen <jgoerzen@complete.org> wrote:
There's an addErrback. If you read the Deferred docs, you'll eventually figure out how Deferreds match Python's exception handling mechanism almost exactly, just with asynchronous APIs.
You can always stick to ConfigParser, and for your needs it may be better. -- Itamar Shtull-Trauring http://itamarst.org/ http://www.zoteca.com -- Python & Twisted consulting
On Sat, 26 Jul 2003 11:40:20 -0500 John Goerzen <jgoerzen@complete.org> wrote:
Do the online API docs refer to the CVS version or 1.0.6? It's 1.0.6 that I've got right now.
1.0.6. I recommend getting CVS and reading the code for IMAP. And the tests! -- Itamar Shtull-Trauring http://itamarst.org/ http://www.zoteca.com -- Python & Twisted consulting
On Sat, Jul 26, 2003 at 08:48:15AM -0500, John Goerzen wrote:
Oooh! I'm very motivated to help you; I've used offlineimap for a long time now, and I think I grok Twisted reasonably well. We're on IRC at #twisted on freenode. I'd be happy to help you while you are getting to know Twisted. Andrew gave pretty good answers already, so I'll keep my answers short.
1. How does Twisted interact with things like threads and connection pooling?
You really don't want to use threads; Twisted gives you everything you need to implement offlineimap asynchronously. (There is e.g. deferToThread, used when interfacing to e.g. database APIs that block, but that really should be avoided if possible.)
That should be pretty doable, not with thread pooling but by just counting the currently open IMAP connections and when the number is below the limit, taking more work from a queue and starting to process it (asynchronously).
4. The howto "book" alludes to pending improvements on the mail infrastructure. Anything I should be aware of here?
I believe much of that has already happened.
Well, see twisted.protocols.imap4.IMailbox for _something_.
Hmm. It seems you'd need to read the configuration before starting the reactor (event loop). That should be enough. -- :(){ :|:&};:
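A sketch of how the UI fallback list could be resolved before any reactor is imported, using plain ConfigParser. The section name, option name, and probe functions are assumptions made up for this example; the probes simply simulate failures here, where real ones would check for X or a tty.

```python
import configparser


def choose_ui(preferences, probes):
    """Return the first UI name in `preferences` whose probe succeeds."""
    for name in preferences:
        probe = probes.get(name)
        if probe is None:
            continue  # unknown UI name in the config file
        try:
            probe()
        except Exception:
            continue  # this UI cannot start here; try the next one
        return name
    raise RuntimeError("no usable UI among %r" % (preferences,))


def probe_tk():
    raise RuntimeError("no X display")  # simulated failure for this sketch


def probe_curses():
    raise RuntimeError("stdout is not a tty")  # simulated failure


def probe_noninteractive():
    pass  # always available


# Hypothetical config layout: a [general] section with a "ui" option.
config = configparser.ConfigParser()
config.read_string("[general]\nui = Tk, Curses, Noninteractive\n")
preferences = [name.strip() for name in config["general"]["ui"].split(",")]

probes = {
    "Tk": probe_tk,
    "Curses": probe_curses,
    "Noninteractive": probe_noninteractive,
}
chosen = choose_ui(preferences, probes)

# Only now would the program install the reactor matching `chosen`, and only
# after that import twisted.internet.reactor and start the event loop.
```

Nothing above touches Twisted, so the choice is made before the reactor import fixes the event loop.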
Hi, Tommi Virtanen wrote:
counting the currently open IMAP connections and when the number is below the limit, taking more work from a queue
Or you could use a semaphore. Feel free to add this code to twisted.internet.defer. The interface should be identical to threading.Semaphore, except that I was too lazy to add any verbose statements...

    from twisted.internet.defer import Deferred

    class Semaphore(object):
        def __init__(self, value=1, verbose=None):
            self.queue = []      # Deferreds of callers waiting for a slot
            self.value = value   # number of free slots

        def acquire(self):
            d = Deferred()
            if self.value:
                self.value -= 1
                d.callback(False)   # got a slot without waiting
            else:
                self.queue.append(d)
            return d

        def release(self):
            if self.queue:
                # hand the slot straight to the longest-waiting caller
                self.queue.pop(0).callback(True)
            else:
                self.value += 1

The callback's parameter answers the "did I have to wait for a slot" question, in case the caller needs that question answered. (Most don't.)

--
Matthias Urlichs | {M:U} IT Design @ m-u-it.de | smurf@smurf.noris.de
Disclaimer: The quote was selected randomly. Really. | http://smurf.noris.de
--
s = (char*)(long)retval; /* ouch */
-- Larry Wall in doio.c from the perl source code
participants (6)
- Andrew Bennetts
- Christopher Armstrong
- Itamar Shtull-Trauring
- John Goerzen
- Matthias Urlichs
- Tommi Virtanen