[Twisted-Python] adbapi, transactions and threading
I hope it won't be too confusing. I'm basically trying to make sure I've grasped the concept - do correct me on anything I am wrong about or my way of thinking confuses you, please.

I'm not used to running blocking calls and deferToThread a lot, so I wanted to make sure I understand that correctly.

Basically, the way I understand it - adbapi is just a deferToThread wrapper around the normal Python API, correct? As in - if I used something different to access my database, for example - sqlalchemy, I would only need to appropriately wrap that functionality with deferToThread, just as adbapi does?

That's one thing. Now - what I'm trying to do in essence is to load some big chunk of data out of the DB, process it, and save it back. I'd use a nice chain of Deferred calls - one runQuery, one for the processing, and one runOperation. However, I need transaction functionality, so unless I'm mistaken, my only choice is runInteraction.

Since that's automatically run in a separate thread, I see it as a monolithic piece of code -> query, processing call, saving query -> no Deferreds can take place there. Am I wrong on that one? Even if so - it won't be that bad for me, since a good part of the processing will be handled by an external library (GIL released - says so, more cores used, makes me happy).

That leads me to the next question - when I've got long code to run and it happens to use an external, thread-safe C library that releases the GIL, I should probably always take care to defer it to a thread if I want to take advantage of multiple cores, correct? Otherwise, I wouldn't have any of the parallelism gains that the GIL release makes possible. And let's say that my processing code can take a while sometimes.

Which leads me to the other question - what should I do in the case where I need to occasionally run big chunks of code? No blocking calls, just crunching something down. Is deferToThread the only solution for that? Is the idea to compose that as big Deferred chains so other processing might run normally, instead of waiting for the big function to exit? Because deferToThread will only get me anything if there's a blocking call inside, or if I manage to get parallelism out of it because it's something handled by GIL-released code.

Sorry if I sound too confusing, I'm trying to wrap it all in my head before I dive into handling the service.
Atilla wrote:
Basically, the way I understand it - adbapi is just a deferToThread wrapper around the normal python API, correct? As in - if I used something different to access my database, for example - sqlalchemy, I would only need to appropriately wrap that functionality with deferToThread, just as adbapi does?
If a Twisted integration for SQLAlchemy is specifically what you want, that is already available: http://foss.eepatents.com/sAsync/ (or do a PyPI search on 'sasync'). Steve
On Fri, 4 Apr 2008 17:01:40 +0200, Atilla <theatilla@gmail.com> wrote:
I hope it won't be too confusing. I'm basically trying to make sure I've grasped the concept - do correct me on anything I am wrong about or my way of thinking confuses you, please.
I'm not used to running blocking calls and deferToThread a lot, so I wanted to make sure I understand that correctly.
Basically, the way I understand it - adbapi is just a deferToThread wrapper around the normal python API, correct? As in - if I used something different to access my database, for example - sqlalchemy, I would only need to appropriately wrap that functionality with deferToThread, just as adbapi does?
Yes.
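As a minimal sketch of that wrapping idea (not adbapi's actual implementation), a blocking DB-API function can be handed to deferToThread directly. The `count_rows` function, the `items` table and the `example.db` path are hypothetical names used only for illustration.

```python
import sqlite3


def count_rows(db_path):
    """Blocking DB-API work; meant to run in a worker thread."""
    # Open the connection inside the function so it is created and used
    # in the same thread.
    conn = sqlite3.connect(db_path)
    try:
        conn.execute("CREATE TABLE IF NOT EXISTS items (value INTEGER)")
        return conn.execute("SELECT COUNT(*) FROM items").fetchone()[0]
    finally:
        conn.close()


def main(reactor, db_path="example.db"):
    # Imported here so count_rows stays usable without Twisted installed.
    from twisted.internet.threads import deferToThread

    # The Deferred fires in the reactor thread with count_rows' return value.
    d = deferToThread(count_rows, db_path)
    d.addCallback(lambda n: print("rows:", n))
    return d
```

One way to run it is `from twisted.internet.task import react; react(main)`, which starts the reactor and stops it once the returned Deferred has fired.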
That's one thing. Now - what I'm trying to do in essence is to load some big chunk of data out of the DB, process it, and save it back. I'd use a nice chain of Deferred calls - one runQuery, one for the processing, and one runOperation. However, I need transaction functionality, so unless I'm mistaken, my only choice is runInteraction.
Right.
Since that's automatically run in a separate thread, I see it as a monolithic piece of code -> query, processing call, saving query -> no Deferreds can take place there. Am I wrong on that one? Even if so - it won't be that bad for me, since a good part of the processing will be handled by an external library (GIL released - says so, more cores used, makes me happy).
Correct.
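A sketch of what such a monolithic interaction might look like, assuming a hypothetical `items(id, value)` table that already exists and a trivial doubling step standing in for the real processing. runInteraction calls the function in a pool thread with a cursor-like transaction object, commits if it returns, and rolls back if it raises.

```python
import sqlite3


def double_all(txn):
    """Query, process and save inside a single transaction.

    `txn` only needs execute/fetchall/executemany, so a plain DB-API
    cursor works here as well as the object adbapi passes in.
    """
    txn.execute("SELECT id, value FROM items")
    rows = txn.fetchall()
    # Stand-in for the real (possibly GIL-releasing) processing step.
    processed = [(value * 2, row_id) for row_id, value in rows]
    txn.executemany("UPDATE items SET value = ? WHERE id = ?", processed)
    return len(processed)


def main(reactor, db_path="example.db"):
    # Imported here so double_all can be exercised without Twisted.
    from twisted.enterprise import adbapi

    dbpool = adbapi.ConnectionPool("sqlite3", db_path, check_same_thread=False)
    # Everything in double_all runs in one thread, in one transaction.
    d = dbpool.runInteraction(double_all)
    d.addCallback(lambda n: print("updated", n, "rows"))
    return d
```

Because the function only touches the cursor-like object it is given, it can be tested against an ordinary sqlite3 cursor with no reactor involved.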
That leads me to the next question - when I've got long code to run and it happens to use an external, thread-safe, C library, releasing the GIL, I should probably always take care to defer it to thread, if I wanted to take advantage of multiple cores, correct? Otherwise, I wouldn't have any parallelism gains, which I can get, because of the GIL release. And let's say that my processing code can take a while sometimes.
Yep.
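To illustrate the point, here is a small sketch that uses hashlib as a stand-in for the external library, since it releases the GIL while hashing large buffers. The `digest` helper and the buffer sizes are assumptions made for the example.

```python
import hashlib


def digest(data):
    """CPU-heavy call into C code that releases the GIL for large inputs."""
    return hashlib.sha256(data).hexdigest()


def main(reactor, chunks=(b"a" * 10_000_000, b"b" * 10_000_000)):
    # Imported here so digest stays usable without Twisted installed.
    from twisted.internet.defer import gatherResults
    from twisted.internet.threads import deferToThread

    # Each call runs in the reactor's thread pool, so the hashes can overlap
    # on separate cores while the reactor thread stays free for other events.
    d = gatherResults([deferToThread(digest, chunk) for chunk in chunks])
    d.addCallback(lambda digests: print(digests))
    return d
```

Calling `digest` directly in the reactor thread would give the same results, but one after another and with the event loop stalled for the duration.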
Which leads me to the other question - what should I do in the case where I need to occasionally run big chunks of code? No blocking calls, just crunching something down. Is deferToThread the only solution for that? Is the idea to compose that as big Deferred chains so other processing might run normally, instead of waiting for the big function to exit? Because deferToThread will only get me anything if there's a blocking call inside, or if I manage to get parallelism out of it because it's something handled by GIL-released code.
deferToThread is one solution (you can use processes instead of threads, but that's roughly the same idea). Deferreds aren't sensible for CPU-bound tasks. They just make the implementation slower and more complex, and they probably _don't_ allow other tasks to run: a Deferred is just a way to track results, it doesn't imply any special scheduling. This means the different chunks of your computation will still run all at once and block other tasks from running unless you explicitly insert scheduling logic. If you want that kind of scheduling, twisted.internet.task.coiterate may be worth a look.

However, having a thread-safe CPU-bound task (preferably one which is all self-contained and doesn't need to talk to other APIs, certainly not Twisted APIs) and running it in a thread with deferToThread is sensible.
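For the explicit-scheduling route, a rough sketch with coiterate could look like the following. The work is written as a generator that yields between slices, which gives the reactor a chance to run other events; `sum_of_squares`, the `result` dict and the chunk size are made-up names and values for the example.

```python
def sum_of_squares(numbers, result, chunk_size=1000):
    """Do one slice of work per step, yielding between slices."""
    total = 0
    for start in range(0, len(numbers), chunk_size):
        total += sum(n * n for n in numbers[start:start + chunk_size])
        result["total"] = total  # publish progress for the caller
        yield  # hand control back to the scheduler between chunks


def main(reactor):
    # Imported here so the generator can be exercised without Twisted.
    from twisted.internet.task import coiterate

    result = {}
    # The Deferred fires once the generator is exhausted.
    d = coiterate(sum_of_squares(list(range(100_000)), result))
    d.addCallback(lambda _: print(result["total"]))
    return d
```

This keeps everything in the reactor thread, so it adds no parallelism; it only stops one long computation from starving everything else.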
Sorry if I sound too confusing, I'm trying to wrap it all in my head before I dive in handling the service.
Not very confusing at all. :) Jean-Paul
Thanks a lot for your comments. I'm starting work on my service today and it looks like I've got everything cleared up. Let's hope it'll work just as well as I think it should. Yes, I remember coiterate from some questions I asked a while back. However, in this case I think I'll rely on deferToThread and the external processing and see how that goes. I will probably not run into performance issues for a while. I'll take a look at the sqlalchemy thing as well, but maybe I'll just go for simple adbapi, since I only need to run 2-3 simple queries. Cheers.
participants (3)
- Atilla
- Jean-Paul Calderone
- Stephen Waterbury