Some help with Nevow and databse I/O
![](https://secure.gravatar.com/avatar/dc9d35a08c81eab49079ec616bf4cb98.jpg?s=120&d=mm&r=g)
Hello everyone, Twisted is relatively new to me but I've become fascinated by the idea of asynchronous network programming and I generally enjoy the feel of the twisted framework. I'm developing a web application with a limited set of pages that will eventually serve as a bandwidth statistics application (uses SNMP to poll switch counters). I'm using newvow to render some simple XML pages which basically take the form of: def data_listCusotmers(self, context, data): args = inevow.IRequest(context).args db = shelve.open('authdb') return db docFactory = loaders.xmlfile('templates/ListCustomers.html') I have two questions that are somewhat interrelated. In my test cases I have authentication (using nevow session guard) setup using a SHA stored via the shelve module. The true customer info is stored in a MySQL database. I can pull that data doing the following: def data_query(self, context, data): return self.dbpool.runQuery('SELECT name, email, id, TransferType, basic_charge, over_charge, permitted_transfer FROM Customers') I've read Abe's book on twisted and on page 54 he states "Nevow is designed from the ground up for Twisted, which means you can use Deferreds everywhere" My question is do I have to use the adbapi module or can I use deferreds to handle database queries? I've read the docs on deferreds but I'd be lying if I said I can fully wrap my brain around all aspects of the concept or how it's implemented. I don't understand why I would need separate threads in the first place. I know that the query would block but isn't that transaction being done via sockets anyway? Why can't I use the same mechanism (deferred) I would use, to say, defer work waiting on a response from a mail server? Can't I submit a query, have the reactor go do something else, and have a callback run when the data becomes available? Why do I need to spawn threads to avoid blocking? It should be pretty obvious that I don't fully grasp the concept of asynchronous socket programming. Can somebody provide insight that may help me to better understand things. I can live with using the adbapi but wonder if it's possible to simply defer that work and use a callback mechanism (I'm using the original twisted.web not twisted.web2). I've also read Glyf's post where he states "The documentation is really poor, and never says this, but using Twisted, or rather ADBAPI, to convert every single SQL statement into a separate transaction and handle its results separately, has a whole slew of problems". http://glyf.livejournal.com/2005/09/24/ I was pretty much heading down this road but I don't really understand how else to do it. I've browsed many examples and have never really come across one that involves session.guard, newvow, and db access all in the same application. Anyone have any experience they could share? Thanks, -- Eric Gaumer Debian GNU/Linux PPC egaumer@pagecache.org http://egaumer.pagecache.org PGP/GPG Key 0xF15D41E9
![](https://secure.gravatar.com/avatar/10aa56aec887e973085025c6ddfdc46b.jpg?s=120&d=mm&r=g)
On Fri, 24 Feb 2006 10:11:21 -0800, Eric Gaumer <egaumer@pagecache.org> wrote:
Hello everyone,
Hi
I have two questions that are somewhat interrelated. In my test cases I have authentication (using nevow session guard) setup using a SHA stored via the shelve module.
This is technically wrong. Let me explain why: Yes, you are using nevow session guard to have authentication but no, it is not guard that is setup to use a SHA stored via the shelve module. The thing that is setup in that way is twisted.cred. In fact guard is only glue that is used to get credentials from a request and pass them to twisted.cred. Why do I tell you this? Because it means that you don't have to change your application in order to change the source of authentication data. And yet people say that cred/guard is crappy... Oh man. :)
The true customer info is stored in a MySQL database. I can pull that data doing the following:
def data_query(self, context, data): return self.dbpool.runQuery('SELECT name, email, id, TransferType, basic_charge, over_charge, permitted_transfer FROM Customers')
I've read Abe's book on twisted and on page 54 he states "Nevow is designed from the ground up for Twisted, which means you can use Deferreds everywhere"
My question is do I have to use the adbapi module or can I use deferreds to handle database queries?
adbapi module returns deferreds for each of the 3 methods: DBPool.runQuery DBPool.runOperation DBPool.runInteraction
I've read the docs on deferreds but I'd be lying if I said I can fully wrap my brain around all aspects of the concept or how it's implemented.
class Deferred(list): def addCallback(self, fun, *args, **kwargs): self.append((fun, args, kwargs)) def callback(self, initial_value): for fun, args, kwargs in self: initial_value = fun(initial_value, *args, **kwargs) return initial_value Easy isn't it?
I don't understand why I would need separate threads in the first place. I know that the query would block but isn't that transaction being done via sockets anyway? Why can't I use the same mechanism (deferred) I would use, to say, defer work waiting on a response from a mail server? Can't I submit a query, have the reactor go do something else, and have a callback run when the data becomes available? Why do I need to spawn threads to avoid blocking?
deferreds don't make blocking code non-blocking, they are just a different mechanism to register callbacks. Instead of doing: button.handle_event('clicked', list_of_callbacks) you do button.clicked().addCallback(callback1).addCallback(callback2) You need to spawn a thread because there is hardly any database adapter that is written to be async, and those that are written to be async are either buggy or not complete or not really usable, plus having deferreds after every query in python (which doesn't support deferreds natively) makes coding a lot harder (you can deal with some deferreds here and there, but not with 10 deferreds each function). If you want to do N operations at the same time just use runInteraction instead of runQuery (but this doesn't seem to be the case).
I can live with using the adbapi but wonder if it's possible to simply defer that work and use a callback mechanism (I'm using the original twisted.web not twisted.web2).
As I said above, authentication has almost nothing to do with guard and a lot to do with twisted.cred. Cred chapter in Abe's book is one of the best ones in the book thus I suggest you to read it.
I was pretty much heading down this road but I don't really understand how else to do it. I've browsed many examples and have never really come across one that involves session.guard, newvow, and db access all in the same application. Anyone have any experience they could share?
I can give you an example of how to deal with authentication using a real database: http://svn.berlios.de/viewcvs/weever/trunk/src/users/auth.py?view=markup This code uses the 'old' twisted interfaces and is a bit rusty, nevertheless it shows the basic concepts and how to authenticate against a database. I'm working on a pet project these days which is still very early in development ( http://vise.teknico.net/ ) but I'll have some code cleaned up and ready to be showed in not so long and of course it does authentication against a database using SQLAlchemy as an abstraction layer (it is integrated with twisted in an interesting way). -- Valentino Volonghi aka Dialtone Now Running MacOSX 10.4 Blog: http://vvolonghi.blogspot.com New Pet: http://www.stiq.it
![](https://secure.gravatar.com/avatar/dc9d35a08c81eab49079ec616bf4cb98.jpg?s=120&d=mm&r=g)
On Friday 24 February 2006 12:29 pm, Valentino Volonghi aka Dialtone wrote:
I have two questions that are somewhat interrelated. In my test cases I have authentication (using nevow session guard) setup using a SHA stored via the shelve module.
This is technically wrong.
Sorry. I should have clarified a bit. I am authenticating with twisted.cred using a class/method like this: class AuthMech(object): implements(checkers.ICredentialsChecker) credentialInterfaces = (credentials.IUsernamePassword,) def _checkPasswd(self, cipher, password): cipher = base64.decodestring(str(cipher)) salt, hash = cipher[:4], cipher[4:] hash2 = sha.new(salt + password).digest() return hash2 == hash p.registerChecker(pages.AuthMech()) r = guard.SessionWrapper(p)
Let me explain why: Yes, you are using nevow session guard to have authentication but no, it is not guard that is setup to use a SHA stored via the shelve module. The thing that is setup in that way is twisted.cred.
In fact guard is only glue that is used to get credentials from a request and pass them to twisted.cred.
Why do I tell you this? Because it means that you don't have to change your application in order to change the source of authentication data.
And yet people say that cred/guard is crappy... Oh man. :)
Not crappy at all, in fact it's been very easy to get running and seems to work perfectly. I haven't really encountered any problems and my questions are more about doing things the "right" way rather than just getting it working.
The true customer info is stored in a MySQL database. I can pull that data doing the following:
def data_query(self, context, data): return self.dbpool.runQuery('SELECT name, email, id, TransferType, basic_charge, over_charge, permitted_transfer FROM Customers')
I've read Abe's book on twisted and on page 54 he states "Nevow is designed from the ground up for Twisted, which means you can use Deferreds everywhere"
My question is do I have to use the adbapi module or can I use deferreds to handle database queries?
adbapi module returns deferreds for each of the 3 methods:
DBPool.runQuery DBPool.runOperation DBPool.runInteraction
So I have to use the adbapi (threads) regardless? Okay I can handle that but my question now is what's the most efficient way to go about it? Referring to Glyph's blog post, he claims most twisted developers are misusing the adbapi interface. My concern is this, my code thus far looks pretty good. I've reviewed many examples of bits and pieces and generally have a decent feel for what I'm doing. Now that my application "framework" is basically setup (XHTML, templates, authentication, server, etc...) I'm to the point where I need to interface with an existing SQL database to retrieve data to fill in some templates. This sort of method works fine: def data_query(self, context, data): return self.dbpool.runQuery('SELECT name, email, id, TransferType, basic_charge, over_charge, permitted_transfer FROM Customers') But I'm wondering if this the best way to use the adbapi interface. I guess I have questions concerning when should I connect. As it stands I connect in the class constructor like this: class ListCustomers(rend.Page): def __init__(self, user): self.dbpool = adbapi.ConnectionPool(... My gut says this is wrong because now each page class contains a connection object. That really seems to be more of a general Python design issue and I could probably come up with a better solution. But what about queries themselves? Is it bad design to run a query from each data_* method for all data methods in all page objects? I have three methods to work with: DBPool.runQuery DBPool.runOperation DBPool.runInteraction You mentioned runInteraction above. Would it be more wise to provide a general data_* method that did several queries and stored (cached) that data in the Page object itself for later retieval? Each page has to do some database I/O but it would be helpful to not have to do it everytime a page is refreshed or as the user goes back and forth through pages since the data is unlikely to change that quickly. I'm looking for some insight as to how handle this type of scenario from a Twisted/Nevow perspective. It's obvious that database I/O is going to slow things down a bit but there must be a "best case" situation that avoids unnecessary queries.
I've read the docs on deferreds but I'd be lying if I said I can fully wrap my brain around all aspects of the concept or how it's implemented.
class Deferred(list): def addCallback(self, fun, *args, **kwargs): self.append((fun, args, kwargs))
def callback(self, initial_value): for fun, args, kwargs in self: initial_value = fun(initial_value, *args, **kwargs) return initial_value
Easy isn't it?
Yes, it's the "deferreds don't make blocking code non-blocking" part that can get a bit confusing. I've never really had to think about blocking vs. non-blocking calls and truthfully I like it. It forces you to be a little more conscious about what you're doing. I'm really stoked about not using threads because they are such a nightmare to debug.
deferreds don't make blocking code non-blocking, they are just a different mechanism to register callbacks. Instead of doing:
button.handle_event('clicked', list_of_callbacks)
you do
button.clicked().addCallback(callback1).addCallback(callback2)
This chaining is very cool. So what actually makes the code non-blocking? The use of select() sys calls? I guess that's where my hangup is. How can we take a connection to a mail server and defer that result but yet we can't do the same for a database connection? At the OS level, aren't they both sockets that can be watched via select()? My head hurts :-)
You need to spawn a thread because there is hardly any database adapter that is written to be async, and those that are written to be async are either buggy or not complete or not really usable, plus having deferreds after every query in python (which doesn't support deferreds natively) makes coding a lot harder (you can deal with some deferreds here and there, but not with 10 deferreds each function).
When you say adapter your are referring to (in this case) MySQLdb or it's underlying _mysql module or yet even lower down to the C api? If I wanted to implement a limited set of queries, could I write some non-blocking way of doing that and wrap those functions in Deferred objects. A silly question I guess because it's obviously possible. I guess I should ask if it's feasible. Maybe just attempting this as an exercise would give me greater understanding of how Twisted provides non-blocking calls.
If you want to do N operations at the same time just use runInteraction instead of runQuery (but this doesn't seem to be the case).
No it isn't the case but it is wise for me to do runQuery() calls in each data_ method of each page or should I be looking for some way to unify queries and cache that data? Thanks, -- Eric Gaumer Debian GNU/Linux PPC egaumer@pagecache.org http://egaumer.pagecache.org PGP/GPG Key 0xF15D41E9
![](https://secure.gravatar.com/avatar/10aa56aec887e973085025c6ddfdc46b.jpg?s=120&d=mm&r=g)
On Fri, 24 Feb 2006 13:49:01 -0800, Eric Gaumer <egaumer@pagecache.org> wrote:
So I have to use the adbapi (threads) regardless? Okay I can handle that but my question now is what's the most efficient way to go about it?
Have one connection pool for the whole application and run all queries through it. What I suggest is always using runInteraction because it allows you to write different objects whose methods can be passed to runInteraction to get the expected result (of course methods can be substituted with functions). How to share the connection pool is something that I'll deal with later.
Referring to Glyph's blog post, he claims most twisted developers are misusing the adbapi interface.
I don't know the real meaning, but I suggest you should always stick to runInteraction.
My concern is this, my code thus far looks pretty good. I've reviewed many examples of bits and pieces and generally have a decent feel for what I'm doing. Now that my application "framework" is basically setup (XHTML, templates, authentication, server, etc...) I'm to the point where I need to interface with an existing SQL database to retrieve data to fill in some templates.
This sort of method works fine:
def data_query(self, context, data): return self.dbpool.runQuery('SELECT name, email, id, TransferType, basic_charge, over_charge, permitted_transfer FROM Customers')
But I'm wondering if this the best way to use the adbapi interface.
Yes and no. It is not the best way because of course you are exposing to the application a detail of the way to access data. The best way would be to move that query in a single place with all the other ones so that you will be able to change the source of information easily another time (and when needed).
I guess I have questions concerning when should I connect. As it stands I connect in the class constructor like this:
class ListCustomers(rend.Page): def __init__(self, user): self.dbpool = adbapi.ConnectionPool(...
My gut says this is wrong because now each page class contains a connection object. That really seems to be more of a general Python design issue and I could probably come up with a better solution.
Indeed. The better solution is to put the dbpool in the user object that you pass to ListCustomers :).
But what about queries themselves? Is it bad design to run a query from each data_* method for all data methods in all page objects? I have three methods to work with:
No it is not bad design, it is the only thing you can do actually, data methods where designed with that particular purpose in mind. Of course the other solution is to put everything in beforeRender but I don't like it that much.
DBPool.runQuery DBPool.runOperation DBPool.runInteraction
You mentioned runInteraction above. Would it be more wise to provide a general data_* method that did several queries and stored (cached) that data in the Page object itself for later retieval?
This is another solution, but it's poor on code reuse. In fact you might have just a few data methods that retrieve particular objects and you can inherit those methods in other pages to always have a way to retrieve those objects from the database. Nevow development is mostly creating an environment where templates can get the data they need. Very few python lines are needed when you are using Nevow in the right way because you fill find yourself reusing pre-defined render and data methods many many many times. In fact the biggest application I've written so far has more than twice as much xHTML than Python. And it is made of about 70 kinds of pages for a staggering 10 lines per page on average, not too bad huh? :)
Each page has to do some database I/O but it would be helpful to not have to do it everytime a page is refreshed or as the user goes back and forth through pages since the data is unlikely to change that quickly.
This is a different aspect and might be part of the database layer (query caching). Putting this duty on the application itself is asking for troubles.
I'm looking for some insight as to how handle this type of scenario from a Twisted/Nevow perspective. It's obvious that database I/O is going to slow things down a bit but there must be a "best case" situation that avoids unnecessary queries.
As I said this is really database layer dependent. You don't want your application to do this job because it's simply the wrong layer, it will cost you more lines and more debugging and it will be harder to reuse when you add new pages. A better approach is to use caching at the HTML level, there are different approaches on this and you can find both searching the mailing list or the Nevow bug tracker for the word cache.
button.handle_event('clicked', list_of_callbacks)
you do
button.clicked().addCallback(callback1).addCallback(callback2)
This chaining is very cool. So what actually makes the code non-blocking? The use of select() sys calls? I guess that's where my hangup is. How can we take a connection to a mail server and defer that result but yet we can't do the same for a database connection? At the OS level, aren't they both sockets that can be watched via select()?
My head hurts :-)
Ok, if you ever programmed a custom protocol in Twisted you would get this quite straightly. Basically: Protocols are designed with simple callbacks (no deferreds there) like lineReceived or dataReceived or datagramReceived. This can be achieved thanks to the select() loop and some logic in the base protocol class and some coupling with the transport of course (although pretty light coupling). Of course the next step is to write a state machine on top of these callbacks to deal with complex protocols (like HTTP). The Factory associated with a Protocol might be designed to hold an unfired Deferred instance. Once the Protocol reaches a final state of the state machine it triggers the Deferred held by the Factory. Again of course the Factory exposed some methods like 'getPage' or 'runQuery', those methods returned a reference to the enclosed deferred to the caller in order to allow it to attach the wanted callbacks. You can imagine that when the Protocol fires the deferred all the callbacks attached by the caller will be called and the 'magic' begins. Of course not everything uses Protocols and Factories, for example the DBPool doesn't. Instead it uses a similar way of doing things but conceptually it's the same thing.
When you say adapter your are referring to (in this case) MySQLdb or it's underlying _mysql module or yet even lower down to the C api?
I'm referring to MySQLdb or any other (like psycopg).
If I wanted to implement a limited set of queries, could I write some non-blocking way of doing that and wrap those functions in Deferred objects. A silly question I guess because it's obviously possible. I guess I should ask if it's feasible. Maybe just attempting this as an exercise would give me greater understanding of how Twisted provides non-blocking calls.
There are 3 reactor calls to deal with threads and a 'simplified' way of using them with a thread pool. reactor.callFromThread() reactor.callInThread() reactor.suggestThreadPoolSize() With these 3 APIs you can run a function in a separate thread taken from a thread pool and since the reactor is not thread safe you must use another reactor.API to call other reactor APIs from a separate thread (of course I can't remember which ones). The simplified way is to use threads.deferToThread() which returns a deferred and runs the function that you pass it in a separate thread. This is essentially the same thing that dbpool.runInteraction does.
No it isn't the case but it is wise for me to do runQuery() calls in each data_ method of each page or should I be looking for some way to unify queries and cache that data?
Move runQuery away from it and abstract it out. Then call the abstraction each time in the API and have the abstraction cache stuff when needed. Ok, then I told you that I would have explained how to share the DBPool among the whole application. The Realm has a requestAvatar() method that you have to implement. That method should return an avatar based on an avatarId that is passed to it (the avatarId object is returned by the checker). Given this information you should now understand what an avatar is. The avatar is the interface over the data that a user has access to. How do you use it? Simply by making it the mean through which you access data from the application and passing the avatar throughout the application. For example you can do something like: class ListCustomers(rend.Page): def __init__(self, avatar): self.avatar = avatar rend.Page.__init__(self) def data_customers(self, ctx, data): return self.avatar.get_customers() As you see now there is no implementation leak about the real source of information except that it comes from the avatar, which has many ways to get to the wanted information. There are many ways to deal with the avatar 'construction' in case of permissions over the data to access and roles but this is a whole different topic, neat but pretty hard to explain in this already long email. As I said I have code that deals with this using SQLAlchemy in my new project and as soon as I'll have more time (I'll graduate on 6th March) I'll publish the data layer somewhere, hopefully doing something useful for the community :). -- Valentino Volonghi aka Dialtone Now Running MacOSX 10.4 Blog: http://vvolonghi.blogspot.com New Pet: http://www.stiq.it
![](https://secure.gravatar.com/avatar/dc9d35a08c81eab49079ec616bf4cb98.jpg?s=120&d=mm&r=g)
On Friday 24 February 2006 2:38 pm, Valentino Volonghi aka Dialtone wrote:
There are many ways to deal with the avatar 'construction' in case of permissions over the data to access and roles but this is a whole different topic, neat but pretty hard to explain in this already long email. As I said I have code that deals with this using SQLAlchemy in my new project and as soon as I'll have more time (I'll graduate on 6th March) I'll publish the data layer somewhere, hopefully doing something useful for the community :).
Sweet. Thanks for all the help. You've given me plenty to consider. -- Eric Gaumer Debian GNU/Linux PPC egaumer@pagecache.org http://egaumer.pagecache.org PGP/GPG Key 0xF15D41E9
![](https://secure.gravatar.com/avatar/dc9d35a08c81eab49079ec616bf4cb98.jpg?s=120&d=mm&r=g)
On Friday 24 February 2006 12:29 pm, Valentino Volonghi aka Dialtone wrote:
You need to spawn a thread because there is hardly any database adapter that is written to be async, and those that are written to be async are either buggy or not complete or not really usable, plus having deferreds after every query in python (which doesn't support deferreds natively) makes coding a lot harder (you can deal with some deferreds here and there, but not with 10 deferreds each function).
Valentino, I noticed a post regarding tada with your name on it. Have you tried it and if so can you comment? -- Eric Gaumer Debian GNU/Linux PPC egaumer@pagecache.org http://egaumer.pagecache.org PGP/GPG Key 0xF15D41E9
![](https://secure.gravatar.com/avatar/10aa56aec887e973085025c6ddfdc46b.jpg?s=120&d=mm&r=g)
On Fri, 24 Feb 2006 20:24:35 -0800, Eric Gaumer <egaumer@pagecache.org> wrote:
Valentino,
I noticed a post regarding tada with your name on it. Have you tried it and if so can you comment?
I have it checked out in my hdd but never really used it for anything, thus I don't feel like commenting on it. I just prefer SQLAlchemy with threads.deferToThread(). -- Valentino Volonghi aka Dialtone Now Running MacOSX 10.4 Blog: http://vvolonghi.blogspot.com New Pet: http://www.stiq.it
participants (2)
-
Eric Gaumer
-
Valentino Volonghi aka Dialtone