[Twisted-Python] Re: So how does everyone use an ORM in twisted?

I'm using SQLAlchemy with Twisted in a large project without problems. Why would you think you cannot? You just need to run SA operations in non-reactor threads (i currently use my own threadpool and deferToThread but it's only slightly modified standard twisted one), and you need to make sure each connection/session is not used by more than thread at a time, that's all. Maciej Szumocki

"Maciej Szumocki" <mszumocki@contman.pl> writes:
I manage the integration very similarly but I don't use any of the ORM features of SQLAlchemy, so to the OP's question, yes, I prefer straight SQL. It's much trickier if you want to use the ORM, unless you are very careful to fully eager load every thing in any possible database operation if you have need of the information subsequently in your twisted code. Otherwise you may block unexpectedly simply when accessing your objects, and end up with database operations from the main twisted thread. I've never quite managed to get onto the ORM bandwagon, except for some of my simplest applications. But that's why I like SQLAlchemy as much as I do, because it provides a very nice SQL layer without requiring that you use the ORM, and if you stick with SQL, it is in fact fairly straight forward to offload its operations to a separate thread without blocking Twisted. -- David

Don Dwiggins <ddwiggins@advpubtech.com> writes:
If by deferreds you mean pushing the ORM operation (including all possible ancillary object loading) into a separate thread via deferToThread, yes, that's a possibility. But that likely will only work best if you can work with a structure where the results of database operations can be encapsulated into a fairly simple interface, and not as much if the mainline code wants access to the entire object tree. Otherwise, you end up doing a lot of work to protect against the very features (especially lazy loading) that often makes an ORM attractive to people in order to ensure that the object tree is safe to use from within the non-ORM thread. If, however, you mean some other use of deferreds, then no - I agree with Jean-Paul's earlier comment that most ORMs are simply not designed to play nicely with the non-blocking Twisted main loop. Database operations typically trigger from simple attribute access, or from within, say an iteration loop. They then block, so unless they happen to execute very quickly you're blocking the Twisted main loop. And it's tough to insert a normal deferred callback into that equation. You can try to offload the underlying DB operation to a thread pool, but then you start having to fight to explicitly control all DB access, some of which can occur during a simple attribute reference - at which point there's no reasonable way to use a Deferred to postpone the attribute lookup. It's been a while since I looked at it, so I'm not sure of its status, but sAsync was actually a interesting effort to at least get as good a wrapper as possible, but it doesn't preclude having to be cautious about ensuring any objects returned from the wrapper have been fully loaded, or that any interaction that might require going back to the database only be through wrapped methods and not direct attribute access. This can turn out to be a bunch of extra work when you start talking about walking lists of related (one to many) records, traversing the links of foreign keys among records, etc.. much of which is the attractive part of ORMs for those who like them. I suppose it is only fair to also say that if you don't mind some arbitrary delays in the mainline code, there's nothing stopping you from just using an ORM like SQLAlchemy right inline and ignoring the whole issue. For smaller databases or ones whose server are on the local host (or internal to the process like sqlite), it might not even be a real problem. It's certainly flawed structurally and likely won't scale, but that doesn't mean it might not be suitable in certain cases if you know the tradeoffs. -- David

David Bolen wrote:
This is a good point -- although the attractive part of using an ORM in the first place is somewhat OT here, the main attraction for me is usually *DRY* rather than "automatic persistence", so for my purposes such "magical" features as direct attribute access, lazy loading, etc. can actually be undesirable, since they can make application semantics tricky to implement in some cases. (This is somewhat analogous to the original rationale for PB's implementation of "translucently" distributed objects -- "transparently" [magically] distributed objects are not always desirable, and the same goes for "transparently" persistent objects.) Steve

Stephen Waterbury <waterbug@pangalactic.us> writes:
It's a spectrum, but as you start to move further down the "explicit" route at some point I tend to find it closing in on just executing the SQL. For example, if you bypass just iterating an attribute to find matching foreign key records in favor of a more explicit approach such as a separate method call, it becomes closer to just iterating over the SQL to query such records. And having a database layer with explicit SQL for retrieval/update can implement DRY quite well. Of course you can also be more explicit with SQLAlchemy by defining the default loader to be eager, or using eagerload() directly, you just might pay a penalty in overhead in some schemas. I guess what I always run into with ORMs is that I'm rarely lucky enough to have my Python module be the only user of the database, and while SQLAlchemy plays nicer with other tools at the schema level than many ORMs that want to completely manage their persistence tables, I've just had better luck sticking with SQL against a traditional relational schema than one trying to mimic object persistence. But that's more of a general ORM usage debate than a twisted-specific one. What is twisted-specific, is that the additional work to ensure ORM features work without interfering with twisted does seem to make their benefit harder to achieve within the twisted environment. Since I don't think it's been mentioned yet, one other item worth pointing out is that SQLAlchemy does provide quite a bit of flexibility and control over the mapping process, so there's some wiggle room between raw SQL and fully ORM driven eager loading. You can have custom mappers that work against generic selectables (even the results of raw SQL) which can take some of the drudgery out of processing database results into more easily manageable object instances, while still being rather friendly to being offloaded to a dedicated DB thread. -- David

I guess what I always run into with ORMs is that I'm rarely lucky enough to have my Python module be the only user of the database, and
Very strongly agreed. One of the things I hate about most ORMs is their desire to define the SQL schema for you - they mostly do a half-assed job because the "model" seldom contains enough information to define an efficient schema. SQLAlchemy is better than most in that regard, since you can reflect an existing schema, but frankly I just reflect into table objects and use the query builder - the only things the ORM adds are a false sense of security. "Proper" object databases e.g. ZODB, Durus et. al. are a completely different matter, and often very nice (though still not very easy to integrate with Twisted). I imagine Axiom is closer to these than to SQL.

For what it's worth, I've never used SQLAlchemy's ORM capabilities and don't support it in sAsync. I agree that it really isn't compatible with the asynchronous nature of Twisted, and I don't see it as adding anything. To me, the power of SQLAlchemy is in the clean and convenient way it permits you to access the underlying relational database, and sAsync merely lets you do that access in an asynchronous manner, in a single thread or with multiple threads. Best regards, Ed David Bolen wrote:

"Maciej Szumocki" <mszumocki@contman.pl> writes:
I manage the integration very similarly but I don't use any of the ORM features of SQLAlchemy, so to the OP's question, yes, I prefer straight SQL. It's much trickier if you want to use the ORM, unless you are very careful to fully eager load every thing in any possible database operation if you have need of the information subsequently in your twisted code. Otherwise you may block unexpectedly simply when accessing your objects, and end up with database operations from the main twisted thread. I've never quite managed to get onto the ORM bandwagon, except for some of my simplest applications. But that's why I like SQLAlchemy as much as I do, because it provides a very nice SQL layer without requiring that you use the ORM, and if you stick with SQL, it is in fact fairly straight forward to offload its operations to a separate thread without blocking Twisted. -- David

Don Dwiggins <ddwiggins@advpubtech.com> writes:
If by deferreds you mean pushing the ORM operation (including all possible ancillary object loading) into a separate thread via deferToThread, yes, that's a possibility. But that likely will only work best if you can work with a structure where the results of database operations can be encapsulated into a fairly simple interface, and not as much if the mainline code wants access to the entire object tree. Otherwise, you end up doing a lot of work to protect against the very features (especially lazy loading) that often makes an ORM attractive to people in order to ensure that the object tree is safe to use from within the non-ORM thread. If, however, you mean some other use of deferreds, then no - I agree with Jean-Paul's earlier comment that most ORMs are simply not designed to play nicely with the non-blocking Twisted main loop. Database operations typically trigger from simple attribute access, or from within, say an iteration loop. They then block, so unless they happen to execute very quickly you're blocking the Twisted main loop. And it's tough to insert a normal deferred callback into that equation. You can try to offload the underlying DB operation to a thread pool, but then you start having to fight to explicitly control all DB access, some of which can occur during a simple attribute reference - at which point there's no reasonable way to use a Deferred to postpone the attribute lookup. It's been a while since I looked at it, so I'm not sure of its status, but sAsync was actually a interesting effort to at least get as good a wrapper as possible, but it doesn't preclude having to be cautious about ensuring any objects returned from the wrapper have been fully loaded, or that any interaction that might require going back to the database only be through wrapped methods and not direct attribute access. This can turn out to be a bunch of extra work when you start talking about walking lists of related (one to many) records, traversing the links of foreign keys among records, etc.. much of which is the attractive part of ORMs for those who like them. I suppose it is only fair to also say that if you don't mind some arbitrary delays in the mainline code, there's nothing stopping you from just using an ORM like SQLAlchemy right inline and ignoring the whole issue. For smaller databases or ones whose server are on the local host (or internal to the process like sqlite), it might not even be a real problem. It's certainly flawed structurally and likely won't scale, but that doesn't mean it might not be suitable in certain cases if you know the tradeoffs. -- David

David Bolen wrote:
This is a good point -- although the attractive part of using an ORM in the first place is somewhat OT here, the main attraction for me is usually *DRY* rather than "automatic persistence", so for my purposes such "magical" features as direct attribute access, lazy loading, etc. can actually be undesirable, since they can make application semantics tricky to implement in some cases. (This is somewhat analogous to the original rationale for PB's implementation of "translucently" distributed objects -- "transparently" [magically] distributed objects are not always desirable, and the same goes for "transparently" persistent objects.) Steve

Stephen Waterbury <waterbug@pangalactic.us> writes:
It's a spectrum, but as you start to move further down the "explicit" route at some point I tend to find it closing in on just executing the SQL. For example, if you bypass just iterating an attribute to find matching foreign key records in favor of a more explicit approach such as a separate method call, it becomes closer to just iterating over the SQL to query such records. And having a database layer with explicit SQL for retrieval/update can implement DRY quite well. Of course you can also be more explicit with SQLAlchemy by defining the default loader to be eager, or using eagerload() directly, you just might pay a penalty in overhead in some schemas. I guess what I always run into with ORMs is that I'm rarely lucky enough to have my Python module be the only user of the database, and while SQLAlchemy plays nicer with other tools at the schema level than many ORMs that want to completely manage their persistence tables, I've just had better luck sticking with SQL against a traditional relational schema than one trying to mimic object persistence. But that's more of a general ORM usage debate than a twisted-specific one. What is twisted-specific, is that the additional work to ensure ORM features work without interfering with twisted does seem to make their benefit harder to achieve within the twisted environment. Since I don't think it's been mentioned yet, one other item worth pointing out is that SQLAlchemy does provide quite a bit of flexibility and control over the mapping process, so there's some wiggle room between raw SQL and fully ORM driven eager loading. You can have custom mappers that work against generic selectables (even the results of raw SQL) which can take some of the drudgery out of processing database results into more easily manageable object instances, while still being rather friendly to being offloaded to a dedicated DB thread. -- David

I guess what I always run into with ORMs is that I'm rarely lucky enough to have my Python module be the only user of the database, and
Very strongly agreed. One of the things I hate about most ORMs is their desire to define the SQL schema for you - they mostly do a half-assed job because the "model" seldom contains enough information to define an efficient schema. SQLAlchemy is better than most in that regard, since you can reflect an existing schema, but frankly I just reflect into table objects and use the query builder - the only things the ORM adds are a false sense of security. "Proper" object databases e.g. ZODB, Durus et. al. are a completely different matter, and often very nice (though still not very easy to integrate with Twisted). I imagine Axiom is closer to these than to SQL.

For what it's worth, I've never used SQLAlchemy's ORM capabilities and don't support it in sAsync. I agree that it really isn't compatible with the asynchronous nature of Twisted, and I don't see it as adding anything. To me, the power of SQLAlchemy is in the clean and convenient way it permits you to access the underlying relational database, and sAsync merely lets you do that access in an asynchronous manner, in a single thread or with multiple threads. Best regards, Ed David Bolen wrote:
participants (6)
-
David Bolen
-
Don Dwiggins
-
Ed Suominen
-
Maciej Szumocki
-
Phil Mayers
-
Stephen Waterbury