[Twisted-Python] Consensus on how to handle "MySQL server has gone away"

I have a service running that occasionally connects to a MySQL database. When there is no activity on it for some time, I eventually get the the error _mysql_exceptions.OperationalError: (2006, 'MySQL server has gone away') I have found a few posts on this group related to this issue http://twistedmatrix.com/pipermail/twisted-python/2007-July/015788.html http://twistedmatrix.com/pipermail/twisted-python/2007-October/016178.html as well as some tickets on the twisted homepage: http://twistedmatrix.com/trac/ticket/4404 http://twistedmatrix.com/trac/ticket/4964 I have also seen this referenced as a potential fix: http://www.gelens.org/2009/09/13/twisted-connectionpool-revisited/ I currently have the cp_reconnect parameter set to True, but it does not appear to be doing the job. I don't really see much consensus on how to properly handle this issue. Some people seem to think that the cp_reconnect parameter should take care of it for you, other people say that cp_reconnect is only part of the solution and that you have to write your own error handling. I realize that I could increase the value of MySQL's wait_timeout parameter, but I would rather have the twisted code detect and handle the situation. So, how does one do that? Thanks, Dave

On 10/02/2012 06:09 PM, E S wrote:
For what it's worth - I think adbapi is seriously sub-optimal in this regard. We have continual low-level problems with Twisted apps getting stuck due to hung/dead ConnectionPool. And if you forget cp_reconnect, well you are basically committing suicide. Your Twisted app will need a restart. In particular - it's not clear to my why CP isn't using "cp_good_sql" to probe a connection *before* starting the transaction, and to close/re-open it transparently if it has died and cp_reconnect==1. Instead, the only place the "good" SQL is run is *after* a rollback, so the next N transactions into the pool (where N is the number of threads) all fail, because they don't get as far as "rollback". I think the behaviour it should be aiming for is clear: 1. Test each connection with "good_sql" before beginning the user interaction/query 2. If execeptions occur inside the user interaction, either at cursor methods like execute, or connection methods like commit, then: 1. rollback - if *this* raises an exception, throw the conn away 2. propagate the original exception upwards unchanged (maybe wrapped, maybe not) cp_reconnect should be the default.

On Wed, Oct 3, 2012 at 4:28 PM, Phil Mayers <p.mayers@imperial.ac.uk> wrote:
Hi MySQLdb (If you use it) has its own socket wrappers and callings so the only way to co-operate with twisted is to use adbapi. It is possible to use other implementations but there are no one claim production ready. Regards gelin yan

On 10/03/2012 04:28 AM, Phil Mayers wrote:
The problem is that this adds latency; this can add up to quite a slowdown if your database server is on a remote server and you're doing lots of single queries (as opposed to runInteraction).

On Wed, Oct 3, 2012 at 8:27 AM, Itamar Turner-Trauring <itamar@itamarst.org> wrote:
Hi! Sometime ago I faced some related problems and I found this code, "ReconnectingConnectionPool": http://www.gelens.org/2009/09/13/twisted-connectionpool-revisited/ What are your opinion about it? -- Augusto Mecking Caringi

Le Oct 3, 2012 à 6:36 AM, Augusto Mecking Caringi <augustocaringi@gmail.com> a écrit :
If there are bugs in Twisted, or important missing features, step 1 should be to file a bug – first, of course, searching for duplicates – at <http://twistedmatrix.com/trac/newticket>. Especially if the workaround involves calling or overriding some private, internal implementation detail. If you're going to blog about a hack that fixes the problem for you, that post should link to the ticket, so that Twisted can move forward and provide a good experience for people getting started with it. I hate the idea that step 1 for some poor new Twisted user would be to go searching around a couple dozen external websites to apply undocumented hacks to try to just get something basic, like relational database communication, to work acceptably for their application. So, my opinion is that either this isn't a real problem, in which case you shouldn't use it, or it is a real problem, in which case Jeffrey Gelens and powdahound should open a ticket and explain why it's necessary :). -glyph

On Wed, Oct 3, 2012 at 3:39 PM, Glyph <glyph@twistedmatrix.com> wrote:
Well, I did reference two bugs already in the twisted bug tracker that I believe are related to this issue (4964 definitely is anyway). These issues have been in the bug tracking system for years now so I'm not holding my breath on fixes any time soon. I appreciate your desire to make twisted as good as it can be, but we all have schedules to meet and hacks, documented or not, are sometimes necessary to move the project along. So I definitely will consider input from people who have run into this situation and found a solution that has worked for them. If you're not convinced it's a real problem, it's pretty easy to recreate. Assuming your MySQL wait_timeout = 2, you can run the following code and get the (2006, 'MySQL server has gone away') error: from twisted.internet import reactor, defer from twisted.enterprise import adbapi import MySQLdb import MySQLdb.cursors from time import sleep pool = adbapi.ConnectionPool("MySQLdb", host="1.2.3.4", user="someuser", passwd="xxx", db="someschema", cursorclass=MySQLdb.cursors.DictCursor, cp_reconnect=True) def gopherIt(txn): txn.execute("UPDATE Table1 SET Field1 = 123 WHERE Field2 = 456") sleep(3) txn.execute("UPDATE Table1 SET Field1 = 123 WHERE Field2 = 456") def done(x): print "done" def error(f): print f.value d = pool.runInteraction(gopherIt) d.addCallback(done) d.addErrback(error) reactor.callLater(4, reactor.stop) reactor.run() So, unless I have a misunderstanding about what cp_reconnect is supposed to do, this is a documented bug. The question is what, if anything, can be done about it in the short term. Possible solutions are: 1. Increase the wait_timeout value on the server 2. Override methods in the ConnectionPool class (http://www.gelens.org/2009/09/13/twisted-connectionpool-revisited/) 3. Setup a periodic "ping" in each connection to keep it alive 4. Man up and fix adbapi myself I'm leaning towards #1 since it seems to be least invasive, but I am open to other opinions.

Le Oct 4, 2012 à 12:26 PM, E S <electric.or.sharp@gmail.com> a écrit :
Aah, I'm terribly sorry. #4964 is indeed this exact issue; I missed the link.
These issues have been in the bug tracking system for years now so I'm not holding my breath on fixes any time soon.
Well, someone who cares will then need to fix the issue for real :).
I appreciate your desire to make twisted as good as it can be, but we all have schedules to meet and hacks, documented or not, are sometimes necessary to move the project along. So I definitely will consider input from people who have run into this situation and found a solution that has worked for them.
Of course. But you should be aware that this is how issues remain open in the bug tracker for years - they only get fixed when people fix them :).
If you're not convinced it's a real problem, it's pretty easy to recreate. Assuming your MySQL wait_timeout = 2, you can run the following code and get the (2006, 'MySQL server has gone away') error:
No, I'm perfectly convinced it's a real problem. The description on #4964 was very helpful.
4. Man up and fix adbapi myself
While I might disagree with the wording, this is what I'd suggest :).
I'm leaning towards #1 since it seems to be least invasive, but I am open to other opinions.
I think you may be overestimating the difficulty involved in just fixing the bug for real. The conflict between schedule and fixing bugs in the right place - i.e. in Twisted - is that there is some waiting involved, not that fixing the bug is super hard. You have to wait for a code review, then wait for a release. But, assuming your fix is not terribly invasive, you can just do this: Fix the bug. Submit your fix for review. While you're waiting, of course, you need to ship some software, so ship a modified copy of the modified module with your package and hot-patch it at runtime, assuming the fix is not already applied. You can see examples of other projects (that I've worked on) doing that here:<http://bazaar.launchpad.net/~divmod-dev/divmod.org/trunk/view/head:/Epsilon/...> and here:<http://trac.calendarserver.org/browser/CalendarServer/trunk/twext/patches.py>. Respond promptly to any review feedback, when it becomes available, and update your local copy. When a new version of Twisted ships with the bug fixed, and you can update to depend on it, just delete your local copy. Assuming that the final fix is reasonably close to the one that you implemented previously, even older versions of your software should keep working. I hope you do elect to help out with this, we can always use more contributors :). -glyph

On 10/02/2012 06:09 PM, E S wrote:
For what it's worth - I think adbapi is seriously sub-optimal in this regard. We have continual low-level problems with Twisted apps getting stuck due to hung/dead ConnectionPool. And if you forget cp_reconnect, well you are basically committing suicide. Your Twisted app will need a restart. In particular - it's not clear to my why CP isn't using "cp_good_sql" to probe a connection *before* starting the transaction, and to close/re-open it transparently if it has died and cp_reconnect==1. Instead, the only place the "good" SQL is run is *after* a rollback, so the next N transactions into the pool (where N is the number of threads) all fail, because they don't get as far as "rollback". I think the behaviour it should be aiming for is clear: 1. Test each connection with "good_sql" before beginning the user interaction/query 2. If execeptions occur inside the user interaction, either at cursor methods like execute, or connection methods like commit, then: 1. rollback - if *this* raises an exception, throw the conn away 2. propagate the original exception upwards unchanged (maybe wrapped, maybe not) cp_reconnect should be the default.

On Wed, Oct 3, 2012 at 4:28 PM, Phil Mayers <p.mayers@imperial.ac.uk> wrote:
Hi MySQLdb (If you use it) has its own socket wrappers and callings so the only way to co-operate with twisted is to use adbapi. It is possible to use other implementations but there are no one claim production ready. Regards gelin yan

On 10/03/2012 04:28 AM, Phil Mayers wrote:
The problem is that this adds latency; this can add up to quite a slowdown if your database server is on a remote server and you're doing lots of single queries (as opposed to runInteraction).

On Wed, Oct 3, 2012 at 8:27 AM, Itamar Turner-Trauring <itamar@itamarst.org> wrote:
Hi! Sometime ago I faced some related problems and I found this code, "ReconnectingConnectionPool": http://www.gelens.org/2009/09/13/twisted-connectionpool-revisited/ What are your opinion about it? -- Augusto Mecking Caringi

Le Oct 3, 2012 à 6:36 AM, Augusto Mecking Caringi <augustocaringi@gmail.com> a écrit :
If there are bugs in Twisted, or important missing features, step 1 should be to file a bug – first, of course, searching for duplicates – at <http://twistedmatrix.com/trac/newticket>. Especially if the workaround involves calling or overriding some private, internal implementation detail. If you're going to blog about a hack that fixes the problem for you, that post should link to the ticket, so that Twisted can move forward and provide a good experience for people getting started with it. I hate the idea that step 1 for some poor new Twisted user would be to go searching around a couple dozen external websites to apply undocumented hacks to try to just get something basic, like relational database communication, to work acceptably for their application. So, my opinion is that either this isn't a real problem, in which case you shouldn't use it, or it is a real problem, in which case Jeffrey Gelens and powdahound should open a ticket and explain why it's necessary :). -glyph

On Wed, Oct 3, 2012 at 3:39 PM, Glyph <glyph@twistedmatrix.com> wrote:
Well, I did reference two bugs already in the twisted bug tracker that I believe are related to this issue (4964 definitely is anyway). These issues have been in the bug tracking system for years now so I'm not holding my breath on fixes any time soon. I appreciate your desire to make twisted as good as it can be, but we all have schedules to meet and hacks, documented or not, are sometimes necessary to move the project along. So I definitely will consider input from people who have run into this situation and found a solution that has worked for them. If you're not convinced it's a real problem, it's pretty easy to recreate. Assuming your MySQL wait_timeout = 2, you can run the following code and get the (2006, 'MySQL server has gone away') error: from twisted.internet import reactor, defer from twisted.enterprise import adbapi import MySQLdb import MySQLdb.cursors from time import sleep pool = adbapi.ConnectionPool("MySQLdb", host="1.2.3.4", user="someuser", passwd="xxx", db="someschema", cursorclass=MySQLdb.cursors.DictCursor, cp_reconnect=True) def gopherIt(txn): txn.execute("UPDATE Table1 SET Field1 = 123 WHERE Field2 = 456") sleep(3) txn.execute("UPDATE Table1 SET Field1 = 123 WHERE Field2 = 456") def done(x): print "done" def error(f): print f.value d = pool.runInteraction(gopherIt) d.addCallback(done) d.addErrback(error) reactor.callLater(4, reactor.stop) reactor.run() So, unless I have a misunderstanding about what cp_reconnect is supposed to do, this is a documented bug. The question is what, if anything, can be done about it in the short term. Possible solutions are: 1. Increase the wait_timeout value on the server 2. Override methods in the ConnectionPool class (http://www.gelens.org/2009/09/13/twisted-connectionpool-revisited/) 3. Setup a periodic "ping" in each connection to keep it alive 4. Man up and fix adbapi myself I'm leaning towards #1 since it seems to be least invasive, but I am open to other opinions.

Le Oct 4, 2012 à 12:26 PM, E S <electric.or.sharp@gmail.com> a écrit :
Aah, I'm terribly sorry. #4964 is indeed this exact issue; I missed the link.
These issues have been in the bug tracking system for years now so I'm not holding my breath on fixes any time soon.
Well, someone who cares will then need to fix the issue for real :).
I appreciate your desire to make twisted as good as it can be, but we all have schedules to meet and hacks, documented or not, are sometimes necessary to move the project along. So I definitely will consider input from people who have run into this situation and found a solution that has worked for them.
Of course. But you should be aware that this is how issues remain open in the bug tracker for years - they only get fixed when people fix them :).
If you're not convinced it's a real problem, it's pretty easy to recreate. Assuming your MySQL wait_timeout = 2, you can run the following code and get the (2006, 'MySQL server has gone away') error:
No, I'm perfectly convinced it's a real problem. The description on #4964 was very helpful.
4. Man up and fix adbapi myself
While I might disagree with the wording, this is what I'd suggest :).
I'm leaning towards #1 since it seems to be least invasive, but I am open to other opinions.
I think you may be overestimating the difficulty involved in just fixing the bug for real. The conflict between schedule and fixing bugs in the right place - i.e. in Twisted - is that there is some waiting involved, not that fixing the bug is super hard. You have to wait for a code review, then wait for a release. But, assuming your fix is not terribly invasive, you can just do this: Fix the bug. Submit your fix for review. While you're waiting, of course, you need to ship some software, so ship a modified copy of the modified module with your package and hot-patch it at runtime, assuming the fix is not already applied. You can see examples of other projects (that I've worked on) doing that here:<http://bazaar.launchpad.net/~divmod-dev/divmod.org/trunk/view/head:/Epsilon/...> and here:<http://trac.calendarserver.org/browser/CalendarServer/trunk/twext/patches.py>. Respond promptly to any review feedback, when it becomes available, and update your local copy. When a new version of Twisted ships with the bug fixed, and you can update to depend on it, just delete your local copy. Assuming that the final fix is reasonably close to the one that you implemented previously, even older versions of your software should keep working. I hope you do elect to help out with this, we can always use more contributors :). -glyph
participants (7)
-
Augusto Mecking Caringi
-
E S
-
gelin yan
-
Glyph
-
Itamar Turner-Trauring
-
Phil Mayers
-
Tobias Oberstein