[Twisted-Python] twisted.enterprise.adbapi and connection pool problem

Hello, I've noticed a bug(?) or at least misbehaviour in adapi connection pool. I've got a daemon program which keeps a few connections to PostgreSQL database and when a client connects and sends some commands it queries the database and sends back results. Everything works fine, until I restart PostgreSQL. All commands fail and the only trace I've got is the response from the database: #v+ FATAL: terminating connection due to administrator command server closed the connection unexpectedly This probably means the server terminated abnormally before or while processing the request. #v- Restart was clean, while nobody was connected to the daemon. IMHO it shouldn't go this way. Is there any solution to this problem? Or maybe I must take care of myself and try to detect such occurence and recreate dbpool? -- Michal Chruszcz -=- Seen at http://1lo.sanok.pl/~troll/gallery.php

On Mon, 14 Feb 2005 21:50:57 +0100, Michal Chruszcz <troll@pld-linux.org> wrote:
Hello,
I've noticed a bug(?) or at least misbehaviour in adapi connection pool. I've got a daemon program which keeps a few connections to PostgreSQL database and when a client connects and sends some commands it queries the database and sends back results. Everything works fine, until I restart PostgreSQL. All commands fail and the only trace I've got is the response from the database:
Attempting reconnects just isn't part of adbapi as far as I can tell. I use psycopg with postgresql and here is how I handle this, maybe someone else has a better way of doing it. This tests the database connection every 10 seconds, and attempts to reconnect 3 times if it gets an error, otherwise it exits. It's not very complete but that's because I just threw it together today... In my server factory: ----------------------------------- class Somefactory(Factory): def __init__(self): self.db = 'dbname' self.dbhost = 'dbhostname' self.reconnect = 1 self.dbpool = adbapi.ConnectionPool() loop1 = task.LoopingCall(self.checkDatabase) loop1.start(10.0) # call every 10 seconds def _cbdatabaseError(self,error): error_msg = error.getBriefTraceback() if re.search('psycopg.OperationalError',error_msg): if self.reconnect <= 3: self.dbpool = adbapi.ConnectionPool() self.reconnect = self.reconnect + 1 else: reactor.stop() else: reactor.stop() def _cbdatabaseAlive(self,msg): self.reconnect = 1 def checkDatabase(self): d = self.dbpool.runInteraction(self._cbcheckDatabase) d.addCallbacks(self._cbdatabaseAlive,self._cbdatabaseError) def _cbcheckDatabase(self,cursor): sql = "SELECT 1" cursor.execute(sql) row = cursor.fetchall() return row

Also, I *think* one of the reasons that adbapi doesn't have logic to try and reconnect is that it might be difficult to detect under what conditions you want to attempt a reconnect. The error messages for a lost connection might be different from module to module. For example a telling the difference between a timed out connection and the database server just being down. With psycopg OperationalError gets you pretty much everything that would mean you need to reconnect as far as I can tell. Chris

Il giorno lun, 14-02-2005 alle 13:16 -0800, snacktime ha scritto:
Also, I *think* one of the reasons that adbapi doesn't have logic to try and reconnect is that it might be difficult to detect under what conditions you want to attempt a reconnect. The error messages for a lost connection might be different from module to module. For example a telling the difference between a timed out connection and the database server just being down. With psycopg OperationalError gets you pretty much everything that would mean you need to reconnect as far as I can tell.
From the definition of OperationalError it should be pretty safe to (try to) reconnect whenever you get it. It owuld be nice to have a cofigurable number of reconnection attempts directly in ConnectionPool.
From PEP-249:
OperationalError Exception raised for errors that are related to the database's operation and not necessarily under the control of the programmer, e.g. an unexpected disconnect occurs, the data source name is not found, a transaction could not be processed, a memory allocation error occurred during processing, etc. It must be a subclass of DatabaseEr federico

I've noticed a bug(?) or at least misbehaviour in adapi connection pool. I've got a daemon program which keeps a few connections to PostgreSQL database and when a client connects and sends some commands it queries the database and sends back results. Everything works fine, until I restart PostgreSQL. All commands fail and the only trace I've got is the response from the database:
Attempting reconnects just isn't part of adbapi as far as I can tell. I use psycopg with postgresql and here is how I handle this, maybe someone else has a better way of doing it. This tests the database connection every 10 seconds, and attempts to reconnect 3 times if it gets an error, otherwise it exits. It's not very complete but that's because I just threw it together today...
Funny, I had the same problem and threw together a similar solution with the same approach, down to the "SELECT 1" query. :-) Main difference is, when I detect a problem I do not exit, but just do self.dbpool.close() self.dbpool.start() If you look at the source of adbapi, you see that you can do that even if the program cannot yet reconnect to PostgreSQL. The connections are not really established until something tries to use them. -- Nicola Larosa - nico@tekNico.net The reward for your efforts will be an equivocal one: you will feel as suffocated programming in C++ as an experienced C++ programmer would feel programming in Basic. -- Paul Graham, ANSI Common Lisp, 1995

On Feb 14, 2005, at 4:05 PM, snacktime wrote:
Attempting reconnects just isn't part of adbapi as far as I can tell.
It is part of adbapi in Twisted 2.0 (see cp_reconnect arg to the DatabasePool constructor). However, it will only reconnect after failing a query. It will never retry a query (that could be unsafe to do), but will re-establish the database connection after that query fails. Thus, subsequent queries will succeed. If you know some way to tell for sure that it there is no possibility that the query has been executed, you could submit a patch for adbapi that it will retry the query in those circumstances. James

Il giorno lun, 14-02-2005 alle 21:03 -0500, James Y Knight ha scritto:
On Feb 14, 2005, at 4:05 PM, snacktime wrote:
Attempting reconnects just isn't part of adbapi as far as I can tell.
It is part of adbapi in Twisted 2.0 (see cp_reconnect arg to the DatabasePool constructor). However, it will only reconnect after failing a query. It will never retry a query (that could be unsafe to do), but will re-establish the database connection after that query fails. Thus, subsequent queries will succeed.
If you know some way to tell for sure that it there is no possibility that the query has been executed, you could submit a patch for adbapi that it will retry the query in those circumstances.
If the database support transactions and an exception is raised the DatabasePool does a .rollback() (if I remember correctly). In this case you're sure the query has not been executed (it was rolled back explicitly.) The problem is with backends that do not support transactions, but if you're crazy enough to use one of them having a query repeated is a lesser evil. ;) IMHO, it is safe to rollback, reconnect and send the query again. Maybe a cp_repeat_query=True|False parameter? -- Federico Di Gregorio http://people.initd.org/fog Debian GNU/Linux Developer fog@debian.org INIT.D Developer fog@initd.org Nobody will ever need more than 640k RAM! -- Bill Gates, 1981 Windows 95 needs at least 8 MB RAM. -- Bill Gates, 1996 Nobody will ever need Windows 95. -- logical conclusion

On Feb 15, 2005, at 4:22 AM, Federico Di Gregorio wrote:
If the database support transactions and an exception is raised the DatabasePool does a .rollback() (if I remember correctly). In this case you're sure the query has not been executed (it was rolled back explicitly.)
If the database server returned an exception and you can actually execute a rollback, the connection didn't die. :) That is not the situation we're talking about. It's when you send a command over the network, and the remote host maybe gets it, or maybe not, and then you unplug your net connection before you get a response. You don't know if the server got the command or not. However, your point stands: If you're in a transaction and failed anything but the commit, then you know the transaction was unexecuted. If you fail the commit command, it's harder. With certain errors you know came from the database server rather than the network, you can probably also know it's unexecuted, but in some cases it's unclear. In contrast to my previous message, I do believe cp_reconnect will notice a failed connection before attempting a query on it, since it checks the result of db.cursor() when starting the transaction, and if that fails, reconnects. This is likely sufficient for most website-like application, where a failed query just results in the user pressing reload anyways. For a more serious application, it should handle any errors sensibly, including those that happen at commit time, so it needs to be able to handle errors itself, in any case. James

Il giorno mar, 15-02-2005 alle 10:35 -0500, James Y Knight ha scritto:
On Feb 15, 2005, at 4:22 AM, Federico Di Gregorio wrote:
If the database support transactions and an exception is raised the DatabasePool does a .rollback() (if I remember correctly). In this case you're sure the query has not been executed (it was rolled back explicitly.)
If the database server returned an exception and you can actually execute a rollback, the connection didn't die. :) That is not the situation we're talking about. It's when you send a command over the network, and the remote host maybe gets it, or maybe not, and then you unplug your net connection before you get a response. You don't know if the server got the command or not. However, your point stands: If you're in a transaction and failed anything but the commit, then you know the transaction was unexecuted. If you fail the commit command, it's harder. With certain errors you know came from the database server rather than the network, you can probably also know it's unexecuted, but in some cases it's unclear.
Agreed. If you fail (probably because of network problems) during the commit the situation can be so bad that shutting down the application until you can check the database is a very good idea. :/
In contrast to my previous message, I do believe cp_reconnect will notice a failed connection before attempting a query on it, since it checks the result of db.cursor() when starting the transaction, and if
It depends on the database adapter. For example, in psycopg, you can call conn.cursor() on an "unplugged" connection without getting any errors. It is the curs.execute() that will fail. Sending data to the backend at every .cursor() just to check the connection status is an overhead that some people don't want.
that fails, reconnects. This is likely sufficient for most website-like application, where a failed query just results in the user pressing reload anyways.
For a more serious application, it should handle any errors sensibly, including those that happen at commit time, so it needs to be able to handle errors itself, in any case.
I agree but I don't see any sensible way to manage an error at .commit() time. federico -- Federico Di Gregorio http://people.initd.org/fog Debian GNU/Linux Developer fog@debian.org INIT.D Developer fog@initd.org I terminali seriali sono in via di estinzione (infatti quello che c'era si è estinto) -- Simone Caldana
participants (5)
-
Federico Di Gregorio
-
James Y Knight
-
Michal Chruszcz
-
Nicola Larosa
-
snacktime