x86 XP trunk failure

Hi all,

My build slave (http://www.python.org/dev/buildbot/trunk/x86%20XP%20trunk) keeps failing because of a crash that appears to be in the bsddb module. I assume the master deems the slave to be lost because it's sitting there waiting for me to make a choice in the "debug/abort" dialog box. I can provide details if anybody needs them; I just figured somebody might want to know that this is an actual build/test problem rather than some kind of issue with the internet connection here.

Thanks,
Alan

What branch, and for how long has this dialog been sitting around? For crashes in 3.0, there should not be any such dialogs anymore, but there may have been before I turned them off.
Thanks. You can discard any such dialogs - most likely, they really were from the 3.0 branch, which is known to crash in bsddb. Regards, Martin

On 9/5/07, "Martin v. Löwis" <martin@v.loewis.de> wrote:
It's the trunk; at the moment the debugger is sitting there with python_d.exe at a breakpoint. The current instance of python_d being debugged is only a day or so old. I don't know when this problem started happening, but I think it's been a while (it was happening for all the visible builds on the dashboard when I first noticed it a day or two ago).
Ok.

"Martin v. Löwis" <martin@v.loewis.de> writes:
I think there may actually be an issue here, if only with the tests, even though 3.0 does suppress the dialog. I think I started noticing this in the first build after bringing my buildbot online, so probably on Sep 1. I had manually done a build on Aug 28 (running the buildbot batch file interactively) without the problem, but I haven't been able to find any relevant source tree changes in that interval. Re-fetching from that date has the problem, and I had blown away my older tree when starting up the buildbot officially (of course :-().

At least for me, it's happening on 2.5 and trunk (hard to tell about 3.0, but that's dying without a dialog), so I thought it might have been something backported. But it also appears to be common to more platforms than just Windows; Windows is simply the only one that pops up the dialog. In my case, the dialog doesn't appear until the end of the tests, and it seems to occur only if test_bsddb3 has run. On other platforms (e.g., OS X and FreeBSD), it just shows up as a warning message, which doesn't mark the tests as failing. At the end of the test run you get:

warning: DBTxn aborted in destructor. No prior commit() or abort().

I tracked that back to an abort() call within the bsddb library during final destruction at Python exit (when the test_bsddb module is being cleared and the bsddb wrapper tries to access a log file related to an open transaction). So perhaps there's an issue with how one or more of the tests are constructed, or with their cleanup. I haven't narrowed it down further yet, though.

As with Alan, more details are available as needed. While it seems to show up in the full test run on more platforms, I have a harder time forcing it by just running test_bsddb3 on FreeBSD, for example, whereas I get the dialog consistently on Windows.

-- David

I previously wrote:
For those more familiar with bsddb, it's the test_1413192.py module in lib/bsddb/test that tickles the problem. It should have been more obvious, since I saw 1413192 in the module name during exit cleanup, but I mentally dismissed it as an internal identifier of some sort. The test module clearly leaves an open transaction, but it also purges its working directory, so maybe that's why the log file is missing. But since the test specifically targets object destruction, I'm not sure how best to restructure it (maybe make env_name into a class that only prunes the directory in __del__? Although that would affect GC, and thus destruction order, too). This test has been around for a while, but the pruning of the directory was backported recently, which is probably the source of the problem.

-- David
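The ordering problem described above can be illustrated with a small, self-contained sketch. It assumes a hypothetical FakeTxn class standing in for bsddb's DBTxn (it is not the real bsddb API, and the log file name is made up): the transaction is left open, the working directory is pruned, and only afterwards does the destructor try to abort by writing to a log file that no longer exists.

```python
import os
import shutil
import tempfile

# Records what happens during destruction (exceptions in __del__ are only
# printed by Python, never raised, so we log outcomes here instead).
events = []


class FakeTxn:
    """Hypothetical stand-in for bsddb's DBTxn; NOT the real bsddb API."""

    def __init__(self, log_path):
        self.log_path = log_path
        self.finished = False
        self._write("begin")

    def _write(self, record):
        # Finishing the transaction needs the log file's directory to exist.
        with open(self.log_path, "a") as log:
            log.write(record + "\n")

    def abort(self):
        self._write("abort")
        self.finished = True

    def __del__(self):
        if self.finished:
            return
        events.append("warning: aborted in destructor, no prior commit() or abort()")
        try:
            self.abort()
        except OSError as exc:
            events.append("abort failed: " + type(exc).__name__)


workdir = tempfile.mkdtemp()
txn = FakeTxn(os.path.join(workdir, "log.0000000001"))  # hypothetical log name
shutil.rmtree(workdir)  # the test prunes its working directory first...
del txn                 # ...and only then is the open transaction destroyed
```

In CPython, `del txn` drops the last reference, so the destructor runs immediately and its abort fails with FileNotFoundError. In the real test the same destruction is deferred to interpreter exit, which is where the warning and the Windows dialog showed up.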

warning: DBTxn aborted in destructor. No prior commit() or abort().
I have seen these as well. In the test suite, bsddb isn't very forgiving when a Python exception is raised inside a bsddb transaction. IIRC, the exception aborts the transaction, then the unittest fixture teardown closes the environment, and that causes a bsddb crash because something that no longer exists is being released. When I last looked at it, I did not see an easy way to fix it; contributions are welcome.

Regards,
Martin

"Martin v. Löwis" <martin@v.loewis.de> writes:
One thing I tried that seems to work fairly well for this case is to encapsulate much of the module-level code in the test into a class instance. That way the module-level code can both create and destroy the instance, rather than leaving the destruction to interpreter exit. It definitely resolves the current issue, but when I reverted the changes to _bsddb.c that were originally made in conjunction with this test, the test still seemed to pass. So I tried the reverted module with the original test code, and it still passes. So I'm not entirely sure that the test is enforcing anything at this point, or at least I'm not sure how to be absolutely positive that the change will continue to enforce what the existing code used to test. But I can open a ticket with the proposed changes if that would help.

-- David
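The restructuring described above can be sketched as follows. It assumes a hypothetical Scenario class wrapping what used to be module-level test code, and a hypothetical FakeTxn in place of bsddb's DBTxn (neither is the actual patch or the real bsddb API). The transaction is still left open, so its destructor path is still exercised, but it is destroyed explicitly while its log directory still exists, and only then is the directory pruned.

```python
import os
import shutil
import tempfile

events = []  # records what happens during destruction


class FakeTxn:
    """Hypothetical stand-in for bsddb's DBTxn; NOT the real bsddb API."""

    def __init__(self, log_path):
        self.log_path = log_path
        self.finished = False
        self._write("begin")

    def _write(self, record):
        with open(self.log_path, "a") as log:
            log.write(record + "\n")

    def abort(self):
        self._write("abort")
        self.finished = True

    def __del__(self):
        if not self.finished:
            events.append("destructor abort")
            try:
                self.abort()
            except OSError as exc:
                events.append("abort failed: " + type(exc).__name__)


class Scenario:
    """Hypothetical wrapper for what used to be module-level test code."""

    def __init__(self):
        self.workdir = tempfile.mkdtemp()
        self.txn = FakeTxn(os.path.join(self.workdir, "log.0000000001"))

    def close(self):
        # Leave the transaction open (as the original test does), but drop
        # the last reference while the log directory still exists...
        self.txn = None
        # ...and only then prune the working directory.
        shutil.rmtree(self.workdir)


scenario = Scenario()
scenario.close()
del scenario
```

Under CPython's reference counting, the destructor's abort now succeeds, so nothing is left for interpreter exit. As noted above, though, this alone doesn't show that the test still catches the bug the _bsddb.c change originally fixed.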

participants (3)
-
"Martin v. Löwis"
-
Alan McIntyre
-
David Bolen