[issue11999] sporadic failure in test_mailbox on FreeBSD

R. David Murray report at bugs.python.org
Fri May 6 04:43:59 CEST 2011


R. David Murray <rdmurray at bitdance.com> added the comment:

Well, it turns out that this sporadic failure is not a test bug, but a real bug in the mailbox module that the test is revealing.

This issue is the same one that motivated the changes in issue 6896.  Those changes, however, merely reduced the problem; they didn't solve it.

The fundamental problem is that mailbox relies on comparing the system clock with the mtime.  But the mtime, it turns out, is not guaranteed to follow the system clock.  You can see this most easily if you think about, say, an NFS file system.  The mtime is set by the server's clock, and if the server's clock differs from the local one, the mtime won't match the local clock.  For some reason this also appears to be true on a vserver virthost: as reported in issue 6896, the mtime is sometimes a full second earlier than the time.time() value at which the change was made.

Ironically this was discussed in the original bug report that introduced the mtime checking code (issue 1607951), and I found that bug on the first page of hits while searching for mtime/system clock synchronization problems.  The solution that Andrew proposed in that issue is slightly different from the one he actually implemented, but his proposed solution was also flawed.

The actual solution involves dealing correctly with two factors: the mtime "clock" is not synchronized with the system clock, and the mtime only has a resolution of one second.  The first means that when looking for changes to the mtime we must compare it with the previous value of the mtime, never with the system clock.  The second means that if _refresh was last called less than a second ago by our clock (the only one we can query), we had best recheck, because the directory may have been updated without the mtime changing.

I also added an additional delta in case the file system clock is skewing relative to the system clock.  I made this a class attribute so that it is adjustable; perhaps it should be made public and documented.
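The two rules above, plus the skew delta, can be illustrated with a small stand-alone sketch.  This is not the code in mailbox_mtime.patch; the class MtimeRefreshPolicy, its parameters, and the 1.0 s resolution and 0.1 s skew defaults are all hypothetical.  It compares mtimes only with earlier mtimes and local clock readings only with earlier clock readings.  As a conservative choice, it keeps forcing a re-read until one full read has happened at least one resolution period plus the skew delta after the current mtime value was first seen, because any later update inside that same mtime tick would leave the mtime unchanged.

```python
import time
from typing import Callable, Optional


class MtimeRefreshPolicy:
    """Hypothetical sketch: decide whether a directory must be re-read.

    File-system mtimes are compared only with earlier mtimes, and local
    clock readings only with earlier clock readings; the two never mix.
    """

    def __init__(self, get_mtime: Callable[[], float],
                 clock: Callable[[], float] = time.time,
                 resolution: float = 1.0, skew_factor: float = 0.1) -> None:
        self._get_mtime = get_mtime      # returns the directory's current mtime
        self._clock = clock              # the local clock, the only one we can query
        self.resolution = resolution     # assumed mtime granularity, in seconds
        self.skew_factor = skew_factor   # assumed extra allowance for clock skew
        self._seen_mtime: Optional[float] = None  # last mtime value observed
        self._first_seen = 0.0           # local time that mtime value was first seen
        self._last_read = float("-inf")  # local time of the last full re-read

    def should_reread(self) -> bool:
        # Read the mtime before the clock, so "now" is never earlier than
        # the moment the mtime was observed.
        mtime = self._get_mtime()
        now = self._clock()
        if mtime != self._seen_mtime:
            # Any change (including the mtime moving backwards) forces a
            # re-read and starts a new "first seen" interval.
            self._seen_mtime = mtime
            self._first_seen = now
            reread = True
        else:
            # Same mtime.  Updates later in the same mtime tick would not
            # change it, so trust it only if a full read happened at least
            # one resolution period (plus skew) after the value was first seen.
            settled = self._first_seen + self.resolution + self.skew_factor
            reread = self._last_read < settled
        if reread:
            # The caller is expected to do the full re-read right after this.
            self._last_read = now
        return reread
```

In real use get_mtime might be something like lambda: os.path.getmtime(path); passing the clock in as a parameter lets a test drive it with fake times instead of sleeping.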

Attached is a patch implementing the fix.  It undoes the 6896 patch, since it is no longer needed.  At this writing my buildbot has run test_mailbox 50 times without failing, where before it would fail every third run or so.

Sadly, I had to reintroduce a 1.1 second fixed sleep into the test.  No way around it.  But it is a deterministic sleep, not a "hope this is long enough" sleep.

----------
Added file: http://bugs.python.org/file21904/mailbox_mtime.patch

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue11999>
_______________________________________

