Problem with MM after power outage

After power was restored and the machine came up, Mailman (2.1.2) is now spitting out errors:
Traceback (most recent call last):
  File "/usr/local/mailman/bin/qrunner", line 270, in ?
    main()
  File "/usr/local/mailman/bin/qrunner", line 230, in main
    qrunner.run()
  File "/usr/local/mailman/Mailman/Queue/Runner.py", line 59, in run
    filecnt = self._oneloop()
  File "/usr/local/mailman/Mailman/Queue/Runner.py", line 88, in _oneloop
    msg, msgdata = self._switchboard.dequeue(filebase)
  File "/usr/local/mailman/Mailman/Queue/Switchboard.py", line 144, in dequeue
    data = self._ext_read(dbfile)
  File "/usr/local/mailman/Mailman/Queue/Switchboard.py", line 246, in _ext_read
    dict = marshal.load(fp)
ValueError: bad marshal data
Any suggestions? Worked fine before power got yanked...
Bill
-- bill bradford mrbill@mrbill.net austin, texas

On Fri, 2003-08-15 at 12:52, Bill Bradford wrote:
Hmm, weird. The switchboard is written such that this shouldn't happen (i.e. you shouldn't get a corrupt message.db file). It's supposed to write the message first to .db.tmp and then rename it to .db in an atomic operation.
Do you have any .db.tmp turds in your qfiles directories?
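For readers who have not seen the pattern, the write-to-temp-then-rename idea described here can be sketched roughly as below. This is only an illustration, not the actual Switchboard.py code: the helper name write_via_rename is made up, and os.replace() stands in for rename().

```python
import os


def write_via_rename(path, data):
    """Write bytes to path + '.tmp', then rename over path (hypothetical helper)."""
    tmp = path + ".tmp"
    with open(tmp, "wb") as fp:
        fp.write(data)
    # The rename is atomic as seen by other processes: a reader finds either
    # the old file or the complete new one, never a half-written file.
    os.replace(tmp, path)
```

Note that this only covers what other processes see; it makes no promise about what survives a power loss.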
all-problems-should-fix-themselves-ly y'rs, -Barry

On Wednesday 20 August 2003 00:06, you wrote:
This would be ok if the underlying operating system flushed the disk cache upon close(), but I'm afraid this is not the case (at least on linux). This is from man 2 close:
A successful close does not guarantee that the data has been successfully saved to disk, as the kernel defers writes. It is not common for a filesystem to flush the buffers when the stream is closed. If you need to be sure that the data is physically stored use fsync(2). (It will depend on the disk hardware at this point.)
This behaviour is declared conforming to SVr4, SVID, POSIX, X/OPEN, BSD 4.3.
Therefore I believe the problem reported here happens in this way:
- mailman writes the tmp file, closes it and then atomically renames it. This is atomic from the userland point of view (e.g. applications will see the file instantly changed)
- under the hood, the operating system is running a disk cache to speed up file operations, therefore what really happened is the file has been written to some RAM pages but not yet on disk.
- at some later time, the disk cache is copied from RAM to disk, effectively making changes permanent. This copy is not atomic, e.g. files bigger than 4k will be written in chunks of 4k pages.
A power interruption (or OS crash, or any other unclean shutdown) in phase 2 could lead to a lost transaction (e.g. the file will appear as never overwritten, like phase 1 never happened).
A power interruption (or OS crash, or any other unclean shutdown) happening in phase 3 could lead to a corrupted file (e.g. some pages written to disk, some pages not).
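One way to narrow the phase 3 window is to force the temp file to disk before renaming it. The sketch below shows that idea under the assumption that the disk hardware honours the flush; the helper name write_durably is invented for illustration and is not Mailman code.

```python
import os


def write_durably(path, data):
    """Write data to path via a temp file, forcing it to disk before the rename."""
    tmp = path + ".tmp"
    with open(tmp, "wb") as fp:
        fp.write(data)
        fp.flush()             # push Python's userspace buffer to the kernel
        os.fsync(fp.fileno())  # ask the kernel to write its cached pages out
    # Only rename once the contents are believed to be on disk.
    os.replace(tmp, path)
```

This does nothing for the directory entry itself, which on some filesystems needs its own sync.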
MTAs usually provide a configuration setting to enable cache flush for each transaction (by use of fsync()), but this is disabled by default because of the severe impact in performance.
Use of BerkeleyDB (or similar transactional db libraries) could eliminate the
problem of corrupted files without the need to fsync, but to solve the
problem in phase 2 we need to guarantee at application level that losing a
file won't make dangling references or bad states in the related data we
stored elsewhere. Worst case, when restarting after power outage we should
check for transactions to be cancelled because the related file is not on
disk.
An example could be: we put a message on hold for moderation, therefore we
- save the message in a file (or rename from the previous location)
- update the moderation queue index in MailList
- Save() the list config pickle
If the system goes down now because of a power outage, when restarting we could have (even fsync()ing everything):
- the index has been regularly updated
- the message is not on disk, or it's in a different filename/path
This can happen because actual writes on disk can be reordered by the OS, for performance reasons. Accessing the admindb panel now could potentially lead to exceptions.
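The restart-time check suggested here could look something like the sketch below. The index layout (a dict mapping a request id to a file name) and the name find_dangling are assumptions made for the example; Mailman's real moderation data is structured differently.

```python
import os


def find_dangling(index, directory):
    """Return the ids in `index` whose message file is missing from `directory`.

    `index` maps a request id to a file name; both are hypothetical
    stand-ins for the moderation queue index.
    """
    missing = []
    for request_id, filename in sorted(index.items()):
        if not os.path.exists(os.path.join(directory, filename)):
            missing.append(request_id)
    return missing
```

Entries reported this way could then be cancelled before anyone opens the admindb panel.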
Now, everyone who is serious about administering a server has a big and dependable UPS, automatically triggering clean shutdowns and so on, therefore everything I've described is not as much of a problem.
-- [pioppo@abulafia pioppo]$ man women No manual entry for women

At 12:45 AM +0200 2003/08/20, Simone Piunno wrote:
From sendmail/README from sendmail 8.12.10.Beta2:
REQUIRES_DIR_FSYNC Turn on support for file systems that require to call fsync() for a directory if the meta-data in it has been changed. This should be turned on at least for older versions of ReiserFS; it is enabled by default for Linux. According to some information this flag is not needed anymore for kernel 2.4.16 and newer. We would appreciate feedback about the semantics of the various file systems available for Linux. An alternative to this compile time flag is to mount the queue directory without the -async option, or using chattr +S on Linux.
So, theoretically, for some filesystems on some versions of Linux
(with more modern kernels), this should work as expected.
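For filesystems that do need the directory synced, the step could be sketched as follows. This is POSIX-only (opening a directory with os.open() does not work on Windows), and the function name is hypothetical.

```python
import os


def rename_and_sync_dir(tmp, path):
    """Rename tmp to path, then fsync the parent directory (POSIX only)."""
    os.replace(tmp, path)
    dirfd = os.open(os.path.dirname(os.path.abspath(path)), os.O_RDONLY)
    try:
        os.fsync(dirfd)  # flush the changed directory entry itself
    finally:
        os.close(dirfd)
```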
Still, the code should be made more robust to deal with this issue.
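As one possible form of "more robust", a reader could catch the marshal error and set the damaged file aside instead of crashing the qrunner. The sketch below is an assumption about how that might look, not what Mailman does; the ".bad" suffix and the function name are invented.

```python
import marshal
import os


def read_or_quarantine(path):
    """Load a marshal file; on corrupt data, move it aside and return None."""
    try:
        with open(path, "rb") as fp:
            return marshal.load(fp)
    except (ValueError, EOFError, TypeError):
        # marshal.load() raises one of these for truncated or garbage input.
        os.replace(path, path + ".bad")
        return None
```

Keeping the damaged file around, rather than deleting it, leaves the admin something to inspect afterwards.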
-- Brad Knowles, <brad.knowles@skynet.be>
"They that can give up essential liberty to obtain a little temporary safety deserve neither liberty nor safety." -Benjamin Franklin, Historical Review of Pennsylvania.
GCS/IT d+(-) s:+(++)>: a C++(+++)$ UMBSHI++++$ P+>++ L+ !E-(---) W+++(--) N+ !w--- O- M++ V PS++(+++) PE- Y+(++) PGP>+++ t+(+++) 5++(+++) X++(+++) R+(+++) tv+(+++) b+(++++) DI+(++++) D+(++) G+(++++) e++>++++ h--- r---(+++)* z(+++)

Catching up...
On Tue, 2003-08-19 at 18:45, Simone Piunno wrote:
Indeed. I could add an optional flag to enable fsync'ing in the _ext_write() calls in Switchboard.py, but I wouldn't want to enable it by default. It really kills performance.
For 2.1.3, I'll add a flag to Switchboard.py but disable it by default; I'm not going to expose the flag in mm_cfg.py, but it'll be possible for the adventurous to experiment.
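A flag of this kind might be wired in roughly as shown below. The name SYNC_AFTER_WRITE is the one used later in this thread; the function body is a guess for illustration, not the code that shipped in 2.1.3.

```python
import marshal
import os

SYNC_AFTER_WRITE = False  # off by default, as proposed in the thread


def ext_write(path, data, sync=None):
    """Marshal `data` to path via a temp file; fsync only when asked to."""
    if sync is None:
        sync = SYNC_AFTER_WRITE
    tmp = path + ".tmp"
    with open(tmp, "wb") as fp:
        marshal.dump(data, fp)
        fp.flush()
        if sync:
            os.fsync(fp.fileno())
    os.replace(tmp, path)
```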
-Barry

At 9:39 AM -0400 2003/09/12, Barry Warsaw wrote:
I would suggest making this flag visible on certain OSes. For
example, turned off by default but visible on Linux, and not even visible (without hacking) on OSes that don't need it.
I know it kills performance, but this feature is needed for a
reason, and RFC 1123 and 2821 make it clear that we can't lose mail just because we've had a system crash or some other known foreseeable problem.
-- Brad Knowles, <brad.knowles@skynet.be>

On Fri, 2003-09-12 at 12:28, Brad Knowles wrote:
Hmm, I don't know what "make it visible on some OSs" would mean. ;) There's a flag in the file with a helpful comment which is easily edited, just not in mm_cfg.py. There will be a note about it in the release notes for 2.1.3.
-Barry

At 12:35 PM -0400 2003/09/12, Barry Warsaw wrote:
> Hmm, I don't know what "make it visible on some OSs" would mean. ;)
So that no hacking is required to make it something the user can
see and modify. We'd still be doing the dangerous thing by leaving it set to default off (in the case of Linux), but at least we wouldn't be requiring that they hack the code in order to be able to tweak this option.
All you should need to do is check the output from `uname -a`.
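In Python the equivalent of looking at `uname` output is platform.system(). A tiny sketch of the check being proposed, with a made-up function name and the assumption that only Linux would show the option:

```python
import platform


def sync_option_visible(system=None):
    """Return True when the (hypothetical) option should be shown to the admin."""
    if system is None:
        system = platform.system()  # same value as `uname -s`, e.g. "Linux"
    return system == "Linux"
```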
I understand, but there are many people who use mailman on Linux
who won't be able to make even that kind of simple change. They can't install anything themselves from source, only using rpm. Unfortunately, they are likely to be the bulk of the users, and the ones most likely to be hurt by this sort of thing.
-- Brad Knowles, <brad.knowles@skynet.be>

On Fri, 2003-09-12 at 12:42, Brad Knowles wrote:
But, really, they have to hack the code either way. Either you're editing the mm_cfg.py file, or you're editing the Switchboard.py file. The former is a little more visible, since that's the file people are trained to touch.
But here's the thing. For a bug fix release, it seems wrong to expose this in mm_cfg.py because that implies some higher state of blessing. I'm not convinced that we've hit upon the ultimate right solution so I don't want to commit to it. After folks have had a chance to test it and see if 1) it fixes the problem, and 2) what the real world trade-offs are, then we can decide whether it deserves higher profile, or maybe just us choosing to hard code it to always fsync().
I'm mostly unconvinced by this argument. IWBNI Mailman were as simple as a pinball machine but I think folks will still have to read some instructions. Plus, I'm not sure the majority of sites will care, either because their traffic is low enough that it doesn't matter, or they're on reliable power, etc.
-Barry

Barry Warsaw wrote:
I've thought about this some more, and I'm going to reverse the decision not to expose SYNC_AFTER_WRITE in mm_cfg.py. Apologies for being so hard-headed about it.
We have the same potential problem with the config.pck file, so I want to move the same logic into MailList.py, i.e. always flush before closing, and optionally fsync the file. That means moving the option out into Defaults.py.in. I'll do that for 2.1.4.
BTW, has anybody actually turned on SYNC_AFTER_WRITE and have you 1) noticed any improvement in the robustness of the message files, and/or 2) noticed any performance degradation?
-Barry

baw> On Fri, 2003-09-12 at 12:28, Brad Knowles wrote:
>> I would suggest making this flag visible on certain OSes. For
>> example, turned off by default but visible on Linux, and not
>> even visible (without hacking) on OSes that don't need it.
baw> Hmm, I don't know what "make it visible on some OSs" would
baw> mean. ;) There's a flag in the file with a helpful comment
baw> which is easily edited, just not in mm_cfg.py. There will be
baw> a note about it in the release notes for 2.1.3.
And, moreover, the choice should depend upon the file system and file system options. As you know, all Linux boxen do not necessarily only run ext2 even by distribution default.
jam

At 12:56 PM -0400 2003/09/12, John A. Martin wrote:
It's easy enough to check the type of filesystem to be used, and
whether "chattr +S" has been run on the particular directory structure.
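On Linux, the filesystem type for a directory can be worked out from /proc/mounts. The sketch below takes the file's text as an argument so it can be tried without a live system; the function name is hypothetical, and it does not look at the chattr attribute.

```python
def fs_type_for(path, mounts_text):
    """Return the filesystem type of the longest mount point containing path.

    mounts_text is the content of /proc/mounts (Linux-specific format:
    device, mount point, type, options, dump, pass).
    """
    best_point, best_type = "", None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        point, fstype = fields[1], fields[2]
        prefix = point.rstrip("/") + "/"
        # Compare with trailing slashes so "/var" does not match "/variable".
        if (path + "/").startswith(prefix) and len(point) > len(best_point):
            best_point, best_type = point, fstype
    return best_type
```

On a real box it would be called as fs_type_for(qdir, open("/proc/mounts").read()), where qdir is the queue directory.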
-- Brad Knowles, <brad.knowles@skynet.be>

On Fri, Sep 12, 2003 at 09:58:18PM +0200, Brad Knowles wrote:
This has gotten silly. 99% of the sites out there don't care about the tradeoff, and mailman could write synchronously without impacting the performance. Playing fast and loose could be done if a site admin wanted it. Doing OS-specific checks just to set this variable is silly because the admin can make the business decision as to whether they can afford to let the system run with async writes, write-back cache, etc. and it's their problem.
-Peter
-- The 5 year plan: In five years we'll make up another plan. Or just re-use this one.

[Peter C. Norton]
Hear, hear. :-)
Although I haven't done any testing as to how much performance is lost by fsync(2)ing, I suspect that the sites who actually *need* this lost performance are (much) more likely to read the upgrade notes than your average Mailman site admin.
Hence, I think it makes more sense to have the default be "do fsync(2)", and let any performance-conscious site decide whether it wants to explicitly value performance over safety.
Harald

On Fri, 2003-09-12 at 17:40, Harald Meland wrote:
Except that when I did some very simple tests, I saw a 97% hit in performance with fsync turned on. This on a RH9, ext3 Linux box of the Dell Optiplex variety. That makes me very nervous to add in a patch release that won't have any beta testing. I've also never seen the bug on python.org, which may or may not be representative of the world at large.
I'm happy to re-address this for the next major release, but for 2.1.3 I don't want to enable fsync by default, and I definitely don't want to do any probing/guessing of filesystems, etc.
-Barry

[Barry Warsaw]
> Except that when I did some very simple tests, I saw a 97% hit in performance with fsync turned on.
Ouch. That's pretty severe, all right.
Even though I would *guess* that most "casual" Mailman sites would pull through an effective halving of performance without any problems, that is merely a guess... thus, I'm not at all sure that the set of sites that would be in trouble with such a big performance hit is disjoint from the set of sites that don't read upgrade documentation very carefully.
I hadn't considered the "no beta testing" part of this.
Reconsidering now, I agree that such a big performance hit in a "bugfix" release ought to be the cause of quite a bit of nervousness on behalf of the release issuer... :-)
That sounds like a good plan to me.
However, I'm not sure I understand why this shouldn't be configurable in mm_cfg; is that just to keep the number of configurable variables down?
FWIW, PostgreSQL exposes a "do fsync(2)" option in its global config file; on my Debian system, the option is preceded with this comment:
# A special note on FSYNC:
# FSYNC only affects writes to the WAL (Write-Ahead Log). Turning it
# off will give some increase in performance, but at the risk of data-
# corruption in the event of power failure or other disaster. It is on
# by default. I strongly recommend you not to turn it off.
-- Harald

On Fri, 2003-09-12 at 18:50, Harald Meland wrote:
That, and because I kind of see visibility in mm_cfg.py as a blessing of sorts for the configuration option. See my previous message.
Anyway, I'd encourage people to set SYNC_AFTER_WRITE=True once 2.1.3 comes out (or when you update to CVS <wink>) and observe what actually happens. Does it solve the problem? How bad is the performance hit?
Don't forget that if you experiment and post results, let us know essentials such as Python version, OS version, filesystem type, disk subsystem, etc.
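For anyone wanting a quick number to post, a rough timing harness might look like the sketch below. It is not the lost test script mentioned in this thread; the file size, file names, and function name are arbitrary assumptions, and results will vary a lot with filesystem and disk hardware.

```python
import os
import tempfile
import time


def time_writes(count, sync, size=2048):
    """Write `count` small files and return the elapsed seconds."""
    payload = b"x" * size
    with tempfile.TemporaryDirectory() as directory:
        start = time.perf_counter()
        for i in range(count):
            path = os.path.join(directory, "%d.db" % i)
            with open(path, "wb") as fp:
                fp.write(payload)
                fp.flush()
                if sync:
                    os.fsync(fp.fileno())
        return time.perf_counter() - start
```

Comparing time_writes(1000, False) with time_writes(1000, True) on the filesystem that actually holds the queue gives a first impression of the cost.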
-Barry

On Fri, Sep 12, 2003 at 05:56:00PM -0400, Barry Warsaw wrote:
Wow. 97%? That's way too high. I'd expect about 50% at worst - for the extra sync to disk when it enters mailman's queue and one more to flush the message when it's made it through the outbound queue to the MTA. This is just a question, because I still don't know much about the mm 2.1 internals, but is there a chance you're sync()'ing more often than you need?
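To put the two figures side by side, and assuming "hit" means the fraction of throughput lost, the conversion to a slowdown factor is simple arithmetic: a 50% hit is twice as slow, while a 97% hit is roughly 33 times as slow.

```python
def slowdown_factor(hit):
    """Convert a fractional throughput loss into a 'times slower' factor."""
    return 1.0 / (1.0 - hit)
```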
That makes sense to me.
-Peter
-- The 5 year plan: In five years we'll make up another plan. Or just re-use this one.

On Mon, 2003-09-15 at 14:50, Peter C. Norton wrote:
It's possible -- I don't have my test script any more. I just added a flush before closing the config.pck file and I think that will help much more than the sync. IIUC, there's really only a narrow window of opportunity for corruption that sync will solve, and if you're worried about that, you really should be on a UPS and possibly a sync'ing file system.
-Barry

At 6:06 PM -0400 2003/08/19, Barry Warsaw wrote:
Do you fsync() the directory after the close and before the rename?
-- Brad Knowles, <brad.knowles@skynet.be>

On Monday 25 August 2003 10:15, Brad Knowles wrote:
According to python documentation (os module), this should do:
fsync(fd) Force write of file with filedescriptor fd to disk. On Unix, this calls the native fsync() function; on Windows, the MS _commit() function. If you're starting with a Python file object f, first do f.flush(), and then do os.fsync(f.fileno()), to ensure that all internal buffers associated with f are written to disk. Availability: Unix, and Windows starting in 2.2.3.
There's also:
fdatasync(fd) Force write of file with filedescriptor fd to disk. Does not force update of metadata. Availability: Unix.
I don't know the difference.
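On the difference: fdatasync() is allowed to skip metadata, such as timestamps, that is not needed to read the data back, so it can be cheaper than fsync(). A small sketch that uses it where available and falls back to fsync() otherwise (the helper name sync_file is made up):

```python
import os


def sync_file(fp, data_only=False):
    """Flush a Python file object and force it to disk.

    With data_only=True, prefer fdatasync(), which may skip metadata
    such as timestamps that is not needed to read the data back.
    """
    fp.flush()
    if data_only and hasattr(os, "fdatasync"):
        os.fdatasync(fp.fileno())
    else:
        os.fsync(fp.fileno())
```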
Because the python documentation says nothing about close() calling fsync() automatically, I assume it does not.
-- Adde parvum parvo magnus acervus erit -- Ovidio

On Monday 25 August 2003 22:11, Simone Piunno wrote:
> Do you fsync() the directory after the close and before the rename?
Ah, I've found what you were meaning... this is from PosixFilesystem.py (ZODB implementation):
import os
from posix import fsync
....
    def sync_directory(self,dir):
        if self.use_sync:
            p = os.path.join(self.dirname,dir)
            # Use os.open here because, mysteriously, it performs better
            # than fopen on linux 2.4.18, reiserfs, glibc 2.2.4
            f = os.open(p,os.O_RDONLY)
            # Should we worry about EINTR ?
            try:
                fsync(f)
            finally:
                os.close(f)

    def write_file(self,filename,content):
        fullname = os.path.join(self.dirname,filename)
        f = os.open(fullname,os.O_CREAT|os.O_RDWR|os.O_TRUNC,0640)
        # Should we worry about EINTR ?
        try:
            os.write(f,content)
            if self.use_sync:
                fsync(f)
        finally:
            os.close(f)
-- Adde parvum parvo magnus acervus erit -- Ovidio

At 10:11 PM +0200 2003/08/25, Simone Piunno wrote:
That's the one you want. You're getting nailed on the meta-data
which is not being flushed to disk before the rename.
> Because the python documentation says nothing about close() calling fsync() automatically, I assume it does not.
Indeed. ;(
-- Brad Knowles, <brad.knowles@skynet.be>

On Fri, 2003-08-15 at 12:52, Bill Bradford wrote:
Hmm, weird. The switchboard is written such that this shouldn't happen (i.e. you shouldn't get a corrupt message.db file). It's supposed to write the message first to .db.tmp and then rename it to .db in an atomic operation.
Do you have any .db.tmp turds in your qfiles directories?
all-problems-should-fix-themselves-ly y'rs, -Barry

On Wednesday 20 August 2003 00:06, you wrote:
This would be ok if the underlying operating system flushed the disk cache upon close(), but I'm afraid this is not the case (at least on linux). This is from man 2 close:
A successful close does not guarantee that the data has been success- fully saved to disk, as the kernel defers writes. It is not common for a filesystem to flush the buffers when the stream is closed. If you need to be sure that the data is physically stored use fsync(2). (It will depend on the disk hardware at this point.)
This behaviour is declared conforming to SVr4, SVID, POSIX, X/OPEN, BSD 4.3.
Therefore I believe the problem reported here happens in this way:
- mailman writes the tmp file, closes it and the atomically renames. this is atomically from userland point of view (e.g. applications will see the file instantly changed)
- under the hood, the operating system is running a disk cach to speed up file operations, therefore what really happened is the file has been written to some RAM pages but not yet on disk.
- at some later time, the disk cache is copied from RAM to disk, effectively making changes permanent. This copy is not atomic, e.g. files bigger than 4k will be written in chunks of 4k pages.
A power interruption (or OS crash, or any other unclean shutdown) in phase 2 could lead to a lost transaction (e.g. the file will appear as never overwritten, like phase 1 never happened).
A power interruption (or OS crash, or any other unclean shutdown) happening in phase 3 could lead to a corrupted file (e.g. some pages written to disk, some pages not).
MTAs usually provide a configuration setting to enable cache flush for each transaction (by use of fsync()), but this is disabled by default because of the severe impact in performance.
Use of BerkeleyDB (or similar transactional db libraries) could eliminate the
problem of corrupted files without the need to fsync, but to solve the
problem in phase 2 we need to guarantee at application level that loosing a
file won't make dangling references or bad states in the related data we
stored elsewhere. Worst case, when restarting after power outage we should
check for transactions to be cancelled because the related file is not on
disk.
An example could be: we put a message on hold for moderation, therefore we
this can happen because actual writes on disk can be reordered by the OS, for
- save the message in a file (or rename from the previous location)
- update the moderation queue index in MailList
- Save() the list config pickle If the system goes down now because of a power outage, when restarting we could have (even fsync()ing everything):
- the index has been regularly updated
- the message is not on disk, or it's in a different filename/path
performance reasons. Accessing the admindb panel now could potentially lead to exceptions.
Now, everyone who is serious about administering a server has a big and dependable UPS, automatically triggering clean shutdowns and so on, therefore everything I've described is not as much as a problem.
-- [pioppo@abulafia pioppo]$ man women No manual entry for women

At 12:45 AM +0200 2003/08/20, Simone Piunno wrote:
From sendmail/README from sendmail 8.12.10.Beta2:
REQUIRES_DIR_FSYNC Turn on support for file systems that require to call fsync() for a directory if the meta-data in it has been changed. This should be turned on at least for older versions of ReiserFS; it is enabled by default for Linux. According to some information this flag is not needed anymore for kernel 2.4.16 and newer. We would appreciate feedback about the semantics of the various file systems available for Linux. An alternative to this compile time flag is to mount the queue directory without the -async option, or using chattr +S on Linux.
So, theoretically, for some filesystems on some versions of Linux
(with more modern kernels), this should work as expected.
Still, the code should be made more robust to deal with this issue.
-- Brad Knowles, <brad.knowles@skynet.be>
"They that can give up essential liberty to obtain a little temporary safety deserve neither liberty nor safety." -Benjamin Franklin, Historical Review of Pennsylvania.
GCS/IT d+(-) s:+(++)>: a C++(+++)$ UMBSHI++++$ P+>++ L+ !E-(---) W+++(--) N+ !w--- O- M++ V PS++(+++) PE- Y+(++) PGP>+++ t+(+++) 5++(+++) X++(+++) R+(+++) tv+(+++) b+(++++) DI+(++++) D+(++) G+(++++) e++>++++ h--- r---(+++)* z(+++)

Catching up...
On Tue, 2003-08-19 at 18:45, Simone Piunno wrote:
Indeed. I could add an optional flag to enable fsync'ing in the _ext_write() calls in Switchboard.py, but I wouldn't want to enable it by default. It really kills performance.
For 2.1.3, I'll add a flag to Switchboard.py but disable it by default; I'm not going to expose the flag in mm_cfg.py, but it'll be possible for the adventurous to experiment.
-Barry

At 9:39 AM -0400 2003/09/12, Barry Warsaw wrote:
I would suggest making this flag visible on certain OSes. For
example, turned off by default but visible on Linux, and not even visible (without hacking) on OSes that don't need it.
I know it kills performance, but this feature is needed for a
reason, and RFC 1123 and 2821 make it clear that we can't lose mail just because we've had a system crash or some other known foreseeable problem.
-- Brad Knowles, <brad.knowles@skynet.be>
"They that can give up essential liberty to obtain a little temporary safety deserve neither liberty nor safety." -Benjamin Franklin, Historical Review of Pennsylvania.
GCS/IT d+(-) s:+(++)>: a C++(+++)$ UMBSHI++++$ P+>++ L+ !E-(---) W+++(--) N+ !w--- O- M++ V PS++(+++) PE- Y+(++) PGP>+++ t+(+++) 5++(+++) X++(+++) R+(+++) tv+(+++) b+(++++) DI+(++++) D+(++) G+(++++) e++>++++ h--- r---(+++)* z(+++)

On Fri, 2003-09-12 at 12:28, Brad Knowles wrote:
Hmm, I don't know what "make it visible on some OSs" would mean. ;) There's a flag in the file with a helpful comment which is easily edited, just not in mm_cfg.py. There will be a note about it in the release notes for 2.1.3.
-Barry

At 12:35 PM -0400 2003/09/12, Barry Warsaw wrote:
Hmm, I don't know what "make it visible on some OSs" would mean. ;)
So that no hacking is required to make it something the user can
see and modify. We'd still be doing the dangerous thing by leaving it set to default off (in the case of Linux), but at least we wouldn't be requiring that they hack the code in order to be able to tweak this option.
All you should need to do is check the output from `uname -a`.
I understand, but there are many people who use mailman on Linux
who won't be able to make even that kind of simple change. They can't install anything themselves from source, only using rpm. Unfortunately, they are likely to be the bulk of the users, and the ones most likely to be hurt by this sort of thing.
-- Brad Knowles, <brad.knowles@skynet.be>
"They that can give up essential liberty to obtain a little temporary safety deserve neither liberty nor safety." -Benjamin Franklin, Historical Review of Pennsylvania.
GCS/IT d+(-) s:+(++)>: a C++(+++)$ UMBSHI++++$ P+>++ L+ !E-(---) W+++(--) N+ !w--- O- M++ V PS++(+++) PE- Y+(++) PGP>+++ t+(+++) 5++(+++) X++(+++) R+(+++) tv+(+++) b+(++++) DI+(++++) D+(++) G+(++++) e++>++++ h--- r---(+++)* z(+++)

On Fri, 2003-09-12 at 12:42, Brad Knowles wrote:
But, really, they have to hack the code either way. Either you're editing the mm_cfg.py file, or you're editing the Switchboard.py file. The former is a little more visible, since that's the file people are trained to touch.
But here's the thing. For a bug fix release, it seems wrong to expose this in mm_cfg.py because that implies some higher state of blessing. I'm not convinced that we've hit upon the ultimate right solution so I don't want to commit to it. After folks have had a chance to test it and see if 1) it fixes the problem, and 2) what the real world trade-offs are, then we can decide whether it deserves higher profile, or maybe just us choosing to hard code it to always fsync().
I'm mostly unconvinced by this argument. IWBNI Mailman were as simple as a pinball machine but I think folks will still have to read some instructions. Plus, I'm not sure the majority of sites will care, either because their traffic is low enough that it doesn't matter, or they're on reliable power, etc.
-Barry

Barry Warsaw wrote:
I've thought about this some more, and I'm going to reverse the decision not to expose SYNC_AFTER_WRITE in mm_cfg.py. Apologies for being so hard-headed about it.
We have the same potential problem with the config.pck file, so I want to move the same logic into MailList.py, i.e. always flush before closing, and optionally fsync the file. That means moving the option out into Defaults.py.in. I'll do that for 2.1.4.
BTW, has anybody actually turned on SYNC_AFTER_WRITE and have you 1) noticed any improvement in the robustness of the message files, and/or 2) noticed any performance degradation?
-Barry

baw> On Fri, 2003-09-12 at 12:28, Brad Knowles wrote:
>> I would suggest making this flag visible on certain OSes. For
>> example, turned off by default but visible on Linux, and not
>> even visible (without hacking) on OSes that don't need it.
baw> Hmm, I don't know what "make it visible on some OSs" would
baw> mean. ;) There's a flag in the file with a helpful comment
baw> which is easily edited, just not in mm_cfg.py. There will be
baw> a note about it in the release notes for 2.1.3.
And, moreover, the choice should depend upon the file system and file system options. As you know, all Linux boxen do not necessarily only run ext2 even by distribution default.
jam

At 12:56 PM -0400 2003/09/12, John A. Martin wrote:
It's easy enough to check the type of filesystem to be used, and
whether "chatter +S" has been run on the particular directory structure.
-- Brad Knowles, <brad.knowles@skynet.be>
"They that can give up essential liberty to obtain a little temporary safety deserve neither liberty nor safety." -Benjamin Franklin, Historical Review of Pennsylvania.
GCS/IT d+(-) s:+(++)>: a C++(+++)$ UMBSHI++++$ P+>++ L+ !E-(---) W+++(--) N+ !w--- O- M++ V PS++(+++) PE- Y+(++) PGP>+++ t+(+++) 5++(+++) X++(+++) R+(+++) tv+(+++) b+(++++) DI+(++++) D+(++) G+(++++) e++>++++ h--- r---(+++)* z(+++)

On Fri, Sep 12, 2003 at 09:58:18PM +0200, Brad Knowles wrote:
This has gotten silly. 99% of the sites out there don't about the tradeoff, and mailman could write synchronously without impacting the performance. Playing fast and loose could be done if a site admin wanted it. Doing OS-specific checks just to set this variable is silly because the admin can make the business decision as to whether they can afford to let the system run with async writes, write-back cache, etc. and its their problem.
-Peter
-- The 5 year plan: In five years we'll make up another plan. Or just re-use this one.

[Peter C. Norton]
Hear, hear. :-)
Although I haven't done any testing as to how much performance is lost by fsync(2)ing, I suspect that the sites who actually *need* this lost performance are (much) more likely to read the upgrade notes than your average Mailman site admin.
Hence, I think it makes more sense to have the default be "do fsync(2)", and let any performance-conscious site decide whether it wants to explicitly value performance over safety.
Harald

On Fri, 2003-09-12 at 17:40, Harald Meland wrote:
Except that when I did some very simple tests, I saw a 97% hit in performance with fsync turned on. This on a RH9, ext3 Linux box of the Dell Optiplex variety. That makes me very nervous to add in a patch release that won't have any beta testing. I've also never seen the bug on python.org, which may or may not be representative of the world at large.
I'm happy to re-address this for the next major release, but for 2.1.3 I don't want to enable fsync by default, and I definitely don't want to do any probing/guessing of filesystems, etc.
-Barry

[Barry Warsaw]
Except that when I did some very simple tests, I saw a 97% hit in performance with fsync turned on.
Ouch. That's pretty severe, all right.
Even though I would *guess* that most "casual" Mailman sites would pull through an effective halving of performance without any problems, that is merely a guess... thus, I'm not at all sure that set of sites that would be in trouble with such a big performance hit is disjoint from the set of sites that don't read upgrade documentation very carefully.
I hadn't considered the "no beta testing" part of this.
Reconsidering now, I agree that such a big performance hit in a "bugfix" release ought to be the cause of quite a bit of nervousness on behalf of the release issuer... :-)
That sounds like a good plan to me.
However, I'm not sure I understand why this shouldn't be configurable in mm_cfg; is that just to keep the number of configurable variables down?
FWIW, PostgreSQL exposes a "do fsync(2)" option in its global config file; on my Debian system, the option is preceded with this comment:
# A special note on FSYNC:
# FSYNC only affects writes to the WAL (Write-Ahead Log). Turning it
# off will give some increase in performance, but at the risk of
# data-corruption in the event of power failure or other disaster. It is
# on by default. I strongly recommend you not to turn it off.
-- Harald

On Fri, 2003-09-12 at 18:50, Harald Meland wrote:
That, and because I kind of see visibility in mm_cfg.py as a blessing of sorts for the configuration option. See my previous message.
Anyway, I'd encourage people to set SYNC_AFTER_WRITE=True once 2.1.3 comes out (or when you update to CVS <wink>) and observe what actually happens. Does it solve the problem? How bad is the performance hit?
Don't forget that if you experiment and post results, let us know essentials such as Python version, OS version, filesystem type, disk subsystem, etc.
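For anyone who wants to run that experiment outside of Mailman first, a small timing script along these lines may help. It is only a sketch, not Mailman code: `time_writes` and its parameters are made-up names, it is written for current Python 3, and it merely mimics the write-to-.tmp-then-rename pattern discussed in this thread. Point it at a directory on the filesystem you actually care about (the default temporary directory may sit on a RAM-backed filesystem, which would hide the cost).

```python
import os
import platform
import tempfile
import time


def time_writes(directory, count=100, payload=b"x" * 4096, sync=False):
    """Write `count` small files via tmp-file + rename; return elapsed seconds.

    Hypothetical helper for illustration only; not part of Mailman.
    """
    start = time.perf_counter()
    for i in range(count):
        final = os.path.join(directory, "msg%d.db" % i)
        tmp = final + ".tmp"
        with open(tmp, "wb") as fp:
            fp.write(payload)
            if sync:
                fp.flush()             # Python's buffer -> operating system
                os.fsync(fp.fileno())  # operating system cache -> disk
        os.rename(tmp, final)
    return time.perf_counter() - start


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        plain = time_writes(d, sync=False)
    with tempfile.TemporaryDirectory() as d:
        synced = time_writes(d, sync=True)
    # Report the essentials asked for above alongside the timings.
    print("Python:", platform.python_version())
    print("OS:", platform.platform())
    print("no fsync: %.3fs   fsync: %.3fs" % (plain, synced))
```

Filesystem type and disk subsystem still have to be reported by hand; the standard library has no portable call for them.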
-Barry

On Fri, Sep 12, 2003 at 05:56:00PM -0400, Barry Warsaw wrote:
Wow. 97%? That's way too high. I'd expect about 50% at worst - for the extra sync to disk when it enters mailman's queue and one more to flush the message when it's made it through the outbound queue to the MTA. This is just a question, because I still don't know much about the mm 2.1 internals, but is there a chance you're sync()'ing more often than you need?
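To put the two figures side by side, here is a tiny calculation. It assumes that a "hit" of h means throughput drops to (1 - h) of the unsynced rate; how the 97% was actually measured is not stated in this thread, so treat that reading as an assumption. `slowdown_factor` is an illustrative name.

```python
def slowdown_factor(hit):
    """Convert a fractional throughput loss into a 'times slower' factor.

    Assumes `hit` is the fraction of throughput lost (0.5 means half).
    """
    if not 0 <= hit < 1:
        raise ValueError("hit must be in [0, 1)")
    return 1.0 / (1.0 - hit)


if __name__ == "__main__":
    for hit in (0.50, 0.97):
        print("%.0f%% hit -> about %.1fx slower"
              % (hit * 100, slowdown_factor(hit)))
```

Under that reading, a 50% hit doubles the time per message, while a 97% hit is more than thirty times slower, which is why the two numbers lead to such different conclusions.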
That makes sense to me.
-Peter

On Mon, 2003-09-15 at 14:50, Peter C. Norton wrote:
It's possible -- I don't have my test script any more. I just added a flush before closing the config.pck file and I think that will help much more than the sync. IIUC, there's really only a narrow window of opportunity for corruption that sync will solve, and if you're worried about that, you really should be on a UPS and possibly a sync'ing file system.
-Barry

At 6:06 PM -0400 2003/08/19, Barry Warsaw wrote:
Do you fsync() the directory after the close and before the rename?
-- Brad Knowles, <brad.knowles@skynet.be>
"They that can give up essential liberty to obtain a little temporary safety deserve neither liberty nor safety." -Benjamin Franklin, Historical Review of Pennsylvania.
GCS/IT d+(-) s:+(++)>: a C++(+++)$ UMBSHI++++$ P+>++ L+ !E-(---) W+++(--) N+ !w--- O- M++ V PS++(+++) PE- Y+(++) PGP>+++ t+(+++) 5++(+++) X++(+++) R+(+++) tv+(+++) b+(++++) DI+(++++) D+(++) G+(++++) e++>++++ h--- r---(+++)* z(+++)

On Monday 25 August 2003 10:15, Brad Knowles wrote:
According to python documentation (os module), this should do:
fsync(fd) Force write of file with filedescriptor fd to disk. On Unix, this calls the native fsync() function; on Windows, the MS _commit() function. If you're starting with a Python file object f, first do f.flush(), and then do os.fsync(f.fileno()), to ensure that all internal buffers associated with f are written to disk. Availability: Unix, and Windows starting in 2.2.3.
There's also:
fdatasync(fd) Force write of file with filedescriptor fd to disk. Does not force update of metadata. Availability: Unix.
I don't know the difference.
Because the python documentation says nothing about close() calling fsync() automatically, I assume it does not.
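A minimal sketch of the flush-then-fsync sequence the os module documentation describes, written for current Python 3. `write_and_sync` is a hypothetical helper name, not something from Mailman.

```python
import os
import tempfile


def write_and_sync(path, data):
    """Write bytes to `path` and ask the OS to commit them to disk.

    Hypothetical helper; the flush-then-fsync order follows the os
    module documentation quoted above.
    """
    with open(path, "wb") as f:
        f.write(data)
        f.flush()             # Python's internal buffer -> operating system
        os.fsync(f.fileno())  # operating system cache -> disk
    # Leaving the with block closes the file; close() alone does not fsync.


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        target = os.path.join(d, "example.db")
        write_and_sync(target, b"payload")
        print(os.path.getsize(target))
```

Whether the data is physically on the platter after fsync() still depends on the disk hardware, as the close(2) man page quoted earlier in the thread points out.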
-- Adde parvum parvo magnus acervus erit -- Ovidio

On Monday 25 August 2003 22:11, Simone Piunno wrote:
Do you fsync() the directory after the close and before the rename?
Ah, I've found what you meant... this is from PosixFilesystem.py (ZODB implementation):
    import os
    from posix import fsync
    ....
    def sync_directory(self,dir):
        if self.use_sync:
            p = os.path.join(self.dirname,dir)
            # Use os.open here because, mysteriously, it performs better
            # than fopen on linux 2.4.18, reiserfs, glibc 2.2.4
            f = os.open(p,os.O_RDONLY)
            # Should we worry about EINTR ?
            try:
                fsync(f)
            finally:
                os.close(f)

    def write_file(self,filename,content):
        fullname = os.path.join(self.dirname,filename)
        f = os.open(fullname,os.O_CREAT|os.O_RDWR|os.O_TRUNC,0640)
        # Should we worry about EINTR ?
        try:
            os.write(f,content)
            if self.use_sync:
                fsync(f)
        finally:
            os.close(f)
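Putting the two helpers above together with the write-to-.tmp-then-rename scheme gives something like the following. This is a sketch in current Python 3, not the ZODB or Mailman code: `durable_replace` and its parameters are illustrative names, and syncing the directory after the rename (so the rename itself is persisted) is my assumption about the safest order. Opening a directory with os.open() works on POSIX systems only.

```python
import os
import tempfile


def durable_replace(dirname, filename, content, use_sync=True):
    """Write `content` to a temp file, then rename it over `filename`.

    Sketch only; names are illustrative, not taken from ZODB or Mailman.
    """
    final = os.path.join(dirname, filename)
    tmp = final + ".tmp"
    fd = os.open(tmp, os.O_CREAT | os.O_WRONLY | os.O_TRUNC, 0o640)
    try:
        view = memoryview(content)
        while view:                      # os.write may write only part
            written = os.write(fd, view)
            view = view[written:]
        if use_sync:
            os.fsync(fd)                 # file data on disk before the rename
    finally:
        os.close(fd)
    os.rename(tmp, final)                # atomic replacement on POSIX
    if use_sync:
        dfd = os.open(dirname, os.O_RDONLY)
        try:
            os.fsync(dfd)                # persist the directory entry
        finally:
            os.close(dfd)
    return final


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        path = durable_replace(d, "message.db", b"first version")
        durable_replace(d, "message.db", b"second version")
        with open(path, "rb") as fp:
            print(fp.read())
```

With use_sync=False this falls back to the plain write-and-rename behaviour, which is atomic from userland's point of view but can still leave a truncated file after a power failure.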
-- Adde parvum parvo magnus acervus erit -- Ovidio

At 10:11 PM +0200 2003/08/25, Simone Piunno wrote:
That's the one you want. You're getting nailed on the meta-data which is not being flushed to disk before the rename.
Because the python documentation says nothing about close() calling fsync() automatically, I assume it does not.
Indeed. ;(
-- Brad Knowles, <brad.knowles@skynet.be>
Participants (7): Barry Warsaw, Bill Bradford, Brad Knowles, Harald Meland, John A. Martin, Peter C. Norton, Simone Piunno