Re: [Python-Dev] cpython (2.7): #9559: Append data to single-file mailbox files if messages are only added
On Thu, 28 Jun 2012 12:59:02 +0200
petri.lehtinen
http://hg.python.org/cpython/rev/c37cb11b546f
changeset: 77832:c37cb11b546f
branch: 2.7
parent: 77823:73710ae9fedc
user: Petri Lehtinen
date: Thu Jun 28 13:48:17 2012 +0300
summary: #9559: Append data to single-file mailbox files if messages are only added

If messages were only added, a new file is no longer created and renamed over the old file when flush() is called on an mbox, MMDF or Babyl mailbox.
Why so? Appending is not atomic and, if it fails in the middle, you could get a corrupt mbox file. Furthermore, I disagree that it's a bugfix: IMO it should wait for 3.4. Regards Antoine.
Antoine Pitrou wrote:
If messages were only added, a new file is no longer created and renamed over the old file when flush() is called on an mbox, MMDF or Babyl mailbox.
Why so? Appending is not atomic and, if it fails in the middle, you could get a corrupt mbox file. Furthermore, I disagree that it's a bugfix: IMO it should wait for 3.4.
The code previously already appended messages to the end of the file when calling add(). This patch just changed it to not do a full rewrite when flush() is called. Having a partially written message at the end of your mailbox doesn't seem like a fatal corruption to me.
Furthermore, I (and R. David Murray) think this is not so surprising for users. Most (or all) other implementations always write changes in-place without renaming, as this makes it possible to find out whether new mail has arrived.
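For readers following along, the two flush strategies under discussion can be sketched roughly as below. This is only an illustration, not the code from the changeset: the function names `flush_by_rename` and `flush_by_append` are hypothetical, messages are assumed to be already-serialized bytes, and `os.replace` assumes Python 3.3 or later.

```python
import os
import tempfile


def flush_by_rename(path, messages):
    """Rewrite the whole mailbox to a temp file, then rename it over the original.

    Hypothetical helper: readers see either the old or the new file,
    but the file at `path` is replaced by a different one each time.
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as tmp:
            for msg in messages:
                tmp.write(msg)
            tmp.flush()
            os.fsync(tmp.fileno())
        os.replace(tmp_path, path)
    except BaseException:
        # Clean up the temp file; the original mailbox is untouched.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise


def flush_by_append(path, new_messages):
    """Append only the new messages to the existing file.

    Hypothetical helper: the file stays the same file, but a failure
    part-way through can leave a truncated message at the end.
    """
    with open(path, "ab") as f:
        for msg in new_messages:
            f.write(msg)
        f.flush()
        os.fsync(f.fileno())
```

The trade-off is the one argued above: the rename variant never exposes a half-written mailbox, while the append variant keeps the file in place so that other programs watching it can still tell when new mail arrives.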
On Thu, 28 Jun 2012 16:16:45 +0300, Petri Lehtinen
Antoine Pitrou wrote:
If messages were only added, a new file is no longer created and renamed over the old file when flush() is called on an mbox, MMDF or Babyl mailbox.
Why so? Appending is not atomic and, if it fails in the middle, you could get a corrupt mbox file. Furthermore, I disagree that it's a bugfix: IMO it should wait for 3.4.
The code previously already appended messages to the end of the file when calling add(). This patch just changed it to not do a full rewrite when flush() is called. Having a partially written message at the end of your mailbox doesn't seem like a fatal corruption to me.
Furthermore, I (and R. David Murray) think this is not so surprising for users. Most (or all) other implementations always write changes in-place without renaming, as this makes it possible to find out whether new mail has arrived.
It is true, however, that Petri found that mutt (I think?) does some extra gymnastics to provide recovery where the write fails part way through, and it would be worth adding that as an enhanced bugfix if someone has the motivation (basically, make a copy of the unmodified mailbox and mv it back into place if the write fails). Even that fix won't prevent corruption in the case of a system crash, but, then, not much will in that case. --David
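The recovery idea David describes (copy the unmodified mailbox, move it back if the write fails) might look something like the sketch below. The name `append_with_backup` and the `.bak` suffix are assumptions made for illustration only; nothing like this exists in the mailbox module as far as this thread shows.

```python
import os
import shutil


def append_with_backup(path, new_messages, backup_suffix=".bak"):
    """Append messages, restoring a pre-write copy if the append fails.

    Hypothetical helper; `backup_suffix` is an assumed naming scheme.
    """
    backup_path = path + backup_suffix
    # Copy the unmodified mailbox before touching it.
    shutil.copy2(path, backup_path)
    try:
        with open(path, "ab") as f:
            for msg in new_messages:
                f.write(msg)
            f.flush()
            os.fsync(f.fileno())
    except BaseException:
        # The write failed part-way: move the copy back into place.
        os.replace(backup_path, path)
        raise
    else:
        # Success: the backup is no longer needed.
        os.unlink(backup_path)
```

As noted above, this only helps when the process survives to run the `except` branch. After a system crash the backup file would still be on disk, but nothing would move it back automatically.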
R. David Murray wrote:
It is true, however, that Petri found that mutt (I think?) does some extra gymnastics to provide recovery where the write fails part way through, and it would be worth adding that as an enhanced bugfix if someone has the motivation (basically, make a copy of the unmodified mailbox and mv it back into place if the write fails).
This is not what mutt does. It just writes the modified part of the mailbox to a temporary file, and then copies the data from the temporary file to the mailbox file. If this last step fails, the temporary file is left behind for recovery. Copying the whole mailbox before making modifications might be clever, though. It's just quite a lot of writing, especially for big mailboxes. OTOH, the whole file is rewritten by the current code, too.
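A rough Python rendering of the two-step approach described here (write the modified part to a temporary file, then copy it into the mailbox, leaving the temporary file behind if the copy fails) is sketched below. It is based only on the description in this message and is not mutt's actual code; `write_tail_via_tempfile`, the `offset` parameter and the `mbox-tail-` prefix are hypothetical.

```python
import os
import shutil
import tempfile


def write_tail_via_tempfile(path, offset, new_tail):
    """Replace everything from `offset` onward with `new_tail`.

    Hypothetical helper: `offset` is assumed to be the byte position
    of the first modified message in the mailbox file.
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(prefix="mbox-tail-", dir=directory)

    # Step 1: write the modified part to a temporary file and sync it.
    with os.fdopen(fd, "wb") as tmp:
        tmp.write(new_tail)
        tmp.flush()
        os.fsync(tmp.fileno())

    # Step 2: copy the temporary file into the mailbox in place.
    # If this raises, tmp_path is deliberately left behind for recovery.
    with open(tmp_path, "rb") as tmp, open(path, "r+b") as mailbox:
        mailbox.seek(offset)
        shutil.copyfileobj(tmp, mailbox)
        mailbox.truncate()  # drop any old data past the new end
        mailbox.flush()
        os.fsync(mailbox.fileno())

    # Only remove the temporary file once the mailbox copy succeeded.
    os.unlink(tmp_path)
```

Compared with copying the whole mailbox first, this writes only the modified part twice, at the cost of leaving the mailbox itself in an intermediate state if step 2 is interrupted.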
On Thu, 28 Jun 2012 16:16:45 +0300
Petri Lehtinen
Antoine Pitrou wrote:
If messages were only added, a new file is no longer created and renamed over the old file when flush() is called on an mbox, MMDF or Babyl mailbox.
Why so? Appending is not atomic and, if it fails in the middle, you could get a corrupt mbox file. Furthermore, I disagree that it's a bugfix: IMO it should wait for 3.4.
The code previously already appended messages to the end of the file when calling add(). This patch just changed it to not do a full rewrite when flush() is called.
Ok, I agree it sounds good then. Thanks for explaining. Regards Antoine.
participants (3)
- Antoine Pitrou
- Petri Lehtinen
- R. David Murray