[Python-Dev] Ext4 data loss
Cameron Simpson
cs at zip.com.au
Wed Mar 11 03:59:00 CET 2009
On 10Mar2009 22:14, A.M. Kuchling <amk at amk.ca> wrote:
| On Wed, Mar 11, 2009 at 11:31:52AM +1100, Cameron Simpson wrote:
| > On 10Mar2009 18:09, A.M. Kuchling <amk at amk.ca> wrote:
| > | The mailbox module tries to be careful and always fsync() before
| > | closing files, because mail messages are pretty important.
| >
| > Can it be turned off? I hadn't realised this.
|
| No, there's no way to turn it off (well, you could delete 'fsync' from
| the os module).
Ah. For myself, were I writing a high-load mailbox tool (e.g. a mail filer
or, more to the point, a mail refiler, which I do actually intend to write),
I would want to be able to do a huge mass of mailbox work and then
possibly issue a single sync at the end. For "unix mbox" the always-fsync
behaviour might be ok, but for maildirs I'd imagine it leads to an fsync
per message.
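To make the per-message cost concrete, here is a rough sketch of the two
policies for a maildir-like layout (one file per message). The names
write_messages() and sync_all() are made up for illustration and are not
part of the mailbox module; only open(), os.open(), os.fsync() and
os.close() are real calls.

```python
import os

def write_messages(directory, messages, fsync_each=True):
    """Write each message to its own file, maildir style.

    With fsync_each=False the per-file fsync is skipped and the caller
    is expected to sync once at the end (see sync_all below).
    """
    paths = []
    for i, body in enumerate(messages):
        path = os.path.join(directory, "msg%d" % i)  # hypothetical naming
        with open(path, "wb") as f:
            f.write(body)
            f.flush()                 # push Python's buffer to the OS
            if fsync_each:
                os.fsync(f.fileno())  # one fsync per message
        paths.append(path)
    return paths

def sync_all(paths):
    """Deferred policy: fsync every written file in one pass at the end."""
    for path in paths:
        fd = os.open(path, os.O_RDWR)
        try:
            os.fsync(fd)
        finally:
            os.close(fd)
```

The deferred pass still issues one fsync per file, but the work is batched
after all the writes, and a caller who doesn't need durability can skip it.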
| > | The tarfile, zipfile, and gzip/bzip2 classes don't seem to use fsync()
| > | at all, either implicitly or by having methods for calling them.
| > | Should they? What about cookielib.CookieJar?
| >
| > I think they should not do this implicitly. By all means let a user
| > issue policy.
|
| The problem is that in some cases the user can't issue policy. For
| example, look at dumbdbm._commit(). It renames a file to a backup,
| opens a new file object, writes to it, and closes it. A caller can't
| fsync() because the file object is created, used, and closed
| internally. With zipfile, you could at least access the .fp attribute
| to sync it (though is the .fp documented as part of the interface?).
I didn't mean giving the user an fsync hook so much as publishing a
flag such as ".do_critical_fsyncs" inside the dbm or zipfile object. If true,
the module issues fsyncs at the appropriate times.
| In other words, do we need to ensure that all the relevant library
| modules expose an interface to allow requesting a sync, or getting the
| file descriptor in order to sync it?
With a policy flag you could solve the control issue even for things
which don't expose the fd such as your dumbdbm._commit() example.
If you supply both a flag and an fsync() method it becomes easy for
a user of a module to go:
    obj = get_dbm_handle(....)
    obj.do_critical_fsyncs = False
    ... do lots and lots of stuff ...
    obj.fsync()
    obj.close()
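On the module side, the flag plus method could look roughly like this.
It is only a sketch: PolicyFile and bulk_append() are hypothetical names,
not anything in dumbdbm or zipfile, and the class is a trivial append-only
file rather than a real dbm.

```python
import os

class PolicyFile:
    """Hypothetical append-only store with a caller-controlled fsync policy."""

    def __init__(self, path, do_critical_fsyncs=True):
        # Policy flag: when true, every append is made durable immediately.
        self.do_critical_fsyncs = do_critical_fsyncs
        self._fp = open(path, "ab")

    def append(self, data):
        self._fp.write(data)
        if self.do_critical_fsyncs:
            self.fsync()

    def fsync(self):
        # Flush Python's buffer first, then ask the OS to write to disk.
        self._fp.flush()
        os.fsync(self._fp.fileno())

    def close(self):
        self._fp.close()

def bulk_append(path, records):
    """Turn the policy off, do the bulk work, then sync once at the end."""
    obj = PolicyFile(path)
    obj.do_critical_fsyncs = False
    for record in records:
        obj.append(record)
    obj.fsync()
    obj.close()
```

The default stays safe (fsync on every append), and the file object never
has to be exposed for the caller to take control of the policy.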
Cheers,
--
Cameron Simpson <cs at zip.com.au> DoD#743
http://www.cskk.ezoshosting.com/cs/
In the end, winning is the only safety. - Kerr Avon