[Martin v. Löwis]
I mostly agree: ZODB is indeed advanced, and it is indeed a good idea to check for presence of os.fsync before using it.
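In current Python, that presence check might be sketched like so (sync_to_disk is a hypothetical helper for illustration, not ZODB's actual code):

    import os

    def sync_to_disk(fp):
        # Flush Python's userspace buffer to the OS, then -- where the
        # platform supports it -- ask the OS to commit its own buffers
        # to disk.  os.fsync is not available everywhere, hence the check.
        fp.flush()
        if hasattr(os, 'fsync'):
            os.fsync(fp.fileno())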
While this is OT, I'd still like to question the usefulness of fsync(2) in the first place, for applications like ZODB. I assume fsync is used as a write barrier, to make sure old modifications are on disk before making new changes.
That's important, but not primarily for the catastrophic error-recovery scenarios you go on to sketch.
The most important error-recovery procedure is preventative, running backups against a ZODB database while the database is active (Tools/repozo.py in a recent ZODB distribution is the right tool for this). Since a ZODB process may run for months, it's not practical to say that backups require shutting ZODB down.
Without both flushing and fsync'ing, the backup process can't get at all the data that's "really" in the file. Here's a little Python driver:
    import os

    fp = file('test.dat', 'wb')
    guts = 'x' * 1000000
    n = 0
    while 1:
        fp.write(guts)
        fp.flush()
        os.fsync(fp.fileno())
        n += len(guts)
        proceed = raw_input("wrote %d so far; hit key" % n)
At least on Windows, both the flush and the fsync are necessary to see one million bytes (via a different process) at the first prompt, two million at the second, and so on. With neither, another process typically sees 0 bytes before the file gets huge. With just one of them, it seems hard to predict, ranging from 0 to "almost" a million additional bytes per prompt.
ZODB does a flush and an fsync after each transaction, so that the backup script (or any other distinct process) sees the most recent data available.
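The per-transaction pattern amounts to a write/flush/fsync sequence; here's a minimal sketch (commit_record and its single-blob framing are illustrative assumptions, not ZODB's actual API):

    import os

    def commit_record(fp, payload):
        # Append the transaction's bytes, then flush Python's userspace
        # buffer to the OS, then fsync the OS's buffers to disk, so that
        # any other process opening the file (e.g. a backup script) sees
        # the completed record right away.
        fp.write(payload)
        fp.flush()
        os.fsync(fp.fileno())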
Besides missing large gobs of newer data, without the fsync the backup script may see only part of the most recent transaction that managed to wind up on disk, and erroneously conclude that the database is corrupt.
In short, fsync is necessary to support ZODB best practice.