[Python-Dev] Ext4 data loss
Gisle Aas
gisle at activestate.com
Thu Mar 12 09:06:26 CET 2009
On Mar 11, 2009, at 22:43, Cameron Simpson wrote:
> On 11Mar2009 10:09, Joachim König <him at online.de> wrote:
>> Guido van Rossum wrote:
>>> On Tue, Mar 10, 2009 at 1:11 PM, Christian Heimes
>>> <lists at cheimes.de> wrote:
>>>> [...]
>>>> https://bugs.edge.launchpad.net/ubuntu/+source/linux/+bug/317781/comments/54
>>>> .
>>>> [...]
>>> If I understand the post properly, it's up to the app to call fsync(),
>>> and it's only necessary when you're doing one of the rename dances, or
>>> updating a file in place. Basically, as he explains, fsync() is a very
>>> heavyweight operation; I'm against calling it by default anywhere.
>>>
>> To me, the flaw seems to be in the close() call (of the operating
>> system). I'd expect the data to be in a persistent state once the
>> close() returns. So there would be no need to fsync if the file gets
>> closed anyway.
>
> Not really. On the whole, flush() means "the object has handed all data
> to the OS". close() means "the object has handed all data to the OS
> and released the control data structures" (OS file descriptor release;
> like the OS, the python interpreter may release python stuff later too).
>
> By contrast, fsync() means "the OS has handed filesystem changes to the
> disc itself". Really really slow, by comparison with memory. It is Very
> Expensive, and a very different operation to close().
...and at least on OS X there is one more level, where you actually tell
the disc to flush its buffers to permanent storage, with:

    fcntl(fd, F_FULLFSYNC)

The fsync manpage says:

    Note that while fsync() will flush all data from the host to the drive
    (i.e. the "permanent storage device"), the drive itself may not
    physically write the data to the platters for quite some time and it
    may be written in an out-of-order sequence.

    Specifically, if the drive loses power or the OS crashes, the
    application may find that only some or none of their data was written.
    The disk drive may also re-order the data so that later writes may be
    present, while earlier writes are not.

    This is not a theoretical edge case. This scenario is easily
    reproduced with real world workloads and drive power failures.

    For applications that require tighter guarantees about the integrity
    of their data, Mac OS X provides the F_FULLFSYNC fcntl. The
    F_FULLFSYNC fcntl asks the drive to flush all buffered data to
    permanent storage. Applications, such as databases, that require a
    strict ordering of writes should use F_FULLFSYNC to ensure that their
    data is written in the order they expect. Please see fcntl(2) for more
    detail.

It's not obvious what level of syncing Python should perform automatically,
so I think it's better to let the application deal with it.
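
As one example of an application handling this itself, the "rename dance"
mentioned in the quoted text could look roughly like the sketch below. The
function name replace_file_durably is hypothetical, and os.replace() is
assumed to be available (on older Pythons os.rename() plays the same role on
POSIX). It only syncs the file contents; syncing the containing directory is
left out.

```python
import os
import tempfile


def replace_file_durably(path, data):
    """Write bytes to a temp file, fsync it, then rename it over path."""
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temp file in the same directory so the rename stays on
    # one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()             # Python buffers -> OS
            os.fsync(tmp.fileno())  # OS -> drive
        # Only after the data has been synced do we replace the old file.
        os.replace(tmp_path, path)
    except BaseException:
        # Clean up the temp file if anything failed before the rename.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

The point of the ordering is that the old contents stay in place until the
new contents have been handed to the drive.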
--Gisle