[Python-ideas] Re: fsync-on-close io object

24 Dec 2020

      On Thu, Dec 24, 2020 at 12:15:08PM -0500, Michael A. Smith wrote:
...
With all the buffering that modern disks and filesystems do, a
specific question has come up a few times with respect to whether or
not data was actually written after flush. I think it would be pretty
useful for the standard library to have a variant in the io module
that would explicitly fsync on close.

One argument against this idea is that "disks and file systems buffer
for a reason, you should trust them, explicitly calling sync after every 
written file is just going to slow I/O down".

Personally I don't believe this argument. I was bitten many, many
times before I learned to explicitly sync files, but it's an argument you
should counter.

Another argument is that even syncing your data doesn't mean that the 
data is actually written to disk, since the hardware can lie. On the 
other hand, I don't know what anyone can do, not even the kernel, in the 
face of deceitful hardware.
...
You might be tempted to argue that this can be done very easily in
Python already, so why include it in the standard io module?
1. It seems to me that it would be better to do this in C, so for the
folks who need to make a consistency > performance kind of choice,
they don't have to sacrifice any additional performance.

The actual I/O is surely going to outweigh the cost of calling sync from
Python.

This sounds like a trivial micro-optimization for small files, and an
undetectable one for large files on slow media. Suppose you save a dozen
microseconds when syncing a two-gigabyte file written to a USB-2 stick,
where the sync itself might take four or five minutes. Are you even going
to notice the difference?

I think you need to show benchmarks before claiming that this needs to 
be in C.
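
A rough sketch of such a benchmark might look like the following. It is
a starting point under stated assumptions, not a rigorous measurement:
the helper name time_write, the 1 MiB payload, and the use of os.getpid
as a proxy for "one cheap call made from Python" are all illustrative
choices, not part of the proposal.

```python
import os
import tempfile
import time
import timeit


def time_write(payload, do_fsync):
    """Time writing payload to a temporary file, with or without fsync."""
    fd, path = tempfile.mkstemp()
    try:
        start = time.perf_counter()
        with os.fdopen(fd, "wb") as f:
            f.write(payload)
            if do_fsync:
                f.flush()             # Python's buffer -> operating system
                os.fsync(f.fileno())  # OS cache -> storage device
        return time.perf_counter() - start
    finally:
        os.remove(path)


if __name__ == "__main__":
    payload = b"x" * (1024 * 1024)  # 1 MiB, an arbitrary size
    plain = time_write(payload, do_fsync=False)
    synced = time_write(payload, do_fsync=True)
    # Per-call cost of a trivial OS call made from Python, used here only
    # as a rough proxy for the overhead a C implementation might save.
    per_call = timeit.timeit(os.getpid, number=100_000) / 100_000
    print(f"write only:        {plain:.6f} s")
    print(f"write + fsync:     {synced:.6f} s")
    print(f"one call (proxy):  {per_call:.9f} s")
```

If the per-call figure is orders of magnitude below the fsync figure on
the media you care about, that would support the point that the I/O
dominates. Results will vary a great deal between devices and
filesystems, so a single run proves little.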
...
2. Having it in the io library will call attention to this issue,
which I think is something a lot of folks don't consider. Assuming
that `close` or `flush` are sufficient for consistency has always been
wrong (at its limits), but it was less likely to be a stumbling block
in the past, when buffering was less aggressive and less layered and
the peak size and continuous-ness of data streams was a more niche
concern.

I don't know. I wonder whether burying it in the io library will make it
disappear.

Perhaps a "sync on close" keyword argument to open? At least then it is 
always available and easily discoverable.
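
Since open() has no such keyword today, the closest thing that can be
sketched is a stand-in built from the existing io classes. The names
FsyncOnCloseFileIO and open_synced below are hypothetical, and the
sketch only covers text files opened for writing:

```python
import io
import os


class FsyncOnCloseFileIO(io.FileIO):
    """Raw file that calls os.fsync() just before closing (hypothetical)."""

    def close(self):
        if self.closed:
            return
        try:
            # Only sync files that were opened for writing.
            if self.writable():
                os.fsync(self.fileno())
        finally:
            super().close()


def open_synced(path, encoding="utf-8"):
    """Hypothetical stand-in for open(path, 'w', sync_on_close=True)."""
    raw = FsyncOnCloseFileIO(path, "w")
    # The buffered and text layers flush into the raw file before they
    # close it, so the fsync in close() sees all of the written data.
    return io.TextIOWrapper(io.BufferedWriter(raw), encoding=encoding)
```

Usage would then look like an ordinary open() call:

    with open_synced("data.txt") as f:
        f.write("stuff")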
...
3. There are many ways to do this, and I think several of them could
be subtly incorrect.

Can you elaborate?

I mean, the obvious way is:

    import os

    try:
        with open(..., 'w') as f:
            f.write("stuff")
    finally:
        # os.sync() flushes all filesystem buffers, not just this file (Unix only)
        os.sync()

so maybe all we really need is a "sync file" context manager.
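
One possible shape for such a context manager is sketched below. The
name synced_open is hypothetical. Unlike the snippet above, it uses
os.fsync() on the single file rather than os.sync() on everything, and
it syncs only when the body finishes without an exception. Both are
design choices made for this sketch, not something settled in the
thread.

```python
import contextlib
import os


@contextlib.contextmanager
def synced_open(path, mode="w", **kwargs):
    """Open a file, and flush and fsync it before closing (hypothetical)."""
    f = open(path, mode, **kwargs)
    try:
        yield f
        # Reached only if the with-body did not raise.
        f.flush()             # Python's buffer -> operating system
        os.fsync(f.fileno())  # OS cache -> storage device
    finally:
        f.close()
```

It would be used like this:

    with synced_open("data.txt") as f:
        f.write("stuff")

The flush() before fsync() matters: without it, data still sitting in
Python's own buffer never reaches the OS before the sync. On some
filesystems a newly created file may also need its containing directory
synced, which this sketch does not attempt.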

-- 
Steve

Steven D'Aprano