Sorry for the duplicate, Barry. I got bit by the "I don't reply-all by default" spider.

On Sat, Dec 26, 2020 at 12:30 PM Barry Scott <barry@barrys-emacs.org> wrote:


> On 24 Dec 2020, at 17:15, Michael A. Smith <michael@smith-li.com> wrote:
>
> With all the buffering that modern disks and filesystems do, a
> specific question has come up a few times with respect to whether or
> not data was actually written after flush. I think it would be pretty
> useful for the standard library to have a variant in the io module
> that would explicitly fsync on close.
>
> You might be tempted to argue that this can be done very easily in
> Python already, so why include it in the standard io module?
>
> 1. It seems to me that it would be better to do this in C, so for the
> folks who need to make a consistency > performance kind of choice,
> they don't have to sacrifice any additional performance.
> 2. Having it in the io library will call attention to this issue,
> which I think is something a lot of folks don't consider. Assuming
> that `close` or `flush` are sufficient for consistency has always been
> wrong (at its limits), but it was less likely to be a stumbling block
> in the past, when buffering was less aggressive and less layered and
> the peak size and continuous-ness of data streams was a more niche
> concern.
> 3. There are many ways to do this, and I think several of them could
> be subtly incorrect. In other words, maybe it can't be done very
> easily and correctly after all. Providing "obviously right" ways to do
> things is the bailiwick of the standard library, isn't it?

I have used rename to make a new file appear atomically after it is closed
and I have used fsync to ensure records are on disk before a file is closed.

I've not needed fsync on close yet.

I'm confused -- did you not just describe fsync-on-close? Using fsync to ensure records are on disk before a file is closed is what we're talking about. Perhaps you think I mean to fsync after close? What I really mean is

1. flush if applicable
2. (fdatasync or fsync) if indicated
3. close

in that order.
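
To make that ordering concrete, here is a rough pure-Python sketch. The name `open_synced` and its `datasync` parameter are made up for illustration and are not an existing io API; it also assumes the caller wants to skip the sync if the body raises.

```python
import os
from contextlib import contextmanager


@contextmanager
def open_synced(path, mode="wb", datasync=False):
    """Hypothetical helper: flush, then fsync/fdatasync, then close."""
    f = open(path, mode)
    try:
        yield f
        # 1. flush Python's userspace buffer into the OS
        f.flush()
        # 2. ask the OS to push its buffers to the storage device;
        #    os.fdatasync is not available on every platform, so fall back
        sync = os.fsync
        if datasync and hasattr(os, "fdatasync"):
            sync = os.fdatasync
        sync(f.fileno())
    finally:
        # 3. close, whether or not the sync was reached
        f.close()
```

Usage would look like `with open_synced("records.bin") as f: f.write(b"...")`. Doing the same in C inside the io module is the part I am actually proposing; this only shows the order of operations.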

What is the use case that needs this?

One use case is writing a file atomically: write to a temporary file first, then rename the temporary file to its real destination. He Chen describes a problem of this kind in https://issues.apache.org/jira/browse/AVRO-3013.
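
A sketch of that pattern, under some assumptions: `atomic_write` is a hypothetical name, the temporary file is created in the destination directory so the rename stays on one filesystem, and the trailing directory fsync is an extra step that I believe is needed on POSIX systems for the rename itself to be durable. It is skipped where `os.O_DIRECTORY` does not exist.

```python
import os
import tempfile
from contextlib import suppress


def atomic_write(path, data):
    """Hypothetical helper: write bytes to a temp file, sync, then rename."""
    directory = os.path.dirname(os.path.abspath(path))
    # Same directory as the target, so os.replace does not cross filesystems.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()              # userspace buffer -> OS
            os.fsync(f.fileno())   # OS buffers -> storage device
        # File is closed here; now swap it into place.
        os.replace(tmp_path, path)
    except BaseException:
        # Clean up the temp file if anything failed before the rename.
        with suppress(FileNotFoundError):
            os.unlink(tmp_path)
        raise
    # Assumption: sync the directory entry too, where the platform allows it.
    if hasattr(os, "O_DIRECTORY"):
        dir_fd = os.open(directory, os.O_RDONLY | os.O_DIRECTORY)
        try:
            os.fsync(dir_fd)
        finally:
            os.close(dir_fd)
```

Without the fsync before the rename, the new name can end up pointing at a file whose contents never reached the disk, which is exactly the case where flush and close alone are not enough.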