Sorry for the duplicate, Barry. I got bit by the "I don't reply-all by
default" spider.
On Sat, Dec 26, 2020 at 12:30 PM Barry Scott wrote:
On 24 Dec 2020, at 17:15, Michael A. Smith wrote:
With all the buffering that modern disks and filesystems do, the question of whether data was actually written after a flush has come up a few times. I think it would be pretty useful for the standard library to have a variant in the io module that would explicitly fsync on close.
You might be tempted to argue that this can be done very easily in Python already, so why include it in the standard io module?
1. It seems to me that it would be better to do this in C, so the folks who need to choose consistency over performance don't have to sacrifice any additional performance.
2. Having it in the io library will call attention to this issue, which I think is something a lot of folks don't consider. Assuming that `close` or `flush` are sufficient for consistency has always been wrong (at its limits), but it was less likely to be a stumbling block in the past, when buffering was less aggressive and less layered and the peak size and continuity of data streams was a more niche concern.
3. There are many ways to do this, and I think several of them could be subtly incorrect. In other words, maybe it can't be done very easily and correctly after all. Providing "obviously right" ways to do things is the bailiwick of the standard library, isn't it?
I have used rename to make a new file appear atomically after it is closed and I have used fsync to ensure records are on disk before a file is closed.
I've not needed fsync on close yet.
I'm confused -- did you not just describe fsync-on-close? Using fsync to ensure records are on disk before a file is closed is what we're talking about. Perhaps you think I mean to fsync after close? What I really mean is:

1. flush if applicable
2. (fdatasync or fsync) if indicated
3. close

in that order.

What is the use case that needs this?

One use case is atomically writing a file by first writing to a temporary file, then renaming the temporary file to its real destination, such as the problem described by He Chen in https://issues.apache.org/jira/browse/AVRO-3013.
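
For concreteness, the flush / sync / close ordering and the temporary-file use case could be sketched in pure Python roughly as follows. This is only an illustration under stated assumptions: `open_fsync_on_close` and `atomic_write` are hypothetical names, not anything in the io module today, and the sketch makes no attempt at the C-level performance argued for above.

```python
import contextlib
import os
import tempfile


@contextlib.contextmanager
def open_fsync_on_close(path, mode="wb", datasync=False):
    """Hypothetical helper: sync the file to disk just before closing it."""
    f = open(path, mode)
    try:
        yield f
        # 1. flush Python's own userspace buffer to the OS
        f.flush()
        # 2. ask the OS to push its buffers to the device;
        #    os.fdatasync is not available on every platform
        use_datasync = datasync and hasattr(os, "fdatasync")
        sync = os.fdatasync if use_datasync else os.fsync
        sync(f.fileno())
    finally:
        # 3. close last (if the body raised, the sync is skipped)
        f.close()


def atomic_write(dest, data):
    """Hypothetical helper: write bytes to a temp file, then rename onto dest."""
    directory = os.path.dirname(os.path.abspath(dest))
    # Create the temp file in the destination directory so the rename
    # stays on one filesystem.
    fd, tmp = tempfile.mkstemp(dir=directory)
    os.close(fd)
    try:
        with open_fsync_on_close(tmp) as f:
            f.write(data)
        # The data is synced before the new name becomes visible.
        os.replace(tmp, dest)
    except BaseException:
        with contextlib.suppress(FileNotFoundError):
            os.unlink(tmp)
        raise
```

Calling `atomic_write("settings.bin", b"payload")` (the file name is made up) would leave either the old contents or the complete new contents at the destination. Making the rename itself durable on POSIX systems may additionally need an fsync of the containing directory, which this sketch leaves out.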