[Python-ideas] Rewriting file - pythonic way

Nick Coghlan ncoghlan at gmail.com
Sun Apr 15 07:40:33 EDT 2018


On 15 April 2018 at 20:47, Paul Moore <p.f.moore at gmail.com> wrote:
> On 15 April 2018 at 11:22, Elazar <elazarg at gmail.com> wrote:
>> On Sunday, 15 April 2018 at 13:13, Serhiy Storchaka
>> <storchaka at gmail.com> wrote:
>>> Actually, reliable code should write into a separate file and replace
>>> the original file with the new file only if writing is successful. Or
>>> back up the old file and restore it if writing fails. Or do both. And
>>> handle hard and soft links if necessary. And use file locks if needed to
>>> prevent race conditions when different processes read and write. Depending
>>> on the specifics of the application you may need different code. Your
>>> three lines are enough for a one-time script if the risk of a power
>>> outage or disk space exhaustion is insignificant or if the data is not
>>> critical.
>>
>> This pitfall sounds like a good reason to have such a function in the
>> standard library.
>
> It certainly sounds like a good reason for someone to write a "safe
> file rewrite" library function. But I don't think that it's such a
> common need that it needs to be a stdlib function. It may well even be
> the case that there's such a function already available on PyPI - has
> anyone actually checked?

There wasn't last time I checked (which admittedly was several years ago now).

The issue is that it's painfully difficult to write a robust
cross-platform "atomic rewrite" operation that can cleanly handle a
wide range of arbitrary use cases - instead, folks are more likely to
write simpler alternatives that work well enough given whichever
simplifying assumptions are applicable to their use case (which may
even include "I don't care about atomicity, and am quite happy to let
a poorly timed Ctrl-C or unexpected system shutdown corrupt the file
I'm rewriting").
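
As a rough illustration of one such "simpler alternative", here is a
minimal sketch of the write-to-a-temporary-file-then-rename approach
described earlier in the thread. The names rewrite_file and transform are
hypothetical, and the sketch assumes a regular text file that fits in
memory, with the temporary file created in the same directory. It does not
attempt to handle symlinks, hard links, file permissions or ownership,
file locking, or Windows cases where the target is open in another
process, which is exactly the kind of platform-specific detail that makes
a general-purpose version hard.

```python
import os
import tempfile


def rewrite_file(path, transform, encoding="utf-8"):
    """Replace the text in `path` with transform(old_text) via a temp file.

    Hypothetical helper for illustration only; not a stdlib or PyPI API.
    """
    with open(path, encoding=encoding) as f:
        new_text = transform(f.read())

    # Create the temp file next to the target so the final rename stays on
    # a single filesystem (a cross-device rename would fail).
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".rewrite-")
    try:
        with os.fdopen(fd, "w", encoding=encoding) as tmp:
            tmp.write(new_text)
            tmp.flush()
            # Ask the OS to push the data to disk before the rename.
            os.fsync(tmp.fileno())
        # Swap the new file into place only after it was written in full.
        os.replace(tmp_path, path)
    except BaseException:
        # On any failure, leave the original untouched and remove the
        # partially written temp file (if it still exists).
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise
```

Usage would look something like rewrite_file("notes.txt", str.upper). Note
that the replaced file gets the restrictive permissions mkstemp() gives
the temporary file, which is one of the many edge cases a robust version
would need to deal with.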

https://bugs.python.org/issue8604#msg174104 is the relevant tracker
discussion (deliberately linking into the middle of it, since the
early part is akin to this thread: reactions mostly along the lines of
"that's easy, and doesn't need to be in the standard library"). It
definitely *isn't* easy, but it's also challenging to publish on PyPI:
it's a quagmire of platform-specific complexity and edge cases, if you
mess it up you can cause significant data loss, and anyone who already
knows they need atomic rewrites is likely to be able to come up with
their own purpose-specific implementation in less time than it would
take them to assess the suitability of third-party alternatives.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

