[Python-ideas] `to_file()` method for strings

Nathaniel Smith njs at pobox.com
Wed Mar 23 13:14:07 EDT 2016


On Mar 23, 2016 1:13 AM, "Andrew Barnert" <abarnert at yahoo.com> wrote:
>
> On Mar 22, 2016, at 21:49, Nathaniel Smith <njs at pobox.com> wrote:
> >
> >> On Tue, Mar 22, 2016 at 9:33 PM, Chris Angelico <rosuav at gmail.com> wrote:
> >> On Wed, Mar 23, 2016 at 3:22 PM, Alexander Belopolsky
> >> <alexander.belopolsky at gmail.com> wrote:
> >>>
> >>> On Tue, Mar 22, 2016 at 11:33 PM, Andrew Barnert via Python-ideas
> >>> <python-ideas at python.org> wrote:
> >>>>    with tempfile.NamedTemporaryFile('w', dir=os.path.dirname(path),
> >>>>                                     delete=False) as f:
> >>>>        f.write(s)
> >>>>        f.flush()
> >>>>        os.replace(f.path, path)
> >>> You've got it wrong, but I understand what you tried to achieve.
> >>> Note that the "write to temp and move" trick may not work if your
> >>> /tmp and your path are mounted on different filesystems.  And with
> >>> some filesystems it may not work at all, but I agree that it would
> >>> be nice to have a state-of-the-art atomic write method somewhere in
> >>> stdlib.
> >>
> >> It's specifically selecting a directory for the temp file, so it ought
> >> to work. However, I'm not certain in my own head of the interaction
> >> between NamedTemporaryFile with delete=False and os.replace (nor of
> >> exactly how the latter differs from os.rename); what exactly happens
> >> when the context manager exits here? And what happens if there's an
> >> exception in the middle of this and stuff doesn't complete properly?
> >> Are there points at which this could (a) destroy data by deleting
> >> without saving, or (b) leave cruft around?
> >>
> >> This would be very nice to have as either stdlib or a
> >> well-documented recipe.
> >
> > Also: cross-platform support (Windows Is Different),
>
> I know a lot of people who never touch Windows with a 10-foot pole think
> this problem is still unsolved, but that's not true. Microsoft added
> sufficient atomic-replace APIs in 2003 (in the Win32 API only, not in
> crt/libc), and as of 3.3, Python's os.replace really is guaranteed to
> either atomically replace the file or leave it untouched and raise an
> exception on all platforms (as long as the files are on the same
> filesystem, and as long as there's not an EIO or equivalent because of an
> already-corrupted filesystem or a physical error on the media). (For
> platforms besides Windows and POSIX, it does this just by not existing and
> raising a NameError...) Likewise for safe temporary files--as of 3.3,
> tempfile.NamedTemporaryFile is safe on every platform where it exists, and
> that includes Windows.

Ah, thanks! I indeed didn't know about this, and missed that the code was
calling os.replace rather than os.rename.
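
For readers who missed the same distinction, here is a small sketch of the
behaviour in question. The file names are made up for illustration; the only
real API involved is os.replace, which overwrites an existing destination on
both POSIX and Windows, whereas os.rename raises FileExistsError on Windows
when the destination already exists.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as d:
    # Hypothetical file names, used only for this illustration.
    old = os.path.join(d, "old.txt")
    new = os.path.join(d, "new.txt")
    with open(old, "w") as f:
        f.write("old contents")
    with open(new, "w") as f:
        f.write("new contents")

    # Both paths are in the same directory, hence on the same filesystem,
    # which is the precondition for the replace to be atomic.
    os.replace(new, old)

    with open(old) as f:
        result = f.read()          # contents that now live under the old name
    leftover = os.path.exists(new)  # the source name should be gone
```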

> > handling of
> > permissions, do you care about xattrs?
>
> That can be handled effectively the same way as copy vs. copy2 if
> desired. I don't know if it's important enough, but if it is, it's easy.
> (My library actually does have options for preserving different levels of
> stuff, but I never use them.)

Right, but this is the kind of thing that makes me worry about a
one-size-fits-all solution :-).

> > when you say "atomic" then do
> > you mean atomic WRT power loss?
>
> Write-and-replace is atomic WRT both exceptions and power loss. Until the
> replace succeeds, the old version of the file is still there. This is
> guaranteed by POSIX and by Windows. If the OS can't offer that on some
> filesystem, it won't let you call os.replace.

POSIX doesn't guarantee anything whatsoever over power loss. Individual
filesystem implementations make somewhat stronger guarantees, but it's a
mess:
  http://danluu.com/file-consistency/
At the very least atomicity requires fsync'ing the new file before calling
rename, or you might end up with:

    Original code:
       new = open("new", "w")
       new.write(...)
       new.close()
       os.replace("new", "old")

    Gets reordered on the way to the hard drive to become:
       new = open("new", "w")
       os.replace("new", "old")
       new.write(...)
       new.close()

POSIX does of course guarantee that if the OS reorders things like this
then it has to hide that from you -- processes will always see the write
happen before the rename. Except if there's a power loss, now we can have:

    Gets executed as:
       new = open("new", "w")
       os.replace("new", "old")
       --- whoops, power lost here, and so are the file contents ---

But fsync is a very expensive operation; there are plenty of applications
for atomic writes where this is unnecessary (e.g. if the file is being used
as an IPC mechanism, so power loss -> the processes die, no one cares about
their IPC channel anymore). And there are plenty of applications where this
is insufficient (e.g. if you expect/need atomic_write(path1, data1);
atomic_write(path2, data2) to guarantee that the two atomic writes can't be
reordered relative to each other).
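
To make the tradeoff concrete, here is a rough sketch of what a
write-and-replace helper with an optional fsync step might look like. This is
not the recipe from upthread and not a proposed stdlib API: the name
atomic_write is borrowed from the paragraph above, and the durable flag is a
hypothetical parameter added for illustration. It assumes text data, a
destination on a filesystem that supports rename, and it makes no attempt to
preserve permissions or xattrs (mkstemp creates the file readable and
writable only by the owner).

```python
import os
import tempfile


def atomic_write(path, data, durable=True):
    """Write text `data` to `path` via a temp file in the same directory.

    `durable` is a hypothetical flag: when true, fsync the file before the
    rename (and the directory after it, where the platform allows).
    """
    directory = os.path.dirname(os.path.abspath(path))
    # Same directory as the target, so the rename stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "w") as f:
            f.write(data)
            f.flush()
            if durable:
                # Push the data to disk before the rename can become visible.
                os.fsync(f.fileno())
        os.replace(tmp_path, path)
    except BaseException:
        # Don't leave the temp file behind if anything failed.
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise
    if durable and hasattr(os, "O_DIRECTORY"):
        # On POSIX, fsync the directory so the rename itself is persisted.
        dir_fd = os.open(directory, os.O_RDONLY | os.O_DIRECTORY)
        try:
            os.fsync(dir_fd)
        finally:
            os.close(dir_fd)
```

With durable=False this is roughly the cheap variant suited to the IPC case;
with durable=True it pays for the fsync calls. Neither variant says anything
about ordering between two separate atomic_write calls.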

I don't want to get sucked into a long debate about this; it's entirely
likely that adding something like that original recipe to the stdlib would
be an improvement, so long as it had *very* detailed docs explaining the
exact tradeoffs made. All I want to do is raise a cautionary flag that such
an effort would need to tread carefully :-)

-n