Implementation of shutil.move

The shutil.move function uses os.rename to move files on the same file system. On unix, this function will overwrite an existing destination, so the obvious approach is if not os.path.exists(dst): shutil.move(src, dst) But this could result in race conditions if dst is created after os.path.exists and before shutil.move. From my research, it seems that this is a limitation in the unix c library, but it should be possible to avoid it through a workaround (pieced together from http://bytes.com/topic/python/answers/555794-safely-renaming-file-without-ov...). This involves some fairly low-level work, so I propose adding a new move2 function to shutil, which raises an error if dst exists and locking it if it doesn't: def move2(src, dst): try: fd = os.open(dst, os.O_EXCL | os.O_CREAT) except OSError: raise Error('Destination exists') try: move(src, dst) finally: os.close(fd) This could be optimised by using shutil.move code rather than just calling it, but the idea is that an attempt is made to create dst with exclusive access. If this fails, then it means that the file exists, but if it passes, then dst is locked so no other process can create it.

On Fri, 12 Aug 2011 13:59:22 +0200 David Townshend <aquavitae69@gmail.com> wrote:
This is a reasonable request (although it could also be an optional argument to shutil.move() rather than a separate function). Could you open an issue with your proposal on http://bugs.python.org ? You are also welcome to submit a patch in the issue; please see http://docs.python.org/devguide/ Regards Antoine.

Issue created: http://bugs.python.org/issue12741 Adding an optional argument to move sounds like a better solution, so I'll work on a patch for that. David

On Fri, Aug 12, 2011 at 7:59 AM, David Townshend <aquavitae69@gmail.com> wrote:
This type of problem comes up regularly and a lot of user code is riddled with this kind of race conditions. Many (most?) are avoidable now by writing code to using EAFP rather than LBYL, but many are not. I wonder if this broader problem could be addressed by a context manager. Something to the general effect of the usage try: with lockfile(dst): move(src, dst) except OSError as e: if e != errno.EEXIST: raise raise AppSpecificError("File already exists.") # or whatever and a definition like @contextlib.contextmanager def lockfile(path): fd = os.open(path, os.O_EXCL | os.O_CREAT) yield os.close(fd) The usage is still sort of ugly, but I'm not sure I can think of a general way that isn't. Mike

On Fri, Aug 12, 2011 at 9:20 AM, Mike Graham <mikegraham@gmail.com> wrote:
lockfile is not a good name for this, but the manager itself is very simple and useful. +1 from me for getting this somewhere into stdlib.
-- Read my blog! I depend on your acceptance of my opinion! I am interesting! http://techblog.ironfroggy.com/ Follow me if you're into that sort of thing: http://www.twitter.com/ironfroggy

This doesn't completely solve the race condition: `os.open(path, O_EXCL | O_CREAT)` opens an fd for a file location. If a file was already at that location, it fails, if not, it succeeds with an fd. So if os.move() used this fd to actually write the new data, there would be no race condition in terms of file creation/deletion. This is, as far as I can tell, what was suggested by your linked thread. However, in your suggestion it does not do that: it does opens a new fd to a new file, and then does precisely what it did before. During the time in-between the os.open and the shutil.move(), somebody can delete the created file, and write a new one, or whatever. If they do that, then any such changes they make will be lost because shutil.move will steamroller them and overwrite the file. Devin On Fri, Aug 12, 2011 at 7:59 AM, David Townshend <aquavitae69@gmail.com> wrote:

Am 12.08.2011 15:53, schrieb Devin Jeanpierre:
I couldn't figure out how the open fd should fix the race condition until I read your answer. You are right! The trick doesn't help at all. Even a lock file won't help since POSIX flock()s are only advisory locks. Contrary to shutil.copy(), os.move() doesn't write any data. It only changes some metadata by changing the name of the file. This makes the move op on a single file system an atomic operation. The fd doesn't help here at all. We could implement move with a combination of link() and unlink(). link() sets errno to EEXIST if the destination already exists. shutil.move() will no longer be an atomic operation but that's fine with me. We still have os.move(). Christian

My understanding of os.O_EXCL is that it locks the file from changes by any other process. It appears from a quick test, though, that this is not the case. Perhaps the second suggestion in the linked thread (using link/unlink) would work better, since this situation only arises on unix. I like the idea of a context manager for locking, but I'm not sure how that would work in this case... On Fri, Aug 12, 2011 at 3:53 PM, Devin Jeanpierre <jeanpierreda@gmail.com>wrote:

On 2011-08-12, at 17:03 , Christian Heimes wrote:
This information (that it's an extension) is present in the OpenBSD open(2) page, but I checked it on an OSX machine where it's not specified. Sorry. It's an atomic `flock(fd, LOCK_EX)` (or LOCK_SH) built into open(2) to avoid the unlocked open hole between the open(2) and flock(2) calls. http://www.openbsd.org/cgi-bin/man.cgi?query=open&apropos=0&sektion=0&manpath=OpenBSD+Current&arch=i386&format=html http://www.freebsd.org/cgi/man.cgi?query=open&apropos=0&sektion=2&manpath=FreeBSD+8.2-RELEASE&format=html http://netbsd.gw.com/cgi-bin/man-cgi?open+2+NetBSD-current http://leaf.dragonflybsd.org/cgi/web-man?command=open§ion=2 https://developer.apple.com/library/mac/#documentation/Darwin/Reference/ManP...

Am 12.08.2011 18:37, schrieb Masklinn:
Ah, I've checked the open(2) man page on Linux. Should have mentioned the OS ... sorry, too. Anyway flock()s are only advisory (cooperative) locks and not mandatory locks. Although some Unices have support for mandatory locks, they can be circumvented with unlink(). Christian

If the kernel doesn't allow file locking then I don't see any way that a locking context manager will be possible, but it should still be possible to safely move files using link and unlink. Maybe the other problem functions can be dealt with individually in a similar way? On Aug 12, 2011 6:57 PM, "Christian Heimes" <lists@cheimes.de> wrote:

Masklinn wrote:
I dispute this. I don't think the suggested function will work as described, at least not as given in the bug tracker, and certainly not in a platform-independent way. I don't think you can make it completely platform independent, because the underlying file system operations are too different. E.g. file renames are atomic on POSIX systems, but not on Windows. Glossing over these real differences will merely give people a false sense of security. It is important for people to understand the limitations of what is possible on their system, and not be given wrong ideas. (That's not to say that the proposed function is useless. It may be the closest thing that some users will get to the desired behaviour.) For documentation purposes alone, having two separate functions is valuable. The suggested implementation on the tracker http://bugs.python.org/issue12741 wraps the existing shutil.move function. The most natural implementation is a separate function that calls the first function. To have one function implement both behaviours requires an unnatural implementation, such as: def move(src, dest, flag=False): if flag: try: fd = os.open(dst, os.O_EXCL | os.O_CREAT) except OSError: raise Error('Destination exists') try: move(src, dst, False) # recursive call finally: os.close(fd) else: # Insert current implementation here... Slightly less ugly might be to rename the current function move to _move, then have this: def move(src, dest, flag=False): if flag: try: fd = os.open(dst, os.O_EXCL | os.O_CREAT) except OSError: raise Error('Destination exists') try: _move(src, dst) finally: os.close(fd) else: _move(src, dst) Either case is a prime example of why Guido's rule of thumb that functions shouldn't take arguments which select between two (or more) alternative behaviours is a good design principle. Better to keep two separate functions. -1 on a flag argument to shutil.move. +1 on a "safe" version of shutil.move. +0 on the given implementation. I'm not convinced that the proposed function actually is any safer. On the tracker, the OP says: "If this fails, then it means that the file exists, but if it passes, then dst is locked so no other process can create it." But that's not what actually happens. On my Linux box, I can do this in one terminal:
I then switch to another terminal, and do this: [steve@sylar ~]$ cat spam # File exists, and is empty. [steve@sylar ~]$ echo "Writing to a locked file" > spam [steve@sylar ~]$ And back to the first one:
open(dst).read() 'Writing to a locked file\n'
So despite the allegedly exclusive lock, another process was able to overwrite the file after the lock was taken. -- Steven

As far as I am aware, locked files are impossible to do in a cross-platform way, and there can be no correct implementation of this "safe" move. So, I'm -1. It gives people the wrong idea of what it does. Devin On Sun, Aug 14, 2011 at 12:05 AM, Steven D'Aprano <steve@pearwood.info> wrote:

Devin Jeanpierre <jeanpierreda@gmail.com> writes:
As far as I am aware, locked files are impossible to do in a cross-platform way
The ‘lockfile’ library <URL:http://pypi.python.org/pypi/lockfile> is an attempt to implement cross-platform lockfile functionality. I would like that library to be part of the Python standard library, but I think the current maintainer (Skip Montanaro) no longer has the available time to get that done. Anyone care to work with me on getting ‘lockfile’ into the Python standard library? -- \ “I'm a born-again atheist.” —Gore Vidal | `\ | _o__) | Ben Finney

As far as I can make out, this package is designed to lock files from access by different threads within a program by using a ".lock" file. I can't see how this could lock a file from external modification. On Sun, Aug 14, 2011 at 7:27 AM, Ben Finney <ben+python@benfinney.id.au>wrote:

However... It might be possible to use the "immutable" attribute in linux (and hopefully other unixes) to lock a file. This is still not completely secure since root can remove this attribute and change the file, but this is highly unlikely to happen by accident. On Sun, Aug 14, 2011 at 9:26 AM, David Townshend <aquavitae69@gmail.com>wrote:

David Townshend <aquavitae69@gmail.com> writes:
You're right, ‘lockfile’ is for the cooperative locking conventions common on most operating systems. It doesn't implement mandatory exclusive locking of resources. -- \ “When I was a kid I used to pray every night for a new bicycle. | `\ Then I realised that the Lord doesn't work that way so I stole | _o__) one and asked Him to forgive me.” —Emo Philips | Ben Finney

The behaviour is platform dependant only because it uses os.rename. The implementation is independent of platform so any change would affect all platforms. This might not matter if os.link behaves the same across them all though. Initially I thought it would be better to use an argument, but having had a look at the code I'm not so sure any more, since it could end up as two different implementations in the same function. Also, I'm not so sure its a good idea to have an optional argument (never_overwrite) that defaults to False for backward compatibility, but should usually be set to True for safe behaviour. On Aug 13, 2011 7:23 PM, "Masklinn" <masklinn@masklinn.net> wrote: platform-dependent and dome platforms already behave as proposed, so the proposal really unifies the behaviour across platforms.

On 8/14/2011 12:06 AM, David Townshend wrote:
The new 4th parameter for difflib.SequenceMatcher is much like that. For me the issue is whether the code look like if param: do algorithm a else: do algorithm b versus code if param: something else: something else more code -- Terry Jan Reedy

Am 14.08.2011 08:01, schrieb David Townshend:
My proposed link() / unlink() procedure works only on platforms and file systems, that support hard links. I totally forgot about the file system issue, sorry. :) Hard links won't work on a FAT32 mount point on Unix and probably not on Samba shares, too. It might work on NTFS if the VFS implements it but NTFS has a limited reference counter for hard links. It might be possible that a rename() op would work but a link() op wouldn't. Christian

It seems there's a second problem too - move on remote file systems use copy rather than rename, so changing the implementation means changing it for copy too, which is more difficult. Maybe the best option is to try to apply some sort of locking mechanism, but I can't see how right now. On Aug 14, 2011 4:01 PM, "Christian Heimes" <lists@cheimes.de> wrote: this.

From what I've read, there are several ways of ensuring that these functions fail if destination exists depending on the platform and filesystem, but
I''ve done some more reading and this is where I've got to: shutil.move() uses either os.rename() or shutil.copyfile() (via shutil.copytree()) depending on the situation. shutil.copyfile() uses open(). So to implement a safe move function it would seem to be necessary to do the same to copyfile(), and possibly open(). open(), in my opinon, already behaves as it should. It would be possible to add a slightly safer implementation by writing to a temporary file first, but this would not always be desired or even possible. Perhaps an alternative function could be added if the idea is popular enough? I would expect copyfile(), like move(), to fail if the destination exists. This is not the current behaviour, so both functions could benefit from this. there is no uniform way to do it. One approach would be to try all possible methods and hope that at least one works, with a simple "if os.exists(dst): fail" fallback. The documentation would state that "An exception occurs if the destination exists. This check is done is as safe a way possible to avoid race conditions where the system supports it." An additional measure of safety on copyfile() would be to write to a temporary file first, then use move. This would allow rollback in case of failure during the copy, but as with open(), its not always the most appropriate approach. Adding new copyfile() and move() functions would mean also mean adding new copy(), copy2() and copytree() functions, perhaps as copy3() and copytree2(). This seems to be getting rather messy - three slightly different copy functions, so it might still be better to add an optional argument to these. Alternatively, a new module could be added dedicated to safe file operations. On Sun, Aug 14, 2011 at 5:23 PM, David Townshend <aquavitae69@gmail.com>wrote:

open(), in my opinon, already behaves as it should.
open(..., 'w') doesn't, it overwrites the target. For copying an individual file, you'd want os.open(..., O_EXCL | O_CREAT), which is a cross-platform, race-condition-free way of creating and writing to a single file provided it didn't exist before.
There are two attitudes to take when using the don't-overwrite argument: 1. User cares about safety and race conditions, is glad that this function tries to be safe. 2. User doesn't care about safety or race conditions, as User doesn't have shell scripts creating files at inconvenient times and tempting fate. In case 1, having it silently revert to the unsafe version is very bad, and potentially damaging. In case 2, the user doesn't care about the safe version. In fact, User 2 is probably using os.path.exists already. If it can't do things safely, it shouldn't do them at all. Devin On Mon, Aug 15, 2011 at 6:07 AM, David Townshend <aquavitae69@gmail.com> wrote:

On Mon, Aug 15, 2011 at 9:42 PM, Devin Jeanpierre <jeanpierreda@gmail.com> wrote:
If it can't do things safely, it shouldn't do them at all.
This pretty much sums up the reason why the standard lib is lacking in this area - it's hard to do anything more than the unsafe basics that isn't providing a *false* sense of security for at least some use cases. Good exploration of the possibilities, though. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

Am 15.08.2011 14:17, schrieb Nick Coghlan:
I have to agree with Nick. The worst mistake we could make is to declare something as secure although it is flawed. Most people don't have to worry about race conditions when renaming a file. The majority of Python apps aren't working in a hostile environment (e.g. hostile users should not be able to modify directories) or don't need more security than a shell script. Let's hope that the few system level programs are written by professionals. Another good thing is the fact that rename(2) already takes symlink attacks into account. It doesn't follow symlinks newpath but replaces the link instead. Christian

On Mon, Aug 15, 2011 at 1:42 PM, Devin Jeanpierre <jeanpierreda@gmail.com>wrote: little point in creating a new file read-only.
So we provide an alternate safe implementation, which may not work on all systems? And it will be up to the user to decide whether to fall back to the unsafe functions?

Am 15.08.2011 14:20, schrieb David Townshend:
For Python 2 it's not possible because Python's internal file API uses fopen(3). fopen(3) has a limited set of flags and exclusive create isn't one of them. You'd have to rewrite larger parts of the API. Python 3's io library is build around file descriptors returned from open(2). It shouldn't be hard to write a 'c' flag for io. Christian

I'll have a look at open() later and see if I can write a patch for python 3. I'm not sure its worth that amount of work for python 2 at this stage in it's lifecycle. I'll also have a look at whether its possible to provide a safe alternative to move and copyfile, even if it cannot be used on all systems. I'm not thinking of actively hostile environments so much as multi-user situations. On Mon, Aug 15, 2011 at 2:40 PM, Christian Heimes <lists@cheimes.de> wrote:

Am 15.08.2011 14:59, schrieb David Townshend:
You have to modify at least the C functions Modules/_io/_iomodule.c:io_open() Modules/_io/fileio.c:fileio_init() as well as the pure python implementation Lib/_pyio.py to implement the 'c' mode. I like the idea and I've been missing the feature for a long time. This may sound harsh. If you proposed changes don't survive hostiles environment then there is no reason in implementing them at all. It's the false sense of security Nick was talking about earlier. At best your solution is slightly less insecure but still insecure and a loophole for exploits. IMHO you should update the docs and explain why and how some operations are subjected to race conditions. Christian

Christian Heimes wrote:
Security against hostile attacks is not the only value for a so-called "safe move". There is also security against accidental collisions. I currently have about 100 processes running as me (excluding system processes), and some of them write to files. Sometimes I have a few scripts running which write to a *lot* of files. I'd like a little more protection from accidental collisions, even if it's not foolproof. But please don't call the function "safe_move", since it isn't safe. Better a bland name like "move2", and full disclosure of what it can and can't protect you from, than a misleading name that gives a false impression. -- Steven

I know I'm late to the party, but I'm surprised that the existing behavior of move() is considered problematic. It carefully mimics the default behavior of the Unix "mv" command. Doing a move or rename atomically with the provision that it fails reliably when the target exists might be useful in some cases, but it seems to be more related to filesystem-based locking, which is known to be hard in a cross-platform way. (It also seems that some folks on the thread have ignored Windows in their use of the term "cross-platform".) -- --Guido van Rossum (python.org/~guido)

On 2011-08-15, at 19:24 , Guido van Rossum wrote:
According to the documentation for `os.rename`, it already fails on windows if the target exists (though I do not know if it also fails on e.g. NFS or across filesystems), which seems to be the behavior desired for "move2". So there is no real reason to discuss it, unless there are tricky edge cases in its behavior.

On Mon, Aug 15, 2011 at 7:24 PM, Guido van Rossum <guido@python.org> wrote:
But "mv" does allow the "-i" (interactive) and "-n" (no-clobber) arguments, which move() doesn't.
That's certainly what I've found in trying to do it! But it seems that it should be possible on most systems, just each in a different way. On windows, it should be as easy as trying os.open(file, os.O_EXEC|os.O_CREAT) first since windows appears to lock a file to a process when its opened. Linux ext file systems (arguably the most common) support an immutable attribute, also effectively locking the file. I think that reiserFS and XFS both has similar features. Unfortunately I have no experience with OSX or HPFS so I'm not sure how it would work on a Mac. This is quite a lot to implement though, so whether its worth it is another matter... especially since it would be far easier (although slower) to just copy/remove if it really is a problem.

The normal open() always did seem a bit weak, and this is probably the biggest use-case that it doesn't cover. +1
Well, that's basically what I was getting at. I don't like the idea of silently falling back to the unsafe thing one bit. It wouldn't be so bad to have something that tries to do it without any race conditions etc., and raises an exception if this isn't possible. Devin On Mon, Aug 15, 2011 at 8:20 AM, David Townshend <aquavitae69@gmail.com> wrote:

Thanks for the info - it will save me looking for it :-) Well, that's basically what I was getting at. I don't like the idea of
This may sound harsh. If you proposed changes don't survive hostiles
So a new function, say safe_copy(), tries to copy securely. If it can't, then an exception is raised. The user can then do something like: try: safe_copy(src, dst) except Error: logging.warning('Unsafe copy in progress') copy2(src, dst) My question now is whether there is really a need for this. The other option is, as Christian says, to document the problem and perhaps present an recipe for avoiding it.

Christian Heimes wrote:
It seems to me that these various "safe-ish move" procedures should be added to shutil, as separate functions, and the limitations of each documented. The caller then can decide which, if any, they are going to use. Avoiding race conditions when renaming or copying files is a hard problem. Who says that there is one solution that will work everywhere? -- Steven

On Fri, 12 Aug 2011 13:59:22 +0200 David Townshend <aquavitae69@gmail.com> wrote:
This is a reasonable request (although it could also be an optional argument to shutil.move() rather than a separate function). Could you open an issue with your proposal on http://bugs.python.org ? You are also welcome to submit a patch in the issue; please see http://docs.python.org/devguide/ Regards Antoine.

Issue created: http://bugs.python.org/issue12741 Adding an optional argument to move sounds like a better solution, so I'll work on a patch for that. David

On Fri, Aug 12, 2011 at 7:59 AM, David Townshend <aquavitae69@gmail.com> wrote:
This type of problem comes up regularly and a lot of user code is riddled with this kind of race conditions. Many (most?) are avoidable now by writing code to using EAFP rather than LBYL, but many are not. I wonder if this broader problem could be addressed by a context manager. Something to the general effect of the usage try: with lockfile(dst): move(src, dst) except OSError as e: if e != errno.EEXIST: raise raise AppSpecificError("File already exists.") # or whatever and a definition like @contextlib.contextmanager def lockfile(path): fd = os.open(path, os.O_EXCL | os.O_CREAT) yield os.close(fd) The usage is still sort of ugly, but I'm not sure I can think of a general way that isn't. Mike

On Fri, Aug 12, 2011 at 9:20 AM, Mike Graham <mikegraham@gmail.com> wrote:
lockfile is not a good name for this, but the manager itself is very simple and useful. +1 from me for getting this somewhere into stdlib.
-- Read my blog! I depend on your acceptance of my opinion! I am interesting! http://techblog.ironfroggy.com/ Follow me if you're into that sort of thing: http://www.twitter.com/ironfroggy

This doesn't completely solve the race condition: `os.open(path, O_EXCL | O_CREAT)` opens an fd for a file location. If a file was already at that location, it fails, if not, it succeeds with an fd. So if os.move() used this fd to actually write the new data, there would be no race condition in terms of file creation/deletion. This is, as far as I can tell, what was suggested by your linked thread. However, in your suggestion it does not do that: it does opens a new fd to a new file, and then does precisely what it did before. During the time in-between the os.open and the shutil.move(), somebody can delete the created file, and write a new one, or whatever. If they do that, then any such changes they make will be lost because shutil.move will steamroller them and overwrite the file. Devin On Fri, Aug 12, 2011 at 7:59 AM, David Townshend <aquavitae69@gmail.com> wrote:

Am 12.08.2011 15:53, schrieb Devin Jeanpierre:
I couldn't figure out how the open fd should fix the race condition until I read your answer. You are right! The trick doesn't help at all. Even a lock file won't help since POSIX flock()s are only advisory locks. Contrary to shutil.copy(), os.move() doesn't write any data. It only changes some metadata by changing the name of the file. This makes the move op on a single file system an atomic operation. The fd doesn't help here at all. We could implement move with a combination of link() and unlink(). link() sets errno to EEXIST if the destination already exists. shutil.move() will no longer be an atomic operation but that's fine with me. We still have os.move(). Christian

My understanding of os.O_EXCL is that it locks the file from changes by any other process. It appears from a quick test, though, that this is not the case. Perhaps the second suggestion in the linked thread (using link/unlink) would work better, since this situation only arises on unix. I like the idea of a context manager for locking, but I'm not sure how that would work in this case... On Fri, Aug 12, 2011 at 3:53 PM, Devin Jeanpierre <jeanpierreda@gmail.com>wrote:

On 2011-08-12, at 17:03 , Christian Heimes wrote:
This information (that it's an extension) is present in the OpenBSD open(2) page, but I checked it on an OSX machine where it's not specified. Sorry. It's an atomic `flock(fd, LOCK_EX)` (or LOCK_SH) built into open(2) to avoid the unlocked open hole between the open(2) and flock(2) calls. http://www.openbsd.org/cgi-bin/man.cgi?query=open&apropos=0&sektion=0&manpath=OpenBSD+Current&arch=i386&format=html http://www.freebsd.org/cgi/man.cgi?query=open&apropos=0&sektion=2&manpath=FreeBSD+8.2-RELEASE&format=html http://netbsd.gw.com/cgi-bin/man-cgi?open+2+NetBSD-current http://leaf.dragonflybsd.org/cgi/web-man?command=open§ion=2 https://developer.apple.com/library/mac/#documentation/Darwin/Reference/ManP...

Am 12.08.2011 18:37, schrieb Masklinn:
Ah, I've checked the open(2) man page on Linux. Should have mentioned the OS ... sorry, too. Anyway flock()s are only advisory (cooperative) locks and not mandatory locks. Although some Unices have support for mandatory locks, they can be circumvented with unlink(). Christian

If the kernel doesn't allow file locking then I don't see any way that a locking context manager will be possible, but it should still be possible to safely move files using link and unlink. Maybe the other problem functions can be dealt with individually in a similar way? On Aug 12, 2011 6:57 PM, "Christian Heimes" <lists@cheimes.de> wrote:

Masklinn wrote:
I dispute this. I don't think the suggested function will work as described, at least not as given in the bug tracker, and certainly not in a platform-independent way. I don't think you can make it completely platform independent, because the underlying file system operations are too different. E.g. file renames are atomic on POSIX systems, but not on Windows. Glossing over these real differences will merely give people a false sense of security. It is important for people to understand the limitations of what is possible on their system, and not be given wrong ideas. (That's not to say that the proposed function is useless. It may be the closest thing that some users will get to the desired behaviour.) For documentation purposes alone, having two separate functions is valuable. The suggested implementation on the tracker http://bugs.python.org/issue12741 wraps the existing shutil.move function. The most natural implementation is a separate function that calls the first function. To have one function implement both behaviours requires an unnatural implementation, such as: def move(src, dest, flag=False): if flag: try: fd = os.open(dst, os.O_EXCL | os.O_CREAT) except OSError: raise Error('Destination exists') try: move(src, dst, False) # recursive call finally: os.close(fd) else: # Insert current implementation here... Slightly less ugly might be to rename the current function move to _move, then have this: def move(src, dest, flag=False): if flag: try: fd = os.open(dst, os.O_EXCL | os.O_CREAT) except OSError: raise Error('Destination exists') try: _move(src, dst) finally: os.close(fd) else: _move(src, dst) Either case is a prime example of why Guido's rule of thumb that functions shouldn't take arguments which select between two (or more) alternative behaviours is a good design principle. Better to keep two separate functions. -1 on a flag argument to shutil.move. +1 on a "safe" version of shutil.move. +0 on the given implementation. I'm not convinced that the proposed function actually is any safer. On the tracker, the OP says: "If this fails, then it means that the file exists, but if it passes, then dst is locked so no other process can create it." But that's not what actually happens. On my Linux box, I can do this in one terminal:
I then switch to another terminal, and do this: [steve@sylar ~]$ cat spam # File exists, and is empty. [steve@sylar ~]$ echo "Writing to a locked file" > spam [steve@sylar ~]$ And back to the first one:
open(dst).read() 'Writing to a locked file\n'
So despite the allegedly exclusive lock, another process was able to overwrite the file after the lock was taken. -- Steven

As far as I am aware, locked files are impossible to do in a cross-platform way, and there can be no correct implementation of this "safe" move. So, I'm -1. It gives people the wrong idea of what it does. Devin On Sun, Aug 14, 2011 at 12:05 AM, Steven D'Aprano <steve@pearwood.info> wrote:

Devin Jeanpierre <jeanpierreda@gmail.com> writes:
As far as I am aware, locked files are impossible to do in a cross-platform way
The ‘lockfile’ library <URL:http://pypi.python.org/pypi/lockfile> is an attempt to implement cross-platform lockfile functionality. I would like that library to be part of the Python standard library, but I think the current maintainer (Skip Montanaro) no longer has the available time to get that done. Anyone care to work with me on getting ‘lockfile’ into the Python standard library? -- \ “I'm a born-again atheist.” —Gore Vidal | `\ | _o__) | Ben Finney

As far as I can make out, this package is designed to lock files from access by different threads within a program by using a ".lock" file. I can't see how this could lock a file from external modification. On Sun, Aug 14, 2011 at 7:27 AM, Ben Finney <ben+python@benfinney.id.au>wrote:

However... It might be possible to use the "immutable" attribute in linux (and hopefully other unixes) to lock a file. This is still not completely secure since root can remove this attribute and change the file, but this is highly unlikely to happen by accident. On Sun, Aug 14, 2011 at 9:26 AM, David Townshend <aquavitae69@gmail.com>wrote:

David Townshend <aquavitae69@gmail.com> writes:
You're right, ‘lockfile’ is for the cooperative locking conventions common on most operating systems. It doesn't implement mandatory exclusive locking of resources. -- \ “When I was a kid I used to pray every night for a new bicycle. | `\ Then I realised that the Lord doesn't work that way so I stole | _o__) one and asked Him to forgive me.” —Emo Philips | Ben Finney

The behaviour is platform dependant only because it uses os.rename. The implementation is independent of platform so any change would affect all platforms. This might not matter if os.link behaves the same across them all though. Initially I thought it would be better to use an argument, but having had a look at the code I'm not so sure any more, since it could end up as two different implementations in the same function. Also, I'm not so sure its a good idea to have an optional argument (never_overwrite) that defaults to False for backward compatibility, but should usually be set to True for safe behaviour. On Aug 13, 2011 7:23 PM, "Masklinn" <masklinn@masklinn.net> wrote: platform-dependent and dome platforms already behave as proposed, so the proposal really unifies the behaviour across platforms.

On 8/14/2011 12:06 AM, David Townshend wrote:
The new 4th parameter for difflib.SequenceMatcher is much like that. For me the issue is whether the code look like if param: do algorithm a else: do algorithm b versus code if param: something else: something else more code -- Terry Jan Reedy

Am 14.08.2011 08:01, schrieb David Townshend:
My proposed link() / unlink() procedure works only on platforms and file systems, that support hard links. I totally forgot about the file system issue, sorry. :) Hard links won't work on a FAT32 mount point on Unix and probably not on Samba shares, too. It might work on NTFS if the VFS implements it but NTFS has a limited reference counter for hard links. It might be possible that a rename() op would work but a link() op wouldn't. Christian

It seems there's a second problem too - move on remote file systems use copy rather than rename, so changing the implementation means changing it for copy too, which is more difficult. Maybe the best option is to try to apply some sort of locking mechanism, but I can't see how right now. On Aug 14, 2011 4:01 PM, "Christian Heimes" <lists@cheimes.de> wrote: this.

From what I've read, there are several ways of ensuring that these functions fail if destination exists depending on the platform and filesystem, but
I''ve done some more reading and this is where I've got to: shutil.move() uses either os.rename() or shutil.copyfile() (via shutil.copytree()) depending on the situation. shutil.copyfile() uses open(). So to implement a safe move function it would seem to be necessary to do the same to copyfile(), and possibly open(). open(), in my opinon, already behaves as it should. It would be possible to add a slightly safer implementation by writing to a temporary file first, but this would not always be desired or even possible. Perhaps an alternative function could be added if the idea is popular enough? I would expect copyfile(), like move(), to fail if the destination exists. This is not the current behaviour, so both functions could benefit from this. there is no uniform way to do it. One approach would be to try all possible methods and hope that at least one works, with a simple "if os.exists(dst): fail" fallback. The documentation would state that "An exception occurs if the destination exists. This check is done is as safe a way possible to avoid race conditions where the system supports it." An additional measure of safety on copyfile() would be to write to a temporary file first, then use move. This would allow rollback in case of failure during the copy, but as with open(), its not always the most appropriate approach. Adding new copyfile() and move() functions would mean also mean adding new copy(), copy2() and copytree() functions, perhaps as copy3() and copytree2(). This seems to be getting rather messy - three slightly different copy functions, so it might still be better to add an optional argument to these. Alternatively, a new module could be added dedicated to safe file operations. On Sun, Aug 14, 2011 at 5:23 PM, David Townshend <aquavitae69@gmail.com>wrote:

open(), in my opinon, already behaves as it should.
open(..., 'w') doesn't, it overwrites the target. For copying an individual file, you'd want os.open(..., O_EXCL | O_CREAT), which is a cross-platform, race-condition-free way of creating and writing to a single file provided it didn't exist before.
There are two attitudes to take when using the don't-overwrite argument: 1. User cares about safety and race conditions, is glad that this function tries to be safe. 2. User doesn't care about safety or race conditions, as User doesn't have shell scripts creating files at inconvenient times and tempting fate. In case 1, having it silently revert to the unsafe version is very bad, and potentially damaging. In case 2, the user doesn't care about the safe version. In fact, User 2 is probably using os.path.exists already. If it can't do things safely, it shouldn't do them at all. Devin On Mon, Aug 15, 2011 at 6:07 AM, David Townshend <aquavitae69@gmail.com> wrote:

On Mon, Aug 15, 2011 at 9:42 PM, Devin Jeanpierre <jeanpierreda@gmail.com> wrote:
If it can't do things safely, it shouldn't do them at all.
This pretty much sums up the reason why the standard lib is lacking in this area - it's hard to do anything more than the unsafe basics that isn't providing a *false* sense of security for at least some use cases. Good exploration of the possibilities, though. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

Am 15.08.2011 14:17, schrieb Nick Coghlan:
I have to agree with Nick. The worst mistake we could make is to declare something as secure although it is flawed. Most people don't have to worry about race conditions when renaming a file. The majority of Python apps aren't working in a hostile environment (e.g. hostile users should not be able to modify directories) or don't need more security than a shell script. Let's hope that the few system level programs are written by professionals. Another good thing is the fact that rename(2) already takes symlink attacks into account. It doesn't follow symlinks newpath but replaces the link instead. Christian

On Mon, Aug 15, 2011 at 1:42 PM, Devin Jeanpierre <jeanpierreda@gmail.com>wrote: little point in creating a new file read-only.
So we provide an alternate safe implementation, which may not work on all systems? And it will be up to the user to decide whether to fall back to the unsafe functions?

Am 15.08.2011 14:20, schrieb David Townshend:
For Python 2 it's not possible because Python's internal file API uses fopen(3). fopen(3) has a limited set of flags and exclusive create isn't one of them. You'd have to rewrite larger parts of the API. Python 3's io library is build around file descriptors returned from open(2). It shouldn't be hard to write a 'c' flag for io. Christian

I'll have a look at open() later and see if I can write a patch for python 3. I'm not sure its worth that amount of work for python 2 at this stage in it's lifecycle. I'll also have a look at whether its possible to provide a safe alternative to move and copyfile, even if it cannot be used on all systems. I'm not thinking of actively hostile environments so much as multi-user situations. On Mon, Aug 15, 2011 at 2:40 PM, Christian Heimes <lists@cheimes.de> wrote:

Am 15.08.2011 14:59, schrieb David Townshend:
You have to modify at least the C functions Modules/_io/_iomodule.c:io_open() Modules/_io/fileio.c:fileio_init() as well as the pure python implementation Lib/_pyio.py to implement the 'c' mode. I like the idea and I've been missing the feature for a long time. This may sound harsh. If you proposed changes don't survive hostiles environment then there is no reason in implementing them at all. It's the false sense of security Nick was talking about earlier. At best your solution is slightly less insecure but still insecure and a loophole for exploits. IMHO you should update the docs and explain why and how some operations are subjected to race conditions. Christian

Christian Heimes wrote:
Security against hostile attacks is not the only value for a so-called "safe move". There is also security against accidental collisions. I currently have about 100 processes running as me (excluding system processes), and some of them write to files. Sometimes I have a few scripts running which write to a *lot* of files. I'd like a little more protection from accidental collisions, even if it's not foolproof. But please don't call the function "safe_move", since it isn't safe. Better a bland name like "move2", and full disclosure of what it can and can't protect you from, than a misleading name that gives a false impression. -- Steven

I know I'm late to the party, but I'm surprised that the existing behavior of move() is considered problematic. It carefully mimics the default behavior of the Unix "mv" command. Doing a move or rename atomically with the provision that it fails reliably when the target exists might be useful in some cases, but it seems to be more related to filesystem-based locking, which is known to be hard in a cross-platform way. (It also seems that some folks on the thread have ignored Windows in their use of the term "cross-platform".) -- --Guido van Rossum (python.org/~guido)

On 2011-08-15, at 19:24 , Guido van Rossum wrote:
According to the documentation for `os.rename`, it already fails on windows if the target exists (though I do not know if it also fails on e.g. NFS or across filesystems), which seems to be the behavior desired for "move2". So there is no real reason to discuss it, unless there are tricky edge cases in its behavior.

On Mon, Aug 15, 2011 at 7:24 PM, Guido van Rossum <guido@python.org> wrote:
But "mv" does allow the "-i" (interactive) and "-n" (no-clobber) arguments, which move() doesn't.
That's certainly what I've found in trying to do it! But it seems that it should be possible on most systems, just each in a different way. On windows, it should be as easy as trying os.open(file, os.O_EXEC|os.O_CREAT) first since windows appears to lock a file to a process when its opened. Linux ext file systems (arguably the most common) support an immutable attribute, also effectively locking the file. I think that reiserFS and XFS both has similar features. Unfortunately I have no experience with OSX or HPFS so I'm not sure how it would work on a Mac. This is quite a lot to implement though, so whether its worth it is another matter... especially since it would be far easier (although slower) to just copy/remove if it really is a problem.

The normal open() always did seem a bit weak, and this is probably the biggest use-case that it doesn't cover. +1
Well, that's basically what I was getting at. I don't like the idea of silently falling back to the unsafe thing one bit. It wouldn't be so bad to have something that tries to do it without any race conditions etc., and raises an exception if this isn't possible. Devin On Mon, Aug 15, 2011 at 8:20 AM, David Townshend <aquavitae69@gmail.com> wrote:

Thanks for the info - it will save me looking for it :-) Well, that's basically what I was getting at. I don't like the idea of
This may sound harsh. If you proposed changes don't survive hostiles
So a new function, say safe_copy(), tries to copy securely. If it can't, then an exception is raised. The user can then do something like: try: safe_copy(src, dst) except Error: logging.warning('Unsafe copy in progress') copy2(src, dst) My question now is whether there is really a need for this. The other option is, as Christian says, to document the problem and perhaps present an recipe for avoiding it.

Christian Heimes wrote:
It seems to me that these various "safe-ish move" procedures should be added to shutil, as separate functions, and the limitations of each documented. The caller then can decide which, if any, they are going to use. Avoiding race conditions when renaming or copying files is a hard problem. Who says that there is one solution that will work everywhere? -- Steven
participants (15)
-
Antoine Pitrou
-
Ben Finney
-
Calvin Spealman
-
Christian Heimes
-
David Townshend
-
Devin Jeanpierre
-
Georg Brandl
-
Guido van Rossum
-
Masklinn
-
Mike Graham
-
Mike Meyer
-
MRAB
-
Nick Coghlan
-
Steven D'Aprano
-
Terry Reedy