chdir context manager

The following is a common pattern (used by, for example, shutil.make_archive): save_cwd = os.getcwd() try: foo() finally: os.chdir(save_cwd) I suggest this deserves a context manager: with saved_cwd(): foo() Initial feedback on IRC suggests shutil as where this functionality should live (other suggestions were made, such as pathlib). Hence, attached patch implements this as shutil.saved_cwd, based on os.fchdir. The patch also adds os.chdir to os.supports_dir_fd and documents the context manager abilities of builtins.open() in its reference. Thoughts? Thanks, Daniel diff -r 74b0461346f0 Doc/library/functions.rst --- a/Doc/library/functions.rst Fri Jan 18 17:53:18 2013 -0800 +++ b/Doc/library/functions.rst Sat Jan 19 09:39:27 2013 +0000 @@ -828,6 +828,9 @@ are always available. They are listed h Open *file* and return a corresponding :term:`file object`. If the file cannot be opened, an :exc:`OSError` is raised. + This function can be used as a :term:`context manager` that closes the + file when it exits. + *file* is either a string or bytes object giving the pathname (absolute or relative to the current working directory) of the file to be opened or an integer file descriptor of the file to be wrapped. (If a file descriptor diff -r 74b0461346f0 Doc/library/os.rst --- a/Doc/library/os.rst Fri Jan 18 17:53:18 2013 -0800 +++ b/Doc/library/os.rst Sat Jan 19 09:39:27 2013 +0000 @@ -1315,6 +1315,9 @@ features: This function can support :ref:`specifying a file descriptor <path_fd>`. The descriptor must refer to an opened directory, not an open file. + See also :func:`shutil.saved_cwd` for a context manager that restores the + current working directory. + Availability: Unix, Windows. .. versionadded:: 3.3 diff -r 74b0461346f0 Doc/library/shutil.rst --- a/Doc/library/shutil.rst Fri Jan 18 17:53:18 2013 -0800 +++ b/Doc/library/shutil.rst Sat Jan 19 09:39:27 2013 +0000 @@ -36,6 +36,19 @@ copying and removal. For operations on i Directory and files operations ------------------------------ +.. function:: saved_cwd() + + Return a :term:`context manager` that restores the current working directory + when it exits. See :func:`os.chdir` for changing the current working + directory. + + The context manager returns an open file descriptor for the saved directory. + + Only available when :func:`os.chdir` supports file descriptor arguments. + + .. versionadded:: 3.4 + + .. function:: copyfileobj(fsrc, fdst[, length]) Copy the contents of the file-like object *fsrc* to the file-like object *fdst*. diff -r 74b0461346f0 Lib/os.py --- a/Lib/os.py Fri Jan 18 17:53:18 2013 -0800 +++ b/Lib/os.py Sat Jan 19 09:39:27 2013 +0000 @@ -120,6 +120,7 @@ if _exists("_have_functions"): _set = set() _add("HAVE_FACCESSAT", "access") + _add("HAVE_FCHDIR", "chdir") _add("HAVE_FCHMODAT", "chmod") _add("HAVE_FCHOWNAT", "chown") _add("HAVE_FSTATAT", "stat") diff -r 74b0461346f0 Lib/shutil.py --- a/Lib/shutil.py Fri Jan 18 17:53:18 2013 -0800 +++ b/Lib/shutil.py Sat Jan 19 09:39:27 2013 +0000 @@ -38,6 +38,7 @@ __all__ = ["copyfileobj", "copyfile", "c "unregister_unpack_format", "unpack_archive", "ignore_patterns", "chown", "which"] # disk_usage is added later, if available on the platform + # saved_cwd is added later, if available on the platform class Error(OSError): pass @@ -1111,3 +1112,20 @@ def which(cmd, mode=os.F_OK | os.X_OK, p if _access_check(name, mode): return name return None + +# Define the chdir context manager. +if os.chdir in os.supports_dir_fd: + class saved_cwd: + def __init__(self): + pass + def __enter__(self): + self.dh = os.open(os.curdir, + os.O_RDONLY | getattr(os, 'O_DIRECTORY', 0)) + return self.dh + def __exit__(self, exc_type, exc_value, traceback): + try: + os.chdir(self.dh) + finally: + os.close(self.dh) + return False + __all__.append('saved_cwd') diff -r 74b0461346f0 Lib/test/test_shutil.py --- a/Lib/test/test_shutil.py Fri Jan 18 17:53:18 2013 -0800 +++ b/Lib/test/test_shutil.py Sat Jan 19 09:39:27 2013 +0000 @@ -1276,6 +1276,20 @@ class TestShutil(unittest.TestCase): rv = shutil.copytree(src_dir, dst_dir) self.assertEqual(['foo'], os.listdir(rv)) + def test_saved_cwd(self): + if hasattr(os, 'fchdir'): + temp_dir = self.mkdtemp() + orig_dir = os.getcwd() + with shutil.saved_cwd() as dir_fd: + os.chdir(temp_dir) + new_dir = os.getcwd() + self.assertIsInstance(dir_fd, int) + final_dir = os.getcwd() + self.assertEqual(orig_dir, final_dir) + self.assertEqual(temp_dir, new_dir) + else: + self.assertFalse(hasattr(shutil, 'saved_cwd')) + class TestWhich(unittest.TestCase):

On 19 January 2013 10:10, Daniel Shahaf <d.s@daniel.shahaf.name> wrote:
But if doing that, why does "foo" have to implement the directory changing itself? Why not something along: with temp_dir("/tmp"): # things that perform in /tmp # directory is restored. Of course that one function could do both things, depending ob wether a <dir> parameter is passed. js -><-

Daniel Shahaf <d.s@...> writes:
I implemented this in distlib as: @contextlib.contextmanager def chdir(d): cwd = os.getcwd() try: os.chdir(d) yield finally: os.chdir(cwd) which could perhaps be placed in shutil, so usage would be: with shutil.chdir('new_dir'): # work with new_dir as current dir # directory restored when you get here. Regards, Vinay Sajip

On Sat, Jan 19, 2013 at 01:46:26PM +0000, Vinay Sajip <vinay_sajip@yahoo.co.uk> wrote:
Pushd or pushdir would be a better name, IMHO. https://en.wikipedia.org/wiki/Pushd_and_popd Quite a known pair of names. Oleg. -- Oleg Broytman http://phdru.name/ phd@phdru.name Programmers don't die, they just GOSUB without RETURN.

On 1/19/2013 5:10 AM, Daniel Shahaf wrote:
This strikes me as not a proper context manager. A context manager should create a temporary, altered context. One way is to add something that is deleted on exit. File as context managers are the typical example. Another way is to alter something after saving the restore info, and restoring on exit. An example would be a context manager to temporarily change stdout. (Do we have one? If not, it would be at least as generally useful as this proposal.) So to me, your proposal is only 1/2 or 2/3 of a context manager. (And 'returns an open file descriptor for the saved directory' seems backward or wrong for a context manager.) It does not actually make a new context. A proper temp_cwd context manager should have one parameter, the new working directory, with chdir(new_cwd) in the enter method. To allow for conditional switching, the two chdir system calls could be conditional on new_cwd (either None or '' would mean no chdir calls). Looking at your pattern, if foo() does not change cwd, the save and restore is pointless, even if harmless. If foo does change cwd, it should also restore it, whether explicitly or with a context manager temp_cwd. Looking at your actual example, shutil.make_archive, the change and restore are conditional and asymmetrical. save_cwd = os.getcwd() if root_dir is not None: ... if not dry_run: os.chdir(root_dir) ... finally: if root_dir is not None: ... os.chdir(save_cwd) The initial chdir is conditional on dry_run (undocumented, but passed on to the archive function), the restore is not. Since I believe not switching on dry_runs is just a minor optimization, I believe that that condition could be dropped and the code re-written as with new_cwd(root_dir): ... I am aware that this would require a change in the finally logging, but that would be true of the original proposal also. -- Terry Jan Reedy

Terry Reedy wrote on Sat, Jan 19, 2013 at 08:37:17 -0500:
What should __enter__ return, then? It could return None, the to-be-restored directory's file descriptor, or the newly-changed-to directory (once a "directory to chdir to" optional argument is added). The latter could be either a pathname (string) or a file descriptor (since it's just passed through to os.chdir). It seems to me returning the old dir's fd would be the most useful of the three option, since the other two are things callers already have --- None, which is global, and the argument to the context manager.
I think making the new_cwd argument optional would be useful if the context manager body does multiple chdir() calls: with saved_cwd(): os.chdir('/foo') do_something() os.chdir('/bar') do_something() I'm not sure if that's exactly what you suggest --- you seem to be suggesting that saved_cwd(None) will avoid calling fchdir() from __exit__()?
Looking at your pattern, if foo() does not change cwd, the save and restore is pointless, even if harmless.
Do you have a better suggestion? Determining whether the fchdir() call can be avoided, if possible, presumably requires a system call, so I figure you might as well call fchdir() without trying to make that determination.
shutil.make_archive is just the first place in stdlib which uses the pattern, or something close to it. It's not exactly a canonical example. There are some canonical examples of the "pattern" in test_os.py.
-- Terry Jan Reedy
Cheers Daniel

On 1/19/2013 10:06 AM, Daniel Shahaf wrote:
make_archive would prefer the old dir pathname, as it wants that for the logging call. But I do not think that that should drive design.
I was, but that is a non-essential optimization. My idea is basically similar to Bueno's except for parameter absent versus None (and the two cases could be handled differently). I think this proposal suffers a bit from being both too specific and too general. Eli explained the 'too specific' part: there are many things that might be changed and changed back. The 'too general' part is that specific applications need different specific details. There are various possibilities of what to do in and return from __enter__. However, given the strong -1 from at least three core developers and one other person, the detail seem moot. -- Terry Jan Reedy -- Terry Jan Reedy

MRAB wrote on Sat, Jan 19, 2013 at 18:32:21 +0000:
OK.
However, given the strong -1 from at least three core developers and one other person, the detail seem moot.
*nod*, I see.
FWIW, -1 from me too because, as has been said already, you shouldn't really be using os.chdir; use absolute paths instead.
I don't use chdir in library code or multithreaded code, but I do use it in __main__ of short scripts, where there's no "caller" or "other thread" to consider. Consider sys.argv. The language and stdlib don't prevent library code from accessing (or modifying) sys.argv, but well-behaved libraries neither read sys.argv nor modify it. The same is true of the cwd.

-1 from me I consider caring about the current directory to be an anti-pattern - paths should be converted to absolute ASAP, and for invocation of other tools that care about the current directory, that's why the subprocess APIs accept a "cwd" argument. I certainly don't want to encourage people to unnecessarily rely on global state by providing a standard library context manager that makes it easier to do so. Regards, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

-1 from me, as well. Encouraging a bad habit. On Sat, Jan 19, 2013 at 9:57 AM, Nick Coghlan <ncoghlan@gmail.com> wrote:
-- Read my blog! I depend on your acceptance of my opinion! I am interesting! http://techblog.ironfroggy.com/ Follow me if you're into that sort of thing: http://www.twitter.com/ironfroggy

Christian Heimes wrote on Sat, Jan 19, 2013 at 16:40:32 +0100:
In other words, single-threaded processes will need to implement their own chdir context manager because using it in multi-threaded applications would be a bug. I note the same reasoning applies to a hypothetical context manager that changes the nice(2) level of the current process. This reasoning reduces to "make all of stdlib thread-safe" --- at the expense of people who write single-threaded code, know full well that chdir and nice are global state and should normally not be used, and want to use them anyway. I know enough programming to never call renice or chdir in library code. But when I write some __main__ code, I might want to use them. I should be able to.

Christian Heimes wrote on Sat, Jan 19, 2013 at 16:40:32 +0100:
A couple of other clarifications: Christian clarified on IRC that this refers to the use chdir() (which modifies process-global state) in multithreaded applications. The code uses fchdir so it's not vulnerable to race conditions whereby another process removes the original cwd before it chdir's back to it. (In that respect it's better than the common "try: getcwd() finally: chdir()" pattern.) Someone suggested adding a mutex to the saved_cwd context manager. That would solve the race condition, but I don't have a use-case for it --- precisely because I can't imagine multithreaded code where threads depend their cwd.

On Sat, Jan 19, 2013 at 6:57 AM, Nick Coghlan <ncoghlan@gmail.com> wrote:
Also it's not thread-safe. TBH I think if people are doing this today it's probably a good idea to suggest that they make their code more reliable by turning it into a context manager; but I think having that context manager in the stdlib is encouraging dubious practices. (The recommendation to use absolute filenames is a good one but not always easy to implement given a large codebase relying on the current directory.) -- --Guido van Rossum (python.org/~guido)

On 1/19/2013 9:57 AM, Nick Coghlan wrote:
Are you suggesting then that stdlib functions, such as archive makers, should 1) not require any particular setting of cwd but should have parameters that allow all paths to passed as absolute paths, and 2) not change cwd? If so, then shutil.make_archive should be able to pass absolute source and target paths to the archive makers, rather than having to set cwd before calling them. -- Terry Jan Reedy

Am 19.01.2013 11:10, schrieb Daniel Shahaf:
-1 from me, too. chdir() is not a safe operation because if affects the whole process. You can NOT make it work properly and safe in a multi-threaded environment or from code like signal handlers. The Open Group has acknowledged the issue and added a new set of functions to POSIX.1-2008 in order to address the issue. The *at() variants of functions like open() take an additional file descriptor as first argument. The fd must refer to a directory and is used as base for relative paths. Python 3.3 supports the new *at() feature. Christian

FWIW, when Windows revised their API set, GetCurrentDirectory and SetCurrentDirectory were completely removed. This seems a pretty strong move away from these APIs. (This only applies to new-style Windows 8 apps; desktop apps can still call them, but the intent is clear.) Cheers, Steve

On Sat, Jan 19, 2013 at 2:10 AM, Daniel Shahaf <d.s@daniel.shahaf.name>wrote:
I don't think that every trivial convenience context manager should be added to the standard library. It's just "yet another thing to look up". As the discussion shows, the semantics of such a context manager are unclear (does it do the change-dir itself or does the user code do it?), which makes it even more important to look-up once you see it. Moreover, this kind of a pattern is too general and specializing it for each use case is burdensome. I've frequently written similar context managers for other uses. The pattern is: saved = save_call() yield restore_call(saved) You can have it for chdir, for sys.path, for seek position in stream, for anything really where it may be useful to do some operation with a temporary state. Eli

On Sat, Jan 19, 2013 at 2:10 AM, Daniel Shahaf <d.s@daniel.shahaf.name> wrote:
FWIW, test.support has such a context manager (though test.support is not for public consumption and test.support's implementation does more than one thing, though see issue 15415): http://hg.python.org/cpython/file/48cddcb9c841/Lib/test/support.py#l738 --Chris

On 19 January 2013 10:10, Daniel Shahaf <d.s@daniel.shahaf.name> wrote:
But if doing that, why does "foo" have to implement the directory changing itself? Why not something along: with temp_dir("/tmp"): # things that perform in /tmp # directory is restored. Of course that one function could do both things, depending ob wether a <dir> parameter is passed. js -><-

Daniel Shahaf <d.s@...> writes:
I implemented this in distlib as: @contextlib.contextmanager def chdir(d): cwd = os.getcwd() try: os.chdir(d) yield finally: os.chdir(cwd) which could perhaps be placed in shutil, so usage would be: with shutil.chdir('new_dir'): # work with new_dir as current dir # directory restored when you get here. Regards, Vinay Sajip

On Sat, Jan 19, 2013 at 01:46:26PM +0000, Vinay Sajip <vinay_sajip@yahoo.co.uk> wrote:
Pushd or pushdir would be a better name, IMHO. https://en.wikipedia.org/wiki/Pushd_and_popd Quite a known pair of names. Oleg. -- Oleg Broytman http://phdru.name/ phd@phdru.name Programmers don't die, they just GOSUB without RETURN.

On 1/19/2013 5:10 AM, Daniel Shahaf wrote:
This strikes me as not a proper context manager. A context manager should create a temporary, altered context. One way is to add something that is deleted on exit. File as context managers are the typical example. Another way is to alter something after saving the restore info, and restoring on exit. An example would be a context manager to temporarily change stdout. (Do we have one? If not, it would be at least as generally useful as this proposal.) So to me, your proposal is only 1/2 or 2/3 of a context manager. (And 'returns an open file descriptor for the saved directory' seems backward or wrong for a context manager.) It does not actually make a new context. A proper temp_cwd context manager should have one parameter, the new working directory, with chdir(new_cwd) in the enter method. To allow for conditional switching, the two chdir system calls could be conditional on new_cwd (either None or '' would mean no chdir calls). Looking at your pattern, if foo() does not change cwd, the save and restore is pointless, even if harmless. If foo does change cwd, it should also restore it, whether explicitly or with a context manager temp_cwd. Looking at your actual example, shutil.make_archive, the change and restore are conditional and asymmetrical. save_cwd = os.getcwd() if root_dir is not None: ... if not dry_run: os.chdir(root_dir) ... finally: if root_dir is not None: ... os.chdir(save_cwd) The initial chdir is conditional on dry_run (undocumented, but passed on to the archive function), the restore is not. Since I believe not switching on dry_runs is just a minor optimization, I believe that that condition could be dropped and the code re-written as with new_cwd(root_dir): ... I am aware that this would require a change in the finally logging, but that would be true of the original proposal also. -- Terry Jan Reedy

Terry Reedy wrote on Sat, Jan 19, 2013 at 08:37:17 -0500:
What should __enter__ return, then? It could return None, the to-be-restored directory's file descriptor, or the newly-changed-to directory (once a "directory to chdir to" optional argument is added). The latter could be either a pathname (string) or a file descriptor (since it's just passed through to os.chdir). It seems to me returning the old dir's fd would be the most useful of the three option, since the other two are things callers already have --- None, which is global, and the argument to the context manager.
I think making the new_cwd argument optional would be useful if the context manager body does multiple chdir() calls: with saved_cwd(): os.chdir('/foo') do_something() os.chdir('/bar') do_something() I'm not sure if that's exactly what you suggest --- you seem to be suggesting that saved_cwd(None) will avoid calling fchdir() from __exit__()?
Looking at your pattern, if foo() does not change cwd, the save and restore is pointless, even if harmless.
Do you have a better suggestion? Determining whether the fchdir() call can be avoided, if possible, presumably requires a system call, so I figure you might as well call fchdir() without trying to make that determination.
shutil.make_archive is just the first place in stdlib which uses the pattern, or something close to it. It's not exactly a canonical example. There are some canonical examples of the "pattern" in test_os.py.
-- Terry Jan Reedy
Cheers Daniel

On 1/19/2013 10:06 AM, Daniel Shahaf wrote:
make_archive would prefer the old dir pathname, as it wants that for the logging call. But I do not think that that should drive design.
I was, but that is a non-essential optimization. My idea is basically similar to Bueno's except for parameter absent versus None (and the two cases could be handled differently). I think this proposal suffers a bit from being both too specific and too general. Eli explained the 'too specific' part: there are many things that might be changed and changed back. The 'too general' part is that specific applications need different specific details. There are various possibilities of what to do in and return from __enter__. However, given the strong -1 from at least three core developers and one other person, the detail seem moot. -- Terry Jan Reedy -- Terry Jan Reedy

MRAB wrote on Sat, Jan 19, 2013 at 18:32:21 +0000:
OK.
However, given the strong -1 from at least three core developers and one other person, the detail seem moot.
*nod*, I see.
FWIW, -1 from me too because, as has been said already, you shouldn't really be using os.chdir; use absolute paths instead.
I don't use chdir in library code or multithreaded code, but I do use it in __main__ of short scripts, where there's no "caller" or "other thread" to consider. Consider sys.argv. The language and stdlib don't prevent library code from accessing (or modifying) sys.argv, but well-behaved libraries neither read sys.argv nor modify it. The same is true of the cwd.

-1 from me I consider caring about the current directory to be an anti-pattern - paths should be converted to absolute ASAP, and for invocation of other tools that care about the current directory, that's why the subprocess APIs accept a "cwd" argument. I certainly don't want to encourage people to unnecessarily rely on global state by providing a standard library context manager that makes it easier to do so. Regards, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

-1 from me, as well. Encouraging a bad habit. On Sat, Jan 19, 2013 at 9:57 AM, Nick Coghlan <ncoghlan@gmail.com> wrote:
-- Read my blog! I depend on your acceptance of my opinion! I am interesting! http://techblog.ironfroggy.com/ Follow me if you're into that sort of thing: http://www.twitter.com/ironfroggy

Christian Heimes wrote on Sat, Jan 19, 2013 at 16:40:32 +0100:
In other words, single-threaded processes will need to implement their own chdir context manager because using it in multi-threaded applications would be a bug. I note the same reasoning applies to a hypothetical context manager that changes the nice(2) level of the current process. This reasoning reduces to "make all of stdlib thread-safe" --- at the expense of people who write single-threaded code, know full well that chdir and nice are global state and should normally not be used, and want to use them anyway. I know enough programming to never call renice or chdir in library code. But when I write some __main__ code, I might want to use them. I should be able to.

Christian Heimes wrote on Sat, Jan 19, 2013 at 16:40:32 +0100:
A couple of other clarifications: Christian clarified on IRC that this refers to the use chdir() (which modifies process-global state) in multithreaded applications. The code uses fchdir so it's not vulnerable to race conditions whereby another process removes the original cwd before it chdir's back to it. (In that respect it's better than the common "try: getcwd() finally: chdir()" pattern.) Someone suggested adding a mutex to the saved_cwd context manager. That would solve the race condition, but I don't have a use-case for it --- precisely because I can't imagine multithreaded code where threads depend their cwd.

On Sat, Jan 19, 2013 at 6:57 AM, Nick Coghlan <ncoghlan@gmail.com> wrote:
Also it's not thread-safe. TBH I think if people are doing this today it's probably a good idea to suggest that they make their code more reliable by turning it into a context manager; but I think having that context manager in the stdlib is encouraging dubious practices. (The recommendation to use absolute filenames is a good one but not always easy to implement given a large codebase relying on the current directory.) -- --Guido van Rossum (python.org/~guido)

On 1/19/2013 9:57 AM, Nick Coghlan wrote:
Are you suggesting then that stdlib functions, such as archive makers, should 1) not require any particular setting of cwd but should have parameters that allow all paths to passed as absolute paths, and 2) not change cwd? If so, then shutil.make_archive should be able to pass absolute source and target paths to the archive makers, rather than having to set cwd before calling them. -- Terry Jan Reedy

Am 19.01.2013 11:10, schrieb Daniel Shahaf:
-1 from me, too. chdir() is not a safe operation because if affects the whole process. You can NOT make it work properly and safe in a multi-threaded environment or from code like signal handlers. The Open Group has acknowledged the issue and added a new set of functions to POSIX.1-2008 in order to address the issue. The *at() variants of functions like open() take an additional file descriptor as first argument. The fd must refer to a directory and is used as base for relative paths. Python 3.3 supports the new *at() feature. Christian

FWIW, when Windows revised their API set, GetCurrentDirectory and SetCurrentDirectory were completely removed. This seems a pretty strong move away from these APIs. (This only applies to new-style Windows 8 apps; desktop apps can still call them, but the intent is clear.) Cheers, Steve

On Sat, Jan 19, 2013 at 2:10 AM, Daniel Shahaf <d.s@daniel.shahaf.name>wrote:
I don't think that every trivial convenience context manager should be added to the standard library. It's just "yet another thing to look up". As the discussion shows, the semantics of such a context manager are unclear (does it do the change-dir itself or does the user code do it?), which makes it even more important to look-up once you see it. Moreover, this kind of a pattern is too general and specializing it for each use case is burdensome. I've frequently written similar context managers for other uses. The pattern is: saved = save_call() yield restore_call(saved) You can have it for chdir, for sys.path, for seek position in stream, for anything really where it may be useful to do some operation with a temporary state. Eli

On Sat, Jan 19, 2013 at 2:10 AM, Daniel Shahaf <d.s@daniel.shahaf.name> wrote:
FWIW, test.support has such a context manager (though test.support is not for public consumption and test.support's implementation does more than one thing, though see issue 15415): http://hg.python.org/cpython/file/48cddcb9c841/Lib/test/support.py#l738 --Chris
participants (16)
-
Barry Warsaw
-
Calvin Spealman
-
Chris Jerdonek
-
Christian Heimes
-
Daniel Shahaf
-
Eli Bendersky
-
Guido van Rossum
-
Joao S. O. Bueno
-
Laurens Van Houtven
-
MRAB
-
Nick Coghlan
-
Oleg Broytman
-
Paul Moore
-
Steve Dower
-
Terry Reedy
-
Vinay Sajip