Warn about mktemp once again?

Back in r29829, Guido commented out the security hole warning for tempfile.mktemp: r29829 | gvanrossum | 2002-11-22 09:56:29 -0600 (Fri, 22 Nov 2002) | 3 lines Comment out the warnings about mktemp(). These are too annoying, and often unavoidable. Any thought about whether this warning should be restored? We're 5+ years later. Hopefully many uses of mktemp have been removed. If we're not going to restore the warning perhaps the commented code should just be deleted. Skip

Have we documented the alternatives well enough? In most cases NamedTemporaryFile will work, but sometimes you will have to create a directory and pick names therein. Doing that so that it will always be cleaned up properly is a bit of a trick, involving an isdir() check and a shutil.rmtree() call. On Sun, Apr 27, 2008 at 3:33 PM, <skip@pobox.com> wrote:
-- --Guido van Rossum (home page: http://www.python.org/~guido/)

Guido> Have we documented the alternatives well enough? I suppose we could document explicitly how to use mkstemp() in place of mktemp(), but the difference in return value is fairly modest: >>> tempfile.mktemp() '/var/folders/5q/5qTPn6xq2RaWqk+1Ytw3-U+++TI/-Tmp-/tmpV_5OLi' >>> tempfile.mkstemp() (3, '/var/folders/5q/5qTPn6xq2RaWqk+1Ytw3-U+++TI/-Tmp-/tmpmS7K4T') and the argument list is quite similar as well: >>> help(tempfile.mktemp) Help on function mktemp in module tempfile: mktemp(suffix='', prefix='tmp', dir=None) ... >>> help(tempfile.mkstemp) Help on function mkstemp in module tempfile: mkstemp(suffix='', prefix='tmp', dir=None, text=False) ... Guido> In most cases NamedTemporaryFile will work, ... It's API seems to be a bit farther from the mktemp API than that of mkstemp. Skip

IMO mkstemp() is a major pain because you have to use raw file descriptors on the return value. I'd much rather recommend [Named]TemporaryFile which return streams. On Mon, Apr 28, 2008 at 4:17 PM, <skip@pobox.com> wrote:
-- --Guido van Rossum (home page: http://www.python.org/~guido/)

Guido> IMO mkstemp() is a major pain because you have to use raw file Guido> descriptors on the return value. I'd much rather recommend Guido> [Named]TemporaryFile which return streams. Why not: fd, fname = tempfile.mkstemp() f = os.fdopen(fd) Seems fairly straightforward to me. Skip

I'd be much happier if there was a standard API in tempfile.py that did that for you, so you wouldn't have to understand fdopen(). (Your example is wrong BTW, the open mode would have to be something like 'rb+' or 'wb+'.) On Mon, Apr 28, 2008 at 4:35 PM, <skip@pobox.com> wrote:
-- --Guido van Rossum (home page: http://www.python.org/~guido/)

Guido> I'd be much happier if there was a standard API in tempfile.py Guido> that did that for you, so you wouldn't have to understand Guido> fdopen(). That can be arranged: http://bugs.python.org/issue2717 Guido> (Your example is wrong BTW, the open mode would have to be Guido> something like 'rb+' or 'wb+'.) I'm not disagreeing with you, but it seems odd that os.fdopen doesn't simply obey the mode of the file descriptor it receives as its argument (e.g., normally opened with "w+b"). Skip

skip@pobox.com wrote:
I'm not disagreeing with you, but it seems odd that os.fdopen doesn't simply obey the mode of the file descriptor it receives as its argument
I'm not even sure if it's possible to find out the mode that a file descriptor was opened with -- is it? Certainly there's no way to tell whether you want it treated as binary, so at least that much of the mode needs to be specified. -- Greg

Greg Ewing wrote:
That much has been fixed in 2.6 - you can now just pass "delete=False" to the NamedTemporaryFile constructor to stop it doing that. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia --------------------------------------------------------------- http://www.boredomandlaziness.org

On Mon, Apr 28, 2008 at 10:49 PM, Greg Ewing <greg.ewing@canterbury.ac.nz> wrote:
Why? You can flush it and then all the data is on the disk. The whole point of [Named]TemporaryFile is to automate the cleanup as well as the creation. -- --Guido van Rossum (home page: http://www.python.org/~guido/)

Guido van Rossum wrote:
Why? You can flush it and then all the data is on the disk.
That might be all right on Unix, but I would be worried that having the file open could prevent some other things being done with it on some platforms, such as renaming. You might also want to pass the file name on to some other process.
The whole point of [Named]TemporaryFile is to automate the cleanup as well as the creation.
The docs (at least up to 2.5) don't make it at all clear that this is the *whole* point of *both* these functions. The doc for NamedTemporaryFile seems to disavow any guarantee that you will be able to do anything with the name while the file is open. If you can't use the name while the file is open, and the file ceases to exist when it's closed, then what use is it to have the name? The obvious conclusion is that the point of NamedTemporaryFile is to give you a file that *doesn't* go away when you close it, and has a name so you can go on to do other things with it. So I plead ignorance due to misleading documentation. -- Greg

On Tue, 29 Apr 2008 17:15:11 +1200, Greg Ewing wrote:
Same here. In fact, is there a good reason to have mkstemp() return the fd (except backward compatibility)? -- Giovanni Bajo Develer S.r.l. http://www.develer.com

Same here. In fact, is there a good reason to have mkstemp() return the fd (except backward compatibility)?
Except for backwards compatibility: is there a good reason to keep os.mkstemp at all? The tradition is that all APIs in os expose the "system" calls as-is, i.e. they don't try to adjust the result values at all. This has the advantage that the caller can use their system's man page to find out all details, and trust that the Python wrapper does exactly the same. For mkstemp, there are two special issues: the C API modifies the string parameter, which cannot be exposed to Python as-is (hence the two result values), and it isn't a system call (at least not on POSIX), so it doesn't really need to be exposed, except perhaps to also provide this on non-POSIX systems where a regular POSIX implementation (in Python) would fail. Regards, Martin

2008/4/29 "Martin v. Löwis" <martin@v.loewis.de>:
Greg Ewing's use-case is one I've also had at times - ie. as a convenience function for creating a "somewhat temporary" file that is randomly named, but persists beyond the closing of the file. If the function doesn't stay in os it doesn't make any difference to me though :-)

David Harrison wrote:
As of 2.6, Greg's use case is addressed by the new 'delete' parameter on tempfile.NamedTemporaryFile. The implementation of the tempfile module uses os.mkstemp() though, so getting rid of the latter might cause a few problems :) Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia --------------------------------------------------------------- http://www.boredomandlaziness.org

On 4/29/2008 2:18 PM, Nick Coghlan wrote:
Then I personally don't have any objection to the removal of os.mkstemp. Since we're at it, a common pattern I use is to create temporary file to atomically replace files: I create a named temporary file in the same directory of the file I want to replace; write data into it; close it; and move it (POSIX "move": rename with silent overwrite) to the destination file. AFAIK, this is allows an atomic file replacemente on most filesystems. I believe this is a common useful pattern that could be handled in module tmpfile (especially since the final "rename" step requires a little care to be truly multiplatform). -- Giovanni Bajo Develer S.r.l. http://www.develer.com

On Tue, 2008-04-29 at 15:34 +0200, Giovanni Bajo wrote:
os.mkstemp is still useful when you *don't* need the file object, but the actual file descriptor, for passing to C code or to the child processes, or to mmap.mmap and such. It is also familiar to the Unix/C hackers, and it should cost very little to keep it around.
I agree, having a tempfile class with rename-on-close semantics would be very useful.

Greg> I'd like to see a variation of mkstemp() that returns a file Greg> object instead of a file descriptor... http://bugs.python.org/issue2717 Comments welcome. Skip

<skip <at> pobox.com> writes:
Back in r29829, Guido commented out the security hole warning for tempfile.mktemp:
[...]
Sorry to revive this thread, but mktemp() is very useful when the file is meant to be created by another application (e.g. launched by subprocess, but it could even be a daemon running under a different user). For example if I have a processing chain to converts a PDF to a temporary JPEG using an external tool and then does other things with the JPEG: I don't want Python to actually create the file, just to generate an unique filename. Of course one can use NamedTemporaryFile, retrieve the name, close the file handle and then pass the name to the other application. But it's an useless complication compared to the simplicity of writing "my_filename = tempfile.mktemp()"

On Tue, May 6, 2008 at 4:19 AM, Tristan Seligmann <mithrandi-python-dev@mithrandi.za.net> wrote:
Good catch. The problem with mktemp() is exactly its convenience, which opens it up for well-documented symlink attacks. -- --Guido van Rossum (home page: http://www.python.org/~guido/)

Tristan Seligmann wrote:
The correct way to do this is to create a temporary directory, and then generate a filename underneath that directory to use.
There is a platform difference here. On unix mktemp will usually provide a file name in a world-writeable directory (/tmp/) which is wide open to race condition attacks leading to privilege escalation. On win32 it will usually (but not always) provide a file name in a directory writeable only by the current user. The temporary directory step sometimes seems unnecessary to windows developers.

Are you (or are you not) aware that this strategy allows for malicious code to provide you with a fake JPEG file? If so, does it not concern you? As others have said: the reason the function is deprecated is that it is easy to run into security problems while using it, and users are often unaware of the security implications. Of course, for many applications, there is no real threat, and any risk might be acceptable. Unfortunately, if a security catastrophe results from the function, blame is (correctly) also upon Python for providing the function in the first place. Regards, Martin

Martin v. Löwis <martin <at> v.loewis.de> writes:
Yes, I'm aware of it and no, it doesn't concern me. When you write a processing script for internal use there's no reason to worry about security issues like that (if someone manages to enter your private machine he can do much more dangereous things anyway).
As others have said: the reason the function is deprecated is that it is easy to run into security problems while using it,
I know, I've read the doc :) I'm sure there are lots of other insecure things in Python's stdlib (e.g. HTTP Basic Authentication, or plaintext authentication in email clients, or algorithms known to be weak in hashlib) and they haven't been deprecated. Of course some of those may be necessary for interoperability, while mktemp() isn't. In any case what I wanted to point out is that the secure replacement for a one-liner mktemp() is not as concise and obvious as the insecure version, and that sometimes there are good reasons to not even care about the "security problems". Regards Antoine.

This has been a fairly interesting discussion about tempfile.mktemp, but can we return to the original question? Should we begin warning about its use again? Should we ever warn about it? It appears that NamedTemporaryFile gives you a reasonable (secure) replacement. I'm happy to document mktemp->NamedTemporaryFile with an example in tempfile.rst if that's what it will take. Before knowing about the delete=False parameter I proposed a mkstempf() function, whose tracker issue I will close if that hasn't already happened. Skip

Have we documented the alternatives well enough? In most cases NamedTemporaryFile will work, but sometimes you will have to create a directory and pick names therein. Doing that so that it will always be cleaned up properly is a bit of a trick, involving an isdir() check and a shutil.rmtree() call. On Sun, Apr 27, 2008 at 3:33 PM, <skip@pobox.com> wrote:
-- --Guido van Rossum (home page: http://www.python.org/~guido/)

Guido> Have we documented the alternatives well enough? I suppose we could document explicitly how to use mkstemp() in place of mktemp(), but the difference in return value is fairly modest: >>> tempfile.mktemp() '/var/folders/5q/5qTPn6xq2RaWqk+1Ytw3-U+++TI/-Tmp-/tmpV_5OLi' >>> tempfile.mkstemp() (3, '/var/folders/5q/5qTPn6xq2RaWqk+1Ytw3-U+++TI/-Tmp-/tmpmS7K4T') and the argument list is quite similar as well: >>> help(tempfile.mktemp) Help on function mktemp in module tempfile: mktemp(suffix='', prefix='tmp', dir=None) ... >>> help(tempfile.mkstemp) Help on function mkstemp in module tempfile: mkstemp(suffix='', prefix='tmp', dir=None, text=False) ... Guido> In most cases NamedTemporaryFile will work, ... It's API seems to be a bit farther from the mktemp API than that of mkstemp. Skip

IMO mkstemp() is a major pain because you have to use raw file descriptors on the return value. I'd much rather recommend [Named]TemporaryFile which return streams. On Mon, Apr 28, 2008 at 4:17 PM, <skip@pobox.com> wrote:
-- --Guido van Rossum (home page: http://www.python.org/~guido/)

Guido> IMO mkstemp() is a major pain because you have to use raw file Guido> descriptors on the return value. I'd much rather recommend Guido> [Named]TemporaryFile which return streams. Why not: fd, fname = tempfile.mkstemp() f = os.fdopen(fd) Seems fairly straightforward to me. Skip

I'd be much happier if there was a standard API in tempfile.py that did that for you, so you wouldn't have to understand fdopen(). (Your example is wrong BTW, the open mode would have to be something like 'rb+' or 'wb+'.) On Mon, Apr 28, 2008 at 4:35 PM, <skip@pobox.com> wrote:
-- --Guido van Rossum (home page: http://www.python.org/~guido/)

Guido> I'd be much happier if there was a standard API in tempfile.py Guido> that did that for you, so you wouldn't have to understand Guido> fdopen(). That can be arranged: http://bugs.python.org/issue2717 Guido> (Your example is wrong BTW, the open mode would have to be Guido> something like 'rb+' or 'wb+'.) I'm not disagreeing with you, but it seems odd that os.fdopen doesn't simply obey the mode of the file descriptor it receives as its argument (e.g., normally opened with "w+b"). Skip

skip@pobox.com wrote:
I'm not disagreeing with you, but it seems odd that os.fdopen doesn't simply obey the mode of the file descriptor it receives as its argument
I'm not even sure if it's possible to find out the mode that a file descriptor was opened with -- is it? Certainly there's no way to tell whether you want it treated as binary, so at least that much of the mode needs to be specified. -- Greg

Greg Ewing wrote:
That much has been fixed in 2.6 - you can now just pass "delete=False" to the NamedTemporaryFile constructor to stop it doing that. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia --------------------------------------------------------------- http://www.boredomandlaziness.org

On Mon, Apr 28, 2008 at 10:49 PM, Greg Ewing <greg.ewing@canterbury.ac.nz> wrote:
Why? You can flush it and then all the data is on the disk. The whole point of [Named]TemporaryFile is to automate the cleanup as well as the creation. -- --Guido van Rossum (home page: http://www.python.org/~guido/)

Guido van Rossum wrote:
Why? You can flush it and then all the data is on the disk.
That might be all right on Unix, but I would be worried that having the file open could prevent some other things being done with it on some platforms, such as renaming. You might also want to pass the file name on to some other process.
The whole point of [Named]TemporaryFile is to automate the cleanup as well as the creation.
The docs (at least up to 2.5) don't make it at all clear that this is the *whole* point of *both* these functions. The doc for NamedTemporaryFile seems to disavow any guarantee that you will be able to do anything with the name while the file is open. If you can't use the name while the file is open, and the file ceases to exist when it's closed, then what use is it to have the name? The obvious conclusion is that the point of NamedTemporaryFile is to give you a file that *doesn't* go away when you close it, and has a name so you can go on to do other things with it. So I plead ignorance due to misleading documentation. -- Greg

On Tue, 29 Apr 2008 17:15:11 +1200, Greg Ewing wrote:
Same here. In fact, is there a good reason to have mkstemp() return the fd (except backward compatibility)? -- Giovanni Bajo Develer S.r.l. http://www.develer.com

Same here. In fact, is there a good reason to have mkstemp() return the fd (except backward compatibility)?
Except for backwards compatibility: is there a good reason to keep os.mkstemp at all? The tradition is that all APIs in os expose the "system" calls as-is, i.e. they don't try to adjust the result values at all. This has the advantage that the caller can use their system's man page to find out all details, and trust that the Python wrapper does exactly the same. For mkstemp, there are two special issues: the C API modifies the string parameter, which cannot be exposed to Python as-is (hence the two result values), and it isn't a system call (at least not on POSIX), so it doesn't really need to be exposed, except perhaps to also provide this on non-POSIX systems where a regular POSIX implementation (in Python) would fail. Regards, Martin

2008/4/29 "Martin v. Löwis" <martin@v.loewis.de>:
Greg Ewing's use-case is one I've also had at times - ie. as a convenience function for creating a "somewhat temporary" file that is randomly named, but persists beyond the closing of the file. If the function doesn't stay in os it doesn't make any difference to me though :-)

David Harrison wrote:
As of 2.6, Greg's use case is addressed by the new 'delete' parameter on tempfile.NamedTemporaryFile. The implementation of the tempfile module uses os.mkstemp() though, so getting rid of the latter might cause a few problems :) Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia --------------------------------------------------------------- http://www.boredomandlaziness.org

On 4/29/2008 2:18 PM, Nick Coghlan wrote:
Then I personally don't have any objection to the removal of os.mkstemp. Since we're at it, a common pattern I use is to create temporary file to atomically replace files: I create a named temporary file in the same directory of the file I want to replace; write data into it; close it; and move it (POSIX "move": rename with silent overwrite) to the destination file. AFAIK, this is allows an atomic file replacemente on most filesystems. I believe this is a common useful pattern that could be handled in module tmpfile (especially since the final "rename" step requires a little care to be truly multiplatform). -- Giovanni Bajo Develer S.r.l. http://www.develer.com

On Tue, 2008-04-29 at 15:34 +0200, Giovanni Bajo wrote:
os.mkstemp is still useful when you *don't* need the file object, but the actual file descriptor, for passing to C code or to the child processes, or to mmap.mmap and such. It is also familiar to the Unix/C hackers, and it should cost very little to keep it around.
I agree, having a tempfile class with rename-on-close semantics would be very useful.

Greg> I'd like to see a variation of mkstemp() that returns a file Greg> object instead of a file descriptor... http://bugs.python.org/issue2717 Comments welcome. Skip

<skip <at> pobox.com> writes:
Back in r29829, Guido commented out the security hole warning for tempfile.mktemp:
[...]
Sorry to revive this thread, but mktemp() is very useful when the file is meant to be created by another application (e.g. launched by subprocess, but it could even be a daemon running under a different user). For example if I have a processing chain to converts a PDF to a temporary JPEG using an external tool and then does other things with the JPEG: I don't want Python to actually create the file, just to generate an unique filename. Of course one can use NamedTemporaryFile, retrieve the name, close the file handle and then pass the name to the other application. But it's an useless complication compared to the simplicity of writing "my_filename = tempfile.mktemp()"

On Tue, May 6, 2008 at 4:19 AM, Tristan Seligmann <mithrandi-python-dev@mithrandi.za.net> wrote:
Good catch. The problem with mktemp() is exactly its convenience, which opens it up for well-documented symlink attacks. -- --Guido van Rossum (home page: http://www.python.org/~guido/)

Tristan Seligmann wrote:
The correct way to do this is to create a temporary directory, and then generate a filename underneath that directory to use.
There is a platform difference here. On unix mktemp will usually provide a file name in a world-writeable directory (/tmp/) which is wide open to race condition attacks leading to privilege escalation. On win32 it will usually (but not always) provide a file name in a directory writeable only by the current user. The temporary directory step sometimes seems unnecessary to windows developers.

Are you (or are you not) aware that this strategy allows for malicious code to provide you with a fake JPEG file? If so, does it not concern you? As others have said: the reason the function is deprecated is that it is easy to run into security problems while using it, and users are often unaware of the security implications. Of course, for many applications, there is no real threat, and any risk might be acceptable. Unfortunately, if a security catastrophe results from the function, blame is (correctly) also upon Python for providing the function in the first place. Regards, Martin

Martin v. Löwis <martin <at> v.loewis.de> writes:
Yes, I'm aware of it and no, it doesn't concern me. When you write a processing script for internal use there's no reason to worry about security issues like that (if someone manages to enter your private machine he can do much more dangereous things anyway).
As others have said: the reason the function is deprecated is that it is easy to run into security problems while using it,
I know, I've read the doc :) I'm sure there are lots of other insecure things in Python's stdlib (e.g. HTTP Basic Authentication, or plaintext authentication in email clients, or algorithms known to be weak in hashlib) and they haven't been deprecated. Of course some of those may be necessary for interoperability, while mktemp() isn't. In any case what I wanted to point out is that the secure replacement for a one-liner mktemp() is not as concise and obvious as the insecure version, and that sometimes there are good reasons to not even care about the "security problems". Regards Antoine.

This has been a fairly interesting discussion about tempfile.mktemp, but can we return to the original question? Should we begin warning about its use again? Should we ever warn about it? It appears that NamedTemporaryFile gives you a reasonable (secure) replacement. I'm happy to document mktemp->NamedTemporaryFile with an example in tempfile.rst if that's what it will take. Before knowing about the delete=False parameter I proposed a mkstempf() function, whose tracker issue I will close if that hasn't already happened. Skip
participants (13)
-
"Martin v. Löwis"
-
Antoine Pitrou
-
Bill Janssen
-
David Harrison
-
Giovanni Bajo
-
Greg Ewing
-
Guido van Rossum
-
Hrvoje Nikšić
-
Nick Coghlan
-
Scott Dial
-
skip@pobox.com
-
Toby Dickenson
-
Tristan Seligmann