I'm not subscribed to python-dev. Please cc: me directly on replies. I'm going to respond to all the comments at once. Greg Ward wrote:
Attached please find a rewritten and improved tmpfile.py. The major change is to make the temporary file names significantly harder to predict. This foils denial-of-service attacks, where a hostile program floods /tmp with files named @12345.NNNN to prevent process 12345 from creating any temp files. It also makes the race condition inherent in tmpfile.mktemp() somewhat harder to exploit.
Oh, good! I've long wished that there was a tmpfile module written by someone who understands the security issues involved in generating temporary filenames and files. I hope you do... ;-)
Well, I wrote the analogous code in the GNU C library (using basically the same algorithm). I'm confident it is safe on a Unix-based system. On Windows and others, I am relying on os.open(..., os.O_EXCL) to do what it claims to do; assuming it does, the code should be safe there too.
(fd, name) = mkstemp(suffix="", binary=1): Creates a temporary file, returning both an OS-level file descriptor open on it and its name. This is useful in situations where you need to know the name of the temporary file, but can't risk the race in mktemp.
+1 except for the name. What does the "s" stand for? Unfortunately, I can't think of a more descriptive name offhand.
Fredrik Lundh's suggestion that it is for "safer" seems plausible, but I do not actually know. I chose the names mkstemp and mkdtemp to match the functions of the same name in most modern Unix C libraries. Since they don't take the same "template" parameter that those functions do, that was probably a bad idea. [Note to Fredrik: at the C level, mkstemp is not deprecated in favor of tmpfile, as they do very different things - tmpfile(3) is analogous to tmpfile.TemporaryFile(), you don't get the file name back.] I'm open to suggestions for a better routine name; I can't think of a good one myself.
name = mkdtemp(suffix=""): Creates a temporary directory, without race.
How about calling this one mktempdir() ?
Sure.
I've scanned your code and the existing tempfile.py. I don't understand why you rearranged things. Please explain why your arrangement of _TemporaryFileWrapper/TemporaryFile/ NamedTemporaryFile is better than what we have.
I was trying to get all the user-accessible interfaces to be at the top of the file. Also, I do not understand the bits in the existing file that delete names out of the module namespace after we're done with them, so I wound up taking all of that out to get it to work. I think the existing file's organization was largely determined by those 'del' statements. I'm happy to organize the file any way y'all like -- I'm kind of new to Python and I don't know the conventions yet.
A few minor comments on the code...
if os.name == 'nt': _template = '~%s~' elif os.name in ('mac', 'riscos'): _template = 'Python-Tmp-%s' else: _template = 'pyt%s' # better ideas?
Why reveal the implementation language of the application creating these temporary names? More importantly, why do it certain platforms, but not others?
This is largely as it was in the old file. I happen to know that ~%s~ is conventional for temporary files on Windows. I changed 'tmp%s' to 'pyt%s' for Unix to make it consistent with Mac/RiscOS Ideally one would allow the calling application to control the prefix, but I'm not sure what the right interface is. Maybe tmpfile.mkstemp(prefix="", suffix="") where if one argument is provided it gets treated as the suffix, but if two are provided the prefix comes first, a la range()? Is there a way to express that in the prototype?
### Recommended, user-visible interfaces.
_text_openflags = os.O_RDWR | os.O_CREAT | os.O_EXCL if os.name == 'posix': _bin_openflags = os.O_RDWR | os.O_CREAT | os.O_EXCL
Why not just "_bin_openflags = _text_openflags" ? That clarifies their equality on Unix.
else: _bin_openflags = os.O_RDWR | os.O_CREAT | os.O_EXCL | os.O_BINARY
Why not "_bin_openflags = _text_openflags | os.O_BINARY" ?
*shrug* Okay.
def mkstemp(suffix="", binary=1): """Function to create a named temporary file, with 'suffix' for its suffix. Returns an OS-level handle to the file and the name, as a tuple. If 'binary' is 1, the file is opened in binary mode, otherwise text mode (if this is a meaningful concept for the operating system in use). In any case, the file is readable and writable only by the creating user, and executable by no one."""
"Function to" is redundant.
I didn't change much of this text from the old file. Where are docstring conventions documented?
"""Create a named temporary file.
Create a named temporary file with 'suffix' for its suffix. Return a tuple (fd, name) where 'fd' is an OS-level handle to the file, and 'name' is the complete path to the file. If 'binary' is true, the file is opened in binary mode, otherwise text mode (if this is a meaningful concept for the operating system in use). In any case, the file is readable and writable only by the creating user, and executable by no one (on platforms where that makes sense). """
Okay.
Hmmm: if suffix == ".bat", the file is executable on some platforms. That last sentence still needs work.
... In any case, the file is readable and writable only by the creating user. On platforms where the file's permission bits control whether it can be executed as a program, no one can. Other platforms have other ways of controlling this: for instance, under Windows, the suffix determines whether the file can be executed. How's that?
class _TemporaryFileWrapper: """Temporary file wrapper
This class provides a wrapper around files opened for temporary use. In particular, it seeks to automatically remove the file when it is no longer needed. """
Here's where I started getting confused. I don't dispute that the existing code could stand some rearrangement, but I don't understand why you did it the way you did. Please clarify!
See above. What would you consider a sensible arrangement?
### Deprecated, user-visible interfaces.
def mktemp(suffix=""): """User-callable function to return a unique temporary file name.""" while 1: name = _candidate_name(suffix) if not os.path.exists(name): return name
The docstring for mktemp() should state *why* it's bad to use this function -- otherwise people will say, "oh, this looks like it does what I need" and use it in ignorance. So should the library reference manual.
Good point. """Suggest a name to be used for a temporary file. This function returns a file name, with 'suffix' for its suffix, which did not correspond to any file at some point in the past. By the time you get the return value of this function, a file may have already been created with that name. It is therefore unsafe to use this function for any purpose. It is deprecated and may be removed in a future version of Python.""" and corresponding text in the library manual? Tim Peters wrote:
-1 on the implementation here, because it didn't start with current CVS, so is missing important work that went into improving this module on Windows for 2.3. Whether spawned/forked processes inherit descriptors for "temp files" is also a security issue that's addressed in current CVS but seemed to have gotten dropped on the floor here.
I'll get my hands on a copy of current CVS and rework my changes against that.
A note on UI: for many programmers, "it's a feature" that temp file names contain the pid. I don't think we can get away with taking that away no matter how stridently someone claims it's bad for us <wink>.
GNU libc took that away from C programmers about four years ago and no one even noticed. FreeBSD libc, ditto, although I'm not sure when it happened. zw
zack wrote:
[Note to Fredrik: at the C level, mkstemp is not deprecated in favor of tmpfile, as they do very different things - tmpfile(3) is analogous to tmpfile.TemporaryFile(), you don't get the file name back.]
I quoted the SUSv2 spec from memory. shouldn't have done that: it says "preferred for portability reasons", not deprecated. </F>
On Thu, Jun 27, 2002, Zack Weinberg wrote:
This is largely as it was in the old file. I happen to know that ~%s~ is conventional for temporary files on Windows. I changed 'tmp%s' to 'pyt%s' for Unix to make it consistent with Mac/RiscOS
Ideally one would allow the calling application to control the prefix, but I'm not sure what the right interface is. Maybe
tmpfile.mkstemp(prefix="", suffix="")
where if one argument is provided it gets treated as the suffix, but if two are provided the prefix comes first, a la range()? Is there a way to express that in the prototype?
The main problem with this is that range() doesn't support keyword arguments, just positional ones. In order to get the same effect with mkstemp, you'd have to do def tmpfile.mkstemp(*args): and raise an exception with more than two arguments. Otherwise, if you allow keyword arguments, you get the possibility of: tmpfile.mkstemp(prefix="foo") and you can't distinguish that from tmpfile.mkstemp("foo") unless you change the prototype to def tmp.mkstemp(*args, **kwargs): which requires a bit more of a song-and-dance setup routine. In any event, you probably should not use empty strings as the default parameters; use None instead. (Yeah, this is getting a bit off-topic for python-dev; I'm just practicing for my book. ;-) -- Aahz (aahz@pythoncraft.com) <*> http://www.pythoncraft.com/ Project Vote Smart: http://www.vote-smart.org/
On 27 June 2002, Zack Weinberg said:
Well, I wrote the analogous code in the GNU C library (using basically the same algorithm). I'm confident it is safe on a Unix-based system. On Windows and others, I am relying on os.open(..., os.O_EXCL) to do what it claims to do; assuming it does, the code should be safe there too.
Sounds like good credentials to me. Welcome to Python-land! Note that you'll probably get more positive feedback if you provide a patch to tmpfile.py rather than a complete rewrite. And patches will get lost on python-dev -- you should submit it to SourceForge, and stay on the case until the patch is accepted or rejected (or maybe deferred). [me]
+1 except for the name. What does the "s" stand for? Unfortunately, I can't think of a more descriptive name offhand.
[Zack]
Fredrik Lundh's suggestion that it is for "safer" seems plausible, but I do not actually know. I chose the names mkstemp and mkdtemp to match the functions of the same name in most modern Unix C libraries. Since they don't take the same "template" parameter that those functions do, that was probably a bad idea.
Hmmmm... I'm torn here. When emulating (or wrapping) functionality from the standard C library or Unix kernel, I think it's generally good to preserve familiar, long-used names: os.chmod() is better than os.changemode() (or change_mode(), if I wrote the code). But mkstemp() and mkdtemp() are *not* familiar, long-used names. (At least not to me -- I program in C very rarely!) But they will probably become more familiar over time. Also, API changes that are just due to fundamental differences between C and Python (immutable strings, multiple return values) are not really enough reason to change a name. It looks like your Python mkstemp() has one big advantage over the glibc mkstemp() -- you can supply a suffix. IMHO, the inability to supply a prefix is a small disadvantage. But those add up to a noticeably different API. I think I'm slightly in favour of a different name for the Python version. If you make it act like this: mkstemp(template : string = (sensible default), suffix : string = "") -> (filename : string, fd : int) (err, I hope my personal type language is comprehensible), then call it mkstemp() after all.
[Note to Fredrik: at the C level, mkstemp is not deprecated in favor of tmpfile, as they do very different things - tmpfile(3) is analogous to tmpfile.TemporaryFile(), you don't get the file name back.]
But the man page for mkstemp() in glibc 2.2.5 (Debian unstable) says: Don't use this function, use tmpfile(3) instead. It is better defined and more portable. BTW, that man page has two "NOTES" sections.
I was trying to get all the user-accessible interfaces to be at the top of the file. Also, I do not understand the bits in the existing file that delete names out of the module namespace after we're done with them, so I wound up taking all of that out to get it to work. I think the existing file's organization was largely determined by those 'del' statements.
I'm happy to organize the file any way y'all like -- I'm kind of new to Python and I don't know the conventions yet.
If I was starting from scratch, I would *probably* do something like this: if os.name == "posix": class TemporaryFile: [... define Unix version of TemporaryFile ...] elif os.name == "nt": class NamedTemporaryFile: [...] class TemporaryFile: [... on top of NamedTemporaryFile ...] elif os.name == "macos": # beats me But I don't know the full history of this module. IMHO you would have a much better chance of success if you prepared a couple of patches -- eg. one to add mkstemp() and mkdtemp() (possibly with different names), another to do whatever it is to TemporaryFile that you want to do. Possibly a third for general code cleanup, if you feel some is needed. (Or do the code cleanup patch first, if that's what's needed.) Greg -- Greg Ward - just another Python hacker gward@python.net http://starship.python.net/~gward/ I hope something GOOD came in the mail today so I have a REASON to live!!
participants (4)
-
Aahz
-
Fredrik Lundh
-
Greg Ward
-
Zack Weinberg