[Tutor] file exists question

Steven D'Aprano steve at pearwood.info
Tue Mar 10 01:22:57 CET 2015


On Mon, Mar 09, 2015 at 04:50:11PM +0000, Alan Gauld wrote:
> Somebody posted a question asking how to find out if a file
> exists. The message was in the queue and I thought I'd approved
> it but it hasn't shown up yet. Sorry to the OP if I've messed up.
> 
> The answer is that you use the os.path.exists() function.
> It takes a path as an argument which can be relative to
> the cwd or absolute.

os.path.exists is a little bit of an anti-pattern though. It has two 
problems. First, there is a race condition waiting to bite you: just 
because the file exists now doesn't mean it will exist a millisecond 
later when you try to open it. Second, even if the file exists, there is 
no guarantee you can open it.


Code like this is buggy:


import os

filename = raw_input("What file do you want to open? ")
if os.path.exists(filename):
    # Bug: the file can vanish, or be unreadable, before open() runs.
    with open(filename) as f:
        text = f.read()
    process(text)


Nearly all computers these days, and all of the common 
Windows/Mac/Linux/Unix systems, are both multi-processing and 
multi-user. That means the above code can fail in two different ways:

* In a multi-processing system, another process can delete the 
  file just after os.path.exists() returns True. It *did* exist, 
  but a millisecond later when you try to open it, it is gone.

* In a multi-user system, files can be owned by different users,
  and you might not have permission to open files belonging to
  another user. (Technically, you might not even have permission
  to open files owned by yourself, but that's pretty rare.)
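
The first failure is easy to reproduce without a second process. The 
following is only a sketch, written for Python 3 rather than the 
Python 2 used elsewhere in this message: the helper name 
simulate_check_then_open is made up for illustration, and os.remove 
stands in for whatever other process deletes the file.

```python
import os
import tempfile

def simulate_check_then_open():
    """Return (existed_at_check, error_name) for a file removed after the check."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    existed = os.path.exists(path)  # the check succeeds here
    os.remove(path)                 # stand-in for another process deleting the file
    try:
        with open(path) as f:
            f.read()
    except OSError as err:
        # The earlier check did not protect the open() call.
        return existed, type(err).__name__
    return existed, None

result = simulate_check_then_open()
print(result)
```

The check reports that the file exists, yet the open still fails, 
which is exactly the gap the first bullet point describes.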


So the above is usually written like this:


filename = raw_input("What file do you want to open? ")
try:
    with open(filename) as f:
        text = f.read()
except (IOError, OSError):
    pass
else:
    process(text)


We just try to open the file and read from it. If it succeeds, the 
"else" block runs. (Yes, try...except...else works!). If it fails, 
because the file doesn't exist or permission is denied, then we just 
skip the processing step.
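
If silently skipping feels too quiet, the same pattern can report why 
the open failed. This is a minimal sketch for Python 3, where IOError 
is just another name for OSError; the function name read_text and the 
wording of the message are my own choices, not anything standard.

```python
import os
import tempfile

def read_text(filename):
    """Return the file's contents, or None if it cannot be opened or read."""
    try:
        with open(filename) as f:
            text = f.read()
    except OSError as err:
        # Covers a missing file, denied permission, and other I/O errors.
        print("skipping %s: %s" % (filename, err.strerror))
        return None
    else:
        # Runs only when the try block raised nothing.
        return text

# Usage: read a file that exists, then try again after it is gone.
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "w") as f:
    f.write("hello")
present = read_text(path)
os.remove(path)
missing = read_text(path)
```

The caller can then test for None instead of asking whether the file 
exists beforehand.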

The only use I have found for os.path.exists is to try to *avoid* an 
existing file. (Even here, it is still subject to one of the above 
problems: just because a file *doesn't* exist now, a millisecond later 
it might.) For example, automatically numbering files rather than 
overwriting them:


import os

filename = raw_input("Save file as...? ")
name, ext = os.path.splitext(filename)
n = 1
while os.path.exists(filename):
    # automatically pick a new name, keeping the original extension
    filename = "%s~%d%s" % (name, n, ext)
    n += 1
try:
    with open(filename, 'w') as f:
        f.write(text)
except (IOError, OSError):
    pass


I'm not 100% happy with that solution, because there is a risk that some 
other process will create a file with the same name in the gap between 
calling os.path.exists and calling open, but I don't know how to solve 
that race condition. What I really want is an option to open() that only 
opens a new file, and fails if the file already exists.
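
For what it's worth, Python 3.3 and later have such an option: mode 
'x' asks open() for exclusive creation and raises FileExistsError if 
the file is already there, so there is no separate check left to race 
against. The sketch below assumes Python 3.3+ and an ordinary local 
filesystem; the function name save_without_overwriting is invented 
for illustration, and keeping the extension in the numbered name is 
my own choice.

```python
import os
import tempfile

def save_without_overwriting(filename, text):
    """Write text to filename, or to a numbered variant if that name is taken.

    Returns the name that was actually used.
    """
    name, ext = os.path.splitext(filename)
    candidate = filename
    n = 1
    while True:
        try:
            # 'x' means exclusive creation: fail if the file already exists.
            with open(candidate, "x") as f:
                f.write(text)
            return candidate
        except FileExistsError:
            # Name already taken: pick the next numbered name and retry.
            candidate = "%s~%d%s" % (name, n, ext)
            n += 1

# Usage: saving twice under the same name yields two separate files.
workdir = tempfile.mkdtemp()
target = os.path.join(workdir, "notes.txt")
first = save_without_overwriting(target, "one")
second = save_without_overwriting(target, "two")
```

Other errors, such as permission denied, still propagate to the 
caller here rather than being swallowed.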

In other contexts, this can actually be a security risk. There is a 
whole class of "time of check to time of use" security bugs:

http://en.wikipedia.org/wiki/Time_of_check_to_time_of_use
http://cwe.mitre.org/data/definitions/367.html



-- 
Steve

