correct way to catch exception with Python 'with' statement

Steve D'Aprano steve+python at pearwood.info
Thu Dec 1 23:47:45 EST 2016


On Fri, 2 Dec 2016 11:26 am, DFS wrote:

> On 12/01/2016 06:48 PM, Ned Batchelder wrote:
>> On Thursday, December 1, 2016 at 2:31:11 PM UTC-5, DFS wrote:
>>> After a simple test below, I submit that the above scenario would never
>>> occur.  Ever.  The time gap between checking for the file's existence
>>> and then trying to open it is far too short for another process to sneak
>>> in and delete the file.
>>
>> It doesn't matter how quickly the first operation is (usually) followed
>> by the second.  Your process could be swapped out between the two
>> operations. On a heavily loaded machine, there could be a very long
>> time between them
> 
> 
> How is it possible that the 'if' portion runs, then 44/100,000ths of a
> second later my process yields to another process which deletes the
> file, then my process continues.
> 
> Is that governed by the dreaded GIL?

No, that has nothing to do with the GIL. It is because the operating 
system uses preemptive multitasking, as all modern OSes do: Linux, OS X, 
Windows.

Each program that runs, including the OS itself, is one or more processes.
Typically, even on a single-user desktop machine, you will have dozens of
processes running simultaneously.

Every so many clock ticks, the OS pauses whatever process is running, 
more-or-less interrupting whatever it was doing, passes control on to 
another process, then the next, then the next, and so on. The application 
doesn't have any control over this: it can be paused at any time, 
normally just for a small fraction of a second, but potentially for 
seconds or minutes at a time if the system is heavily loaded.



> "The mechanism used by the CPython interpreter to assure that only one
> thread executes Python bytecode at a time."
> 
> But I see you posted a stack-overflow answer:
> 
> "In the case of CPython's GIL, the granularity is a bytecode
> instruction, so execution can switch between threads at any bytecode."
> 
> Does that mean "chars=f.read().lower()" could get interrupted between
> the read() and the lower()?

Yes, but don't think about Python threads. Think about the OS.

I'm not an expert on the low-level hardware details, so I welcome
correction, but I think that you can probably expect that the OS can
interrupt code execution between any two CPU instructions. Something like
str.lower() is likely to be thousands of CPU instructions, even for a small
string.
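To make the granularity concrete, the dis module shows that even a one-line
expression compiles to several bytecode instructions; in CPython, a thread
switch can happen between any two of them. (A sketch in Python 3 syntax,
unlike the Python 2 scripts below; the exact opcodes vary by version.)

```python
import dis

# One line of source compiles to several bytecode instructions.
# CPython may switch threads between any two of them, so
# f.read().lower() can be interrupted after read() returns
# but before lower() runs.
code = compile("chars = f.read().lower()", "<example>", "exec")
dis.dis(code)

# Count the instructions: one source line, many switch points.
instructions = list(dis.get_instructions(code))
print(len(instructions))
```

And that is just at the Python level; each bytecode instruction is itself
many CPU instructions, between any two of which the OS can preempt the
whole process.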


[...]
> With a 5ms window, it seems the following code would always protect the
> file from being deleted between lines 4 and 5.
> 
> --------------------------------
> 1 import os,threading
> 2 f_lock=threading.Lock()
> 3 with f_lock:
> 4   if os.path.isfile(filename):
> 5     with open(filename,'w') as f:
> 6       process(f)
> --------------------------------
> 
> 
> 
>> even if on an average machine, they are executed very quickly.

Absolutely not. At least on Linux, locks are advisory, not mandatory. Here
are a pair of scripts that demonstrate that. First, the well-behaved script
that takes out a lock:

# --- locker.py ---
import os, threading, time

filename = 'thefile.txt'
f_lock = threading.Lock()

with f_lock:
    print '\ntaking lock'
    if os.path.isfile(filename):
        print filename, 'exists and is a file'
        time.sleep(10)
        print 'lock still active'
        with open(filename) as f:
            print f.read()

# --- end ---


Now, a second script which naively, or maliciously, just deletes the file:

# --- bandit.py ---
import os, time
filename = 'thefile.txt'
time.sleep(1)
print 'deleting file, mwahahahaha!!!'
os.remove(filename)
print 'deleted'

# --- end ---



Now, I run them both simultaneously:

[steve at ando thread-lock]$ touch thefile.txt # ensure file exists
[steve at ando thread-lock]$ (python locker.py &) ; (python bandit.py &)
[steve at ando thread-lock]$ 
taking lock
thefile.txt exists and is a file
deleting file, mwahahahaha!!!
deleted
lock still active
Traceback (most recent call last):
  File "locker.py", line 13, in <module>
    with open(filename) as f:
IOError: [Errno 2] No such file or directory: 'thefile.txt'



This is on Linux. It's possible that Windows behaves differently, and I don't
know how to run a command in the background in command.com or cmd.exe or
whatever you use on Windows.
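For what it's worth, Linux does offer locks at the OS level, visible across
processes (unlike threading.Lock, which only coordinates threads inside one
process), but they are *advisory*: they only work if every process volunteers
to check them. A sketch in Python 3 syntax, assuming a Unix platform with the
fcntl module; the file name is just a scratch temp file:

```python
import fcntl
import os
import tempfile

# Scratch file to lock (the path itself is arbitrary).
tmp = tempfile.NamedTemporaryFile(delete=False)
path = tmp.name

# Take an exclusive advisory lock on the first file descriptor.
fcntl.flock(tmp.fileno(), fcntl.LOCK_EX)

# A second open file description sees the lock *only if it asks*:
second = open(path)
lock_blocked = False
try:
    fcntl.flock(second.fileno(), fcntl.LOCK_EX | fcntl.LOCK_NB)
except OSError:
    lock_blocked = True
print("second flock() blocked:", lock_blocked)

# But nothing stops code that never calls flock() at all:
os.remove(path)  # succeeds even though the lock is still held
print("file removed while 'locked'")

second.close()
tmp.close()
```

A cooperating process that calls flock() first will block (or fail, with
LOCK_NB) until the lock is released; a bandit.py that never asks just
deletes the file anyway.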


[...]
> Also, this is just theoretical (I hope).  It would be terrible system
> design if all those dozens of processes were reading and writing and
> deleting the same file.

It is not theoretical. And it's not a terrible system design, in the sense
that the alternatives are *worse*.

- Turn the clock back to the 1970s and 80s with single-processing 
  operating systems? Unacceptable -- even primitive OSes like DOS 
  and Mac System 5 needed to include some basic multiprocessing 
  capability.

- And what are servers supposed to do in this single-process world?

- Enforce mandatory locks? A great way for malware or hostile users
  to perform Denial Of Service attacks.

Even locks being left around accidentally can be a real pain: Windows users
can probably tell you about times that a file has been accidentally left
open by buggy applications, and there's nothing you can do to unlock it
short of rebooting. Unacceptable for a server, and pain in the rear even for
a desktop.

- Make every file access go through a single scheduling application
  which ensures there are no clashes? Probably very hard to write,
  and it would probably kill performance. Imagine not even being able
  to check the existence of a 4GB file until it's finished copying
  onto a USB stick... 



The cost of allowing two programs to run at the same time is that 
sometimes they will both want to do something to the same file.

Fundamentally though, the solution here is quite simple: don't rely on 
"Look Before You Leap" checks any time you have shared data, and the 
file system is shared data. If you want *reliable* code, you MUST use a 
try...except block to recover from file system errors.
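In Python 3 syntax, that "Easier to Ask Forgiveness than Permission" pattern
looks something like this (process() here is just a stand-in for whatever
work you do with the file):

```python
def process(f):
    # Stand-in for whatever work the program does with the file.
    return f.read()

def safe_process(filename):
    # EAFP: just attempt the open, and recover if the file system
    # has changed under us. No isfile() check can close this race.
    try:
        with open(filename) as f:
            return process(f)
    except FileNotFoundError:
        # The file vanished (or never existed): handle it here.
        return None
    except OSError as err:
        # Other file system errors: permissions, I/O failure, ...
        print("could not process %s: %s" % (filename, err))
        return None

print(safe_process("no-such-file"))  # prints None if the file is absent
```

The open() either succeeds or it doesn't, atomically, so there is no window
for another process to sneak in between a check and the open.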




-- 
Steve
“Cheer up,” they said, “things could be worse.” So I cheered up, 
and sure enough, things got worse.


