windows file closing race condition?

Hi All, Continuous testing is a wonderful thing when it comes to finding weird edge case problems, like this one: http://jenkins.simplistix.co.uk/job/testfixtures-tox/COMPONENTS=zc,PYTHON=3.... File "C:\Jenkins\workspace\testfixtures-tox\e8666d4e\.tox\3.3-zc\lib\site-packages\testfixtures\tempdirectory.py", line 323, in __exit__ self.cleanup() File "C:\Jenkins\workspace\testfixtures-tox\e8666d4e\.tox\3.3-zc\lib\site-packages\testfixtures\tempdirectory.py", line 78, in cleanup rmtree(self.path) File "C:\Jenkins\workspace\testfixtures-tox\e8666d4e\.tox\3.3-zc\lib\shutil.py", line 460, in rmtree return _rmtree_unsafe(path, onerror) File "C:\Jenkins\workspace\testfixtures-tox\e8666d4e\.tox\3.3-zc\lib\shutil.py", line 362, in _rmtree_unsafe _rmtree_unsafe(fullname, onerror) File "C:\Jenkins\workspace\testfixtures-tox\e8666d4e\.tox\3.3-zc\lib\shutil.py", line 371, in _rmtree_unsafe onerror(os.rmdir, path, sys.exc_info()) File "C:\Jenkins\workspace\testfixtures-tox\e8666d4e\.tox\3.3-zc\lib\shutil.py", line 369, in _rmtree_unsafe os.rmdir(path) OSError: [WinError 145] The directory is not empty: 'c:\\users\\jenkins\\appdata\\local\\temp\\tmpkeg4d7\\a' I'm 99% certain my code is correct here, the only place I open files for writing in that directory is here: https://github.com/Simplistix/testfixtures/blob/master/testfixtures/tempdire... So, from my perspective, I'm either looking at a bug in shutil.rmtree (which would be trying to delete a directory before deleting its content or failing to delete a file but ignoring an error) or the file object, when being used as a context manager, going through __exit__ without closing the file and releasing the handle. This happens very infrequently, the OS is Windows 7 and the filesystem is NTFS, if that helps... Any ideas? Chris -- Simplistix - Content Management, Batch Processing & Python Consulting - http://www.simplistix.co.uk

On Fri, 06 Sep 2013 06:50:07 +0100 Chris Withers <chris@simplistix.co.uk> wrote:
So, from my perspective, I'm either looking at a bug in shutil.rmtree (which would be trying to delete a directory before deleting its content or failing to delete a file but ignoring an error) or the file object, when being used as a context manager, going through __exit__ without closing the file and releasing the handle.
It seems someone forgot to remove the following snippet from FileIO.__exit__: if random.random() > 0.0001: # Sometimes we leave the fd open, to annoy Chris os.close(self.fd)
This happens very infrequently, the OS is Windows 7 and the filesystem is NTFS, if that helps...
It should help indeed: http://blogs.msdn.com/b/oldnewthing/archive/2012/09/07/10347136.aspx :-) Regards Antoine.

Antoine Pitrou writes:
http://blogs.msdn.com/b/oldnewthing/archive/2012/09/07/10347136.aspx
That was worth it just for the comment from Australia!

On 06/09/2013 07:10, Antoine Pitrou wrote:
This happens very infrequently, the OS is Windows 7 and the filesystem is NTFS, if that helps...
It should help indeed: http://blogs.msdn.com/b/oldnewthing/archive/2012/09/07/10347136.aspx
The box in questions runs no AV software or indexing services... Chris -- Simplistix - Content Management, Batch Processing & Python Consulting - http://www.simplistix.co.uk

On 6 September 2013 15:50, Chris Withers <chris@simplistix.co.uk> wrote:
Hi All,
Continuous testing is a wonderful thing when it comes to finding weird edge case problems, like this one:
http://jenkins.simplistix.co.uk/job/testfixtures-tox/COMPONENTS=zc,PYTHON=3....
File "C:\Jenkins\workspace\testfixtures-tox\e8666d4e\.tox\3.3-zc\lib\site-packages\testfixtures\tempdirectory.py", line 323, in __exit__ self.cleanup() File "C:\Jenkins\workspace\testfixtures-tox\e8666d4e\.tox\3.3-zc\lib\site-packages\testfixtures\tempdirectory.py", line 78, in cleanup rmtree(self.path) File "C:\Jenkins\workspace\testfixtures-tox\e8666d4e\.tox\3.3-zc\lib\shutil.py", line 460, in rmtree return _rmtree_unsafe(path, onerror) File "C:\Jenkins\workspace\testfixtures-tox\e8666d4e\.tox\3.3-zc\lib\shutil.py", line 362, in _rmtree_unsafe _rmtree_unsafe(fullname, onerror) File "C:\Jenkins\workspace\testfixtures-tox\e8666d4e\.tox\3.3-zc\lib\shutil.py", line 371, in _rmtree_unsafe onerror(os.rmdir, path, sys.exc_info()) File "C:\Jenkins\workspace\testfixtures-tox\e8666d4e\.tox\3.3-zc\lib\shutil.py", line 369, in _rmtree_unsafe os.rmdir(path) OSError: [WinError 145] The directory is not empty: 'c:\\users\\jenkins\\appdata\\local\\temp\\tmpkeg4d7\\a'
I'm 99% certain my code is correct here, the only place I open files for writing in that directory is here:
https://github.com/Simplistix/testfixtures/blob/master/testfixtures/tempdire...
So, from my perspective, I'm either looking at a bug in shutil.rmtree (which would be trying to delete a directory before deleting its content or failing to delete a file but ignoring an error) or the file object, when being used as a context manager, going through __exit__ without closing the file and releasing the handle.
This happens very infrequently, the OS is Windows 7 and the filesystem is NTFS, if that helps...
Any ideas?
This feels a lot like an issue we were seeing on the Windows buildbots, which we ended up working around in the test support library: http://bugs.python.org/issue15496 That would be some awfully ugly code to upgrade from "hack in the test support library" to "this is just how Python unlinks files on Windows", though :P Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On 06/09/2013 08:14, Nick Coghlan wrote:
On 6 September 2013 15:50, Chris Withers <chris@simplistix.co.uk> wrote:
Continuous testing is a wonderful thing when it comes to finding weird edge case problems, like this one:
[... snip ...]
os.rmdir(path) OSError: [WinError 145] The directory is not empty: 'c:\\users\\jenkins\\appdata\\local\\temp\\tmpkeg4d7\\a'
This feels a lot like an issue we were seeing on the Windows buildbots, which we ended up working around in the test support library: http://bugs.python.org/issue15496
That would be some awfully ugly code to upgrade from "hack in the test support library" to "this is just how Python unlinks files on Windows", though :P
I think that any kind of delay-retry loop falls squarely within the remit of the calling application, not core Python. This isn't a problem of Python's making: IIUC, you would see the same effect if you used any other language or simply deleted the folder from within Explorer. (I don't know whether Explorer itself does anything canny under the covers to retry). Obviously our test suite *is* a calling application, and so it makes perfect sense to put some workaround in place. The trouble with this class of problem, where a share-delete handle allows operations to succeed and to fail later which would normally fail early, is that the bad effect is at one remove from its cause. Here, by the time the rmtree has reached the point of removing a parent directory, it's long-since left behind the file which has a still-open handle: the DeleteFile succeeded and the code moved on. You can't even tell which file it was. A related problem arises where the DeleteFile succeeds and an error occurs when a subsequent CreateFile fails for the same filepath, again because a share-delete handle is still open for a file at that path. This is another one which hits our test suite because of an overuse of one temp filename. What should Python do? With some effort it could look for open file handles against the file it's trying to delete, but what then? Wait until they're all closed? That could leave it hanging. And even with a timeout it would introduce a delay which might be unnecessary. A lot of the time, no harm will come of the file existing a few seconds after the DeleteFile has succeeded. In short, I don't believe there's any mileage in introducing extra code into Python's os or io modules. It falls on the shoulders of the calling code to implement retry loops or whatever logic as needed. TJG

Le Fri, 06 Sep 2013 08:58:06 +0100, Tim Golden <mail@timgolden.me.uk> a écrit :
What should Python do?
Maybe using FILE_SHARE_DELETE could help? http://bugs.python.org/issue15244 Regards Antoine.

On 06/09/2013 11:14, Antoine Pitrou wrote:
Le Fri, 06 Sep 2013 08:58:06 +0100, Tim Golden <mail@timgolden.me.uk> a écrit :
What should Python do?
Maybe using FILE_SHARE_DELETE could help? http://bugs.python.org/issue15244
I don't think so. It's the use of FILE_SHARE_DELETE (by other programs, eg Virus Checkers) that typically causes the problem. IOW, the sequence is: * [Some Other Prog] takes FILE_SHARE_DELETE handle, allowing other programs to delete the file even while this handle is still open * Python calls DeleteFile -- succeeds if the only open handles are FILE_SHARE_DELETE; carries on * File has apparently disappeared but still has open handles * Any attempt to create a file of the same name or to delete a containing directory fail because the file is still open, even though it's successfully been deleted. * (Some time later) [Some Other Prog] closes its handle and the file is now completely gone Unless I'm missing something, there's nothing Python can do to help here apart from the sort of delay-retry dance which test.support uses and which I'm advocating against as a core feature. TJG

On 06/09/2013 11:23am, Tim Golden wrote:
On 06/09/2013 11:14, Antoine Pitrou wrote:
Le Fri, 06 Sep 2013 08:58:06 +0100, Tim Golden <mail@timgolden.me.uk> a écrit :
What should Python do?
Maybe using FILE_SHARE_DELETE could help? http://bugs.python.org/issue15244
I don't think so. It's the use of FILE_SHARE_DELETE (by other programs, eg Virus Checkers) that typically causes the problem. IOW, the sequence is:
* [Some Other Prog] takes FILE_SHARE_DELETE handle, allowing other programs to delete the file even while this handle is still open
* Python calls DeleteFile -- succeeds if the only open handles are FILE_SHARE_DELETE; carries on
* File has apparently disappeared but still has open handles
* Any attempt to create a file of the same name or to delete a containing directory fail because the file is still open, even though it's successfully been deleted.
* (Some time later) [Some Other Prog] closes its handle and the file is now completely gone
Unless I'm missing something, there's nothing Python can do to help here apart from the sort of delay-retry dance which test.support uses and which I'm advocating against as a core feature.
Instead of deleting, the file could be moved to a temporary name in the root directory (or some other permanent directory on the same drive) and then deleted. That would allow the directory to be closed even if a FILE_SHARE_DELETE handle is still open for the file. -- Richard

On 06/09/2013 12:50, Richard Oudkerk wrote:
On 06/09/2013 11:23am, Tim Golden wrote:
On 06/09/2013 11:14, Antoine Pitrou wrote:
Le Fri, 06 Sep 2013 08:58:06 +0100, Tim Golden <mail@timgolden.me.uk> a écrit :
What should Python do?
Maybe using FILE_SHARE_DELETE could help? http://bugs.python.org/issue15244
I don't think so. It's the use of FILE_SHARE_DELETE (by other programs, eg Virus Checkers) that typically causes the problem. IOW, the sequence is:
* [Some Other Prog] takes FILE_SHARE_DELETE handle, allowing other programs to delete the file even while this handle is still open
* Python calls DeleteFile -- succeeds if the only open handles are FILE_SHARE_DELETE; carries on
* File has apparently disappeared but still has open handles
* Any attempt to create a file of the same name or to delete a containing directory fail because the file is still open, even though it's successfully been deleted.
* (Some time later) [Some Other Prog] closes its handle and the file is now completely gone
Unless I'm missing something, there's nothing Python can do to help here apart from the sort of delay-retry dance which test.support uses and which I'm advocating against as a core feature.
Instead of deleting, the file could be moved to a temporary name in the root directory (or some other permanent directory on the same drive) and then deleted. That would allow the directory to be closed even if a FILE_SHARE_DELETE handle is still open for the file.
True, but then you're into determining a temporary name somewhere on the same volume if possible and avoiding collisions etc. Again, I don't think this is something we need to be doing by default in core Python. TJG

On 6 September 2013 13:21, Tim Golden <mail@timgolden.me.uk> wrote:
True, but then you're into determining a temporary name somewhere on the same volume if possible and avoiding collisions etc. Again, I don't think this is something we need to be doing by default in core Python.
Agreed. There simply is no solution that works in all cases. If you rename the file to anywhere but the same directory, you potentially have permission issues. And if you rename to the same directory, you still can't delete the directory. It's a shame, because a helper to do this would be useful for Unix users wanting to ensure that their code works properly on Windows, but there really isn't an answer other than knowing how things work (and crafting an appropriate solution for your application), here... Paul

On 6 September 2013 14:16, Richard Oudkerk <shibturn@gmail.com> wrote:
On 06/09/2013 1:55pm, Paul Moore wrote:
... If you rename the
file to anywhere but the same directory, you potentially have permission issues.
Using the root directory avoids permission issues -- users always have write access there.
I don't believe that's true. IIRC, some Windows versions protect the root of the system drive from non-admin users. And I *know* I have had problems with NTFS-formatted removable drives plugged into systems they were not originally formatted on, where the owner of the drive no longer exists. They are rare corner cases, for sure, but they do happen and frankly are probably harder to explain when they come up than saying "your virus checker is interfering". Everyone expects their virus checker to screw up their system (cure worse than disease syndrome :-)) Paul

On Fri, 06 Sep 2013 13:21:35 +0100, Tim Golden <mail@timgolden.me.uk> wrote:
On 06/09/2013 12:50, Richard Oudkerk wrote:
On 06/09/2013 11:23am, Tim Golden wrote:
On 06/09/2013 11:14, Antoine Pitrou wrote:
Le Fri, 06 Sep 2013 08:58:06 +0100, Tim Golden <mail@timgolden.me.uk> a écrit :
What should Python do?
Maybe using FILE_SHARE_DELETE could help? http://bugs.python.org/issue15244
I don't think so. It's the use of FILE_SHARE_DELETE (by other programs, eg Virus Checkers) that typically causes the problem. IOW, the sequence is:
* [Some Other Prog] takes FILE_SHARE_DELETE handle, allowing other programs to delete the file even while this handle is still open
* Python calls DeleteFile -- succeeds if the only open handles are FILE_SHARE_DELETE; carries on
* File has apparently disappeared but still has open handles
* Any attempt to create a file of the same name or to delete a containing directory fail because the file is still open, even though it's successfully been deleted.
* (Some time later) [Some Other Prog] closes its handle and the file is now completely gone
Unless I'm missing something, there's nothing Python can do to help here apart from the sort of delay-retry dance which test.support uses and which I'm advocating against as a core feature.
Instead of deleting, the file could be moved to a temporary name in the root directory (or some other permanent directory on the same drive) and then deleted. That would allow the directory to be closed even if a FILE_SHARE_DELETE handle is still open for the file.
True, but then you're into determining a temporary name somewhere on the same volume if possible and avoiding collisions etc. Again, I don't think this is something we need to be doing by default in core Python.
What about moving the test.support delete helper to somewhere in unittest, since this will come up in pretty much every test suite that runs on Windows? --David

On 06/09/2013 08:14, Nick Coghlan wrote:
This feels a lot like an issue we were seeing on the Windows buildbots, which we ended up working around in the test support library: http://bugs.python.org/issue15496
Wow :'(
That would be some awfully ugly code to upgrade from "hack in the test support library" to "this is just how Python unlinks files on Windows", though :P
Indeed... Chris -- Simplistix - Content Management, Batch Processing & Python Consulting - http://www.simplistix.co.uk
participants (8)
-
Antoine Pitrou
-
Chris Withers
-
Nick Coghlan
-
Paul Moore
-
R. David Murray
-
Richard Oudkerk
-
Stephen J. Turnbull
-
Tim Golden