
I'm just easing back into core development work by trying to get a stable testing environment for Python development on Windows.
One problem is that certain tests use support.TESTFN (a local directory constructed from the pid) for output files etc. However this can cause issues on Windows when recreating the folders / files for multiple tests, especially when running in parallel.
Here's an example on my laptop deliberately running 3 tests with -j0 which I know will generate an error about one time in three:
C:\work-in-progress\cpython>python -mtest -j0 test_urllib2 test_bz2 test_importlib
Running Debug|Win32 interpreter...
Run tests in parallel using 6 child processes
0:00:23 [1/3/1] test_urllib2 failed
test test_urllib2 failed -- Traceback (most recent call last):
  File "C:\work-in-progress\cpython\lib\test\test_urllib2.py", line 821, in test_file
    f = open(TESTFN, "wb")
PermissionError: [Errno 13] Permission denied: '@test_15564_tmp'
Although these errors are both intermittent and fairly easily spotted, the effect is that I rarely get a clean test run when I'm applying a patch.
I started to address this years ago but things stalled. I'm happy to pick this up again and have another go, but I wanted to ask first whether there was any objection to my converting tests to using tempfile functions which should avoid the problem?
TJG

On Wed, 25 Jul 2018 at 15:16 Tim Golden <mail@timgolden.me.uk> wrote:
I'm just easing back into core development work by trying to get a stable testing environment for Python development on Windows.
One problem is that certain tests use support.TESTFN (a local directory constructed from the pid) for output files etc. However this can cause issues on Windows when recreating the folders / files for multiple tests, especially when running in parallel.
Here's an example on my laptop deliberately running 3 tests with -j0 which I know will generate an error about one time in three:
C:\work-in-progress\cpython>python -mtest -j0 test_urllib2 test_bz2 test_importlib
Running Debug|Win32 interpreter...
Run tests in parallel using 6 child processes
0:00:23 [1/3/1] test_urllib2 failed
test test_urllib2 failed -- Traceback (most recent call last):
  File "C:\work-in-progress\cpython\lib\test\test_urllib2.py", line 821, in test_file
    f = open(TESTFN, "wb")
PermissionError: [Errno 13] Permission denied: '@test_15564_tmp'
Although these errors are both intermittent and fairly easily spotted, the effect is that I rarely get a clean test run when I'm applying a patch.
I started to address this years ago but things stalled. I'm happy to pick this up again and have another go, but I wanted to ask first whether there was any objection to my converting tests to using tempfile functions which should avoid the problem?
I personally don't see any reason to block a move away from TESTFN since it obviously has some inherent issues in parallel test running which lead to flaky tests.

25.07.18 18:07, Tim Golden пише:
I'm just easing back into core development work by trying to get a stable testing environment for Python development on Windows.
One problem is that certain tests use support.TESTFN (a local directory constructed from the pid) for output files etc. However this can cause issues on Windows when recreating the folders / files for multiple tests, especially when running in parallel.
Here's an example on my laptop deliberately running 3 tests with -j0 which I know will generate an error about one time in three:
C:\work-in-progress\cpython>python -mtest -j0 test_urllib2 test_bz2 test_importlib
Running Debug|Win32 interpreter...
Run tests in parallel using 6 child processes
0:00:23 [1/3/1] test_urllib2 failed
test test_urllib2 failed -- Traceback (most recent call last):
  File "C:\work-in-progress\cpython\lib\test\test_urllib2.py", line 821, in test_file
    f = open(TESTFN, "wb")
PermissionError: [Errno 13] Permission denied: '@test_15564_tmp'
Although these errors are both intermittent and fairly easily spotted, the effect is that I rarely get a clean test run when I'm applying a patch.
Try to use test.support.unlink() instead of os.unlink(). It is intended to solve exactly this issue.
I started to address this years ago but things stalled. I'm happy to pick this up again and have another go, but I wanted to ask first whether there was any objection to my converting tests to using tempfile functions which should avoid the problem?
I think the only concern is that it is a pretty large change across many files, and there is a risk of introducing bugs. Also, creating a temporary file in setUp() may slow down testing, especially if TESTFN is not needed in every test.
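A minimal sketch of that setUp/tearDown pattern (unittest-style; the test class here is made up for illustration):

    import unittest
    from test import support

    class ExampleTest(unittest.TestCase):
        def setUp(self):
            # Make sure a leftover file from an earlier (crashed) run can't trip us up.
            support.unlink(support.TESTFN)

        def tearDown(self):
            # support.unlink() copes with files that are briefly still "in use" on
            # Windows instead of failing outright like os.unlink().
            support.unlink(support.TESTFN)

        def test_roundtrip(self):
            with open(support.TESTFN, 'wb') as f:
                f.write(b'data')
            with open(support.TESTFN, 'rb') as f:
                self.assertEqual(f.read(), b'data')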

On Wed, Jul 25, 2018 at 8:07 AM, Tim Golden <mail@timgolden.me.uk> wrote:
One problem is that certain tests use support.TESTFN (a local directory constructed from the pid) for output files etc. However this can cause issues on Windows when recreating the folders / files for multiple tests, especially when running in parallel.
Here's an example on my laptop deliberately running 3 tests with -j0 which I know will generate an error about one time in three:
C:\work-in-progress\cpython>python -mtest -j0 test_urllib2 test_bz2 test_importlib
Running Debug|Win32 interpreter...
Run tests in parallel using 6 child processes
0:00:23 [1/3/1] test_urllib2 failed
test test_urllib2 failed -- Traceback (most recent call last):
  File "C:\work-in-progress\cpython\lib\test\test_urllib2.py", line 821, in test_file
    f = open(TESTFN, "wb")
PermissionError: [Errno 13] Permission denied: '@test_15564_tmp'
Although these errors are both intermittent and fairly easily spotted, the effect is that I rarely get a clean test run when I'm applying a patch.
I started to address this years ago but things stalled. I'm happy to pick this up again and have another go, but I wanted to ask first whether there was any objection to my converting tests to using tempfile functions which should avoid the problem?
Do you know what's causing the issue on Windows? I thought TESTFN was designed to work for parallel testing, so it would surprise me if there was a problem with it. Alternatively, if TESTFN should be okay, I wonder if it's an issue with another test or tests not cleaning up after themselves correctly, in which case it seems like this is an opportunity to track down and fix that issue. Switching to something else would just serve to hide / mask the issue with those other tests. --Chris

On Thu, Jul 26, 2018 at 12:16 AM Tim Golden <mail@timgolden.me.uk> wrote:
I'm just easing back into core development work by trying to get a stable testing environment for Python development on Windows.
One problem is that certain tests use support.TESTFN (a local directory constructed from the pid) for output files etc. However this can cause issues on Windows when recreating the folders / files for multiple tests, especially when running in parallel.
Here's an example on my laptop deliberately running 3 tests with -j0 which I know will generate an error about one time in three:
C:\work-in-progress\cpython>python -mtest -j0 test_urllib2 test_bz2 test_importlib
Running Debug|Win32 interpreter...
Run tests in parallel using 6 child processes
0:00:23 [1/3/1] test_urllib2 failed
test test_urllib2 failed -- Traceback (most recent call last):
  File "C:\work-in-progress\cpython\lib\test\test_urllib2.py", line 821, in test_file
    f = open(TESTFN, "wb")
PermissionError: [Errno 13] Permission denied: '@test_15564_tmp'
Although these errors are both intermittent and fairly easily spotted, the effect is that I rarely get a clean test run when I'm applying a patch.
I started to address this years ago but things stalled. I'm happy to pick this up again and have another go, but I wanted to ask first whether there was any objection to my converting tests to using tempfile functions which should avoid the problem?
TJG
From my experience, open() raising PermissionError on Windows often means "file is already in use", as I think is the case here. The same issue exists for directories: https://bugs.python.org/issue33240.
TESTFN being a global name, it appears not suited for parallel testing. Windows makes this more noticeable than POSIX as it's stricter, but accessing the same file from 2 unit tests is technically a problem regardless of the platform. I don't think that means we should ditch TESTFN in favor of tempfile.mktemp() though. Instead the file cleanup functions (support.unlink() and support.rmtree()) could be made more clever and (importantly) they should always be used in setUp / tearDown. For instance, the traceback you pasted refers to a test class which doesn't do this.
In psutil I've had occasional Windows failures like this for years then I got tired of it and came up with this: https://github.com/giampaolo/psutil/blob/1b09b5fff78f705dfb42458726ff9789c26... ...which basically aggressively retries os.unlink or shutil.rmtree for 1 sec in case of (any) error, and I haven't had this problem since then.
I suppose test.support's unlink() and rmtree() can do something similar, maybe just by using better exception handling, and all unit tests should use them in setUp / tearDown. I think this will diminish the occasional failures on Windows, although not completely.
-- Giampaolo - http://grodola.blogspot.com
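A rough sketch of that retry idea (not the actual psutil helper; the function name and the 1-second budget here are just illustrative):

    import os
    import shutil
    import time

    def rm_with_retry(path, timeout=1.0, interval=0.01):
        # Aggressively retry removal for up to `timeout` seconds, since a scanner
        # or indexer may briefly hold a handle that makes the first attempt fail.
        deadline = time.monotonic() + timeout
        while True:
            try:
                if os.path.isdir(path):
                    shutil.rmtree(path)
                else:
                    os.unlink(path)
                return
            except FileNotFoundError:
                return
            except OSError:
                if time.monotonic() >= deadline:
                    raise
                time.sleep(interval)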

On Fri, Jul 27, 2018 at 6:41 AM, Giampaolo Rodola' <g.rodola@gmail.com> wrote:
TESTFN being a global name, it appears not suited for parallel testing.
It was designed for parallel testing though:
# Disambiguate TESTFN for parallel testing, while letting it remain a valid
# module name.
TESTFN = "{}_{}_tmp".format(TESTFN, os.getpid())
https://github.com/python/cpython/blob/aee632dfbb0abbc0d2bcc988c43a736afd568...
Windows makes this more noticeable than POSIX as it's stricter, but accessing the same file from 2 unit tests is technically a problem regardless of the platform. I don't think that means we should ditch TESTFN in favor of tempfile.mktemp() though. Instead the file cleanup functions (support.unlink() and support.rmtree()) could be made more clever and (importantly) they should always be used in setUp / tearDown. For instance, the traceback you pasted refers to a test class which doesn't do this.
The "test_file" test method referenced in the traceback calls os.remove(TESTFN) in finally blocks preceding its calls to open(TESTFN, "wb"), and inspecting the method shows that it must have been able to open TESTFN earlier in the method (the same test method uses TESTFN multiple times): https://github.com/python/cpython/blob/aee632dfbb0abbc0d2bcc988c43a736afd568... So I think one should investigate what can be causing the error / how it can be happening. TESTFN uses the pid of the process, so it doesn't seem like another test case could be interfering and opening the same TESTFN while the "test_file" test method is running. On Stack Overflow, there are some comments suggesting that in some cases os.remove() doesn't always complete right away (e.g. because of anti-malware software briefly holding a second reference). --Chris
In psutil I've had occasional Windows failures like this for years then I got tired of it and came up with this: https://github.com/giampaolo/psutil/blob/1b09b5fff78f705dfb42458726ff9789c26... ...which basically aggressively retries os.unlink or shutil.rmtree for 1 sec in case of (any) error, and I haven't had this problem since then.
I suppose test.support's unlink() and rmtree() can do something similar, maybe just by using better exception handling, and all unit tests should use them in setUp / tearDown. I think this will diminish the occasional failures on Windows, although not completely.
-- Giampaolo - http://grodola.blogspot.com

On Fri, Jul 27, 2018 at 4:48 PM Chris Jerdonek <chris.jerdonek@gmail.com> wrote:
On Fri, Jul 27, 2018 at 6:41 AM, Giampaolo Rodola' <g.rodola@gmail.com> wrote:
TESTFN being a global name, it appears not suited for parallel testing.
It was designed for parallel testing though:
# Disambiguate TESTFN for parallel testing, while letting it remain a valid
# module name.
TESTFN = "{}_{}_tmp".format(TESTFN, os.getpid())
https://github.com/python/cpython/blob/aee632dfbb0abbc0d2bcc988c43a736afd568...
Oh, nice, I didn't notice that, sorry.

On 25/07/2018 16:07, Tim Golden wrote:
One problem is that certain tests use support.TESTFN (a local directory constructed from the pid) for output files etc. However this can cause issues on Windows when recreating the folders / files for multiple tests, especially when running in parallel.
Here's an example on my laptop deliberately running 3 tests with -j0 which I know will generate an error about one time in three:
C:\work-in-progress\cpython>python -mtest -j0 test_urllib2 test_bz2 test_importlib
[... snip ...]
[...] I wanted to ask first whether there was any objection to my converting tests to using tempfile functions which should avoid the problem?
Thanks to those who responded here and/or on the two bpo issues I've already raised:
https://bugs.python.org/issue34239 -- test_bz2
https://bugs.python.org/issue34240 -- test_mmap
Two key questions have been posed in different ways:
1) Why are these errors occurring? ie are we dodging a root cause issue
2) Isn't this what test.support.unlink is there to solve?
For (1) I'm putting together a test run using code which I originally wrote for https://bugs.python.org/issue7443 to force the issues out into the open.
For (2), yes: test.support.unlink is supposed to solve that. But it's either not doing enough retries etc. or it's missing a trick.
To be clear: my motivation here isn't some housekeeping or modernisation exercise. I want to get a clear test run on a fresh build on Windows. At present I never get that. I'd be interested to hear from other Windows devs whether they have the same experience?
TJG

On Sat, Jul 28, 2018 at 8:41 AM Tim Golden <mail@timgolden.me.uk> wrote:
1) Why are these errors occurring? ie are we dodging a root cause issue
The root cause is how Windows handles file deletions. When a file is removed, it is not immediately removed from the directory; instead, it is simply marked for deletion. Only once all open handles to the file are closed is it removed from the directory. The most common cause of additional open handles to a file is an antivirus scanner.
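For illustration, a minimal Windows-only sketch of that behaviour, using ctypes to play the part of a scanner that holds a delete-sharing handle (the filename and constants are just for the demo):

    import ctypes
    import os

    GENERIC_READ = 0x80000000
    FILE_SHARE_ALL = 0x00000007   # FILE_SHARE_READ | FILE_SHARE_WRITE | FILE_SHARE_DELETE
    OPEN_EXISTING = 3

    kernel32 = ctypes.WinDLL('kernel32', use_last_error=True)
    kernel32.CreateFileW.argtypes = (ctypes.c_wchar_p, ctypes.c_uint32,
                                     ctypes.c_uint32, ctypes.c_void_p,
                                     ctypes.c_uint32, ctypes.c_uint32,
                                     ctypes.c_void_p)
    kernel32.CreateFileW.restype = ctypes.c_void_p
    kernel32.CloseHandle.argtypes = (ctypes.c_void_p,)

    with open('spam.txt', 'w') as f:
        f.write('data')

    # A second, scanner-style handle that shares delete access.
    handle = kernel32.CreateFileW('spam.txt', GENERIC_READ, FILE_SHARE_ALL,
                                  None, OPEN_EXISTING, 0, None)
    os.remove('spam.txt')         # succeeds, but only marks the file for deletion
    try:
        open('spam.txt', 'w')     # recreating the name fails until the handle closes
    except PermissionError as exc:
        print(exc)                # PermissionError: [Errno 13] Permission denied
    kernel32.CloseHandle(handle)  # only now is the file actually removed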
2) Isn't this what test.support.unlink is there to solve?
Yes. If test.support.unlink doesn't succeed in removing the file, a RuntimeWarning will be emitted.
For (1) I'm putting together a test run using code which I originally wrote for https://bugs.python.org/issue7443 to force the issues out into the open.
These intermittent errors are tough, as it is really just a matter of timing. The faster the disk and the more loaded the CPU, however, the more often they can occur.
For (2), yes: test.support.unlink is supposed to solve that. But it's either not doing enough retries etc. or it's missing a trick.
If you are not seeing the RuntimeWarnings, then something else is amiss. -- Jeremy Kloth

On 28/07/2018 17:27, Jeremy Kloth wrote:
On Sat, Jul 28, 2018 at 8:41 AM Tim Golden <mail@timgolden.me.uk> wrote:
1) Why are these errors occurring? ie are we dodging a root cause issue
The root cause is how Windows handles file deletions. When a file is removed, it is not immediately removed from the directory; instead, it is simply marked for deletion. Only once all open handles to the file are closed is it removed from the directory. The most common cause of additional open handles to a file is an antivirus scanner.
Thanks, Jeremy. I'm already aware of that. (If you look back at https://bugs.python.org/issue7443 you'll find me explaining the same to MvL some years ago). [In case the tone isn't clear, there's absolutely no sarcasm implied here. I just wanted to make it clear that I'm at least partly au fait with the situation :)].
Although things have moved on since that discussion and test.support.unlink has grown some extra legs, all it's done really is to push the bump along the carpet for a bit. I've got a newly-installed Win10 machine with the typical MS Antivirus & TortoiseGitCache vying for locks with every other process. I've just done a full test run:
python -mtest -j0 -v > test.log
and I've got a mixture of Permission (winerror 13) & Access errors (winerror 5). The former are usually attempting to open a TESTFN file; the latter are attempting to unlink one. Most of the Error 5 are os.unlink, but at least one is from test.support.unlink. I understand the unable-to-recreate (virus checkers with share-delete handles) but I'm not so clear on what's blocking the unlink. IOW I think we have more than one issue.
Here's the thing. The TESTFN approach creates a directory per process test_python_<pid> and some variant of @test_<pid>_tmp inside that directory. But the same filename in the same directory is used for every test in that process. Now, leaving aside the particular mechanism by which Windows processes might be holding locks which prevent removal or re-addition, that already seems like an odd choice.
I think that was my starting point: rather than develop increasingly involved and still somewhat brittle mechanisms, why not do what you'd naturally do with a new test and use tempfile? I was expecting someone to come forward to highlight some particular reason why the TESTFN approach is superior, but apart from a reference to the possible cost of creating a new temporary file per test, no-one really has.
If you are not seeing the RuntimeWarnings, then something else is amiss.
I think I've seen one RuntimeWarning in the many, many times I've been running through tests today. TJG

On Sat, Jul 28, 2018 at 11:20 AM Tim Golden <mail@timgolden.me.uk> wrote:
Although things have moved on since that discussion and test.support.unlink has grown some extra legs, all it's done really is to push the bump along the carpet for a bit. I've got a newly-installed Win10 machine with the typical MS Antivirus & TortoiseGitCache vying for locks with every other process. I've just done a full test run:
python -mtest -j0 -v > test.log
I, for one, would like to see that log. The issues you are having are fairly unique. Just check out the buildbot status page. I know that some of the workers are somewhat limited, but my worker (https://buildbot.python.org/all/#/workers/12) is running on dedicated hardware. Before https://bugs.python.org/issue15496 was applied, the errors you describe were indeed happening, but no longer.
Here's the thing. The TESTFN approach creates a directory per process test_python_<pid> and some variant of @test_<pid>_tmp inside that directory. But the same filename in the same directory is used for every test in that process. Now, leaving aside the particular mechanism by which Windows processes might be holding locks which prevent removal or re-addition, that already seems like an odd choice.
Well, since every test (well, test file) is run in its own process, a directory per process doesn't seem that odd. You seem to be focusing on 2 tests in particular, both of which have not been converted to use test.support.unlink for removal. The "trick" with test.support.unlink now is that it waits until the file has truly been removed from the directory before continuing. Using this, in turn, would mean that a following create *should* succeed without error. If test.support.unlink returns without warning, the removed file is really gone.
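A rough sketch of that wait-until-really-gone idea (not the actual test.support implementation):

    import os
    import time

    def unlink_and_wait(path, timeout=1.0):
        # Delete, then poll the directory listing until the name is really gone
        # (a pending delete still occupies the name until all handles close).
        try:
            os.unlink(path)
        except FileNotFoundError:
            return
        dirname, name = os.path.split(os.path.abspath(path))
        deadline = time.monotonic() + timeout
        while name in os.listdir(dirname):
            if time.monotonic() > deadline:
                raise RuntimeError('%r still present after unlink' % path)
            time.sleep(0.01)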
I think that was my starting point: rather than develop increasingly involved and still somewhat brittle mechanisms, why not do what you'd naturally do with a new test and use tempfile? I was expecting someone to come forward to highlight some particular reason why the TESTFN approach is superior, but apart from a reference to the possible cost of creating a new temporary file per test, no-one really has.
*PLEASE*, don't use tempfile to create files/directories in tests. It is unfriendly to (Windows) buildbots. The current approach of directory-per-process ensures no test turds are left behind, whereas the tempfile solution slowly fills up my buildbot. Windows doesn't natively clean out the temp directory.
Additionally, I'm working on a solution to allow the test machinery to use a given directory for test outputs to allow for a RAM-backed filesystem for temporary files. With this, I've had the full test suite (with -uall including bigmem) take ~5min. Having some tests use the Windows default temp directory would break this. It is simply not feasible to have the entire Windows TEMP in RAM, but just Python's test suite usage is small enough (<6Gb).
Another point: some tests require that the temporary filename resides on the same drive as the source directory (discovered in developing the above). This means that Windows development would be restricted to the same drive as the user directory, if the directory-per-process approach isn't used.
-- Jeremy Kloth

On Sat, Jul 28, 2018 at 9:17 PM, Jeremy Kloth <jeremy.kloth@gmail.com> wrote:
*PLEASE*, don't use tempfile to create files/directories in tests. It is unfriendly to (Windows) buildbots. The current approach of directory-per-process ensures no test turds are left behind, whereas the tempfile solution slowly fills up my buildbot. Windows doesn't natively clean out the temp directory.
FYI, Windows 10 storage sense (under system->storage) can be configured to delete temporary files on a schedule. Of course that doesn't help with older systems.

On Sat, Jul 28, 2018, 15:13 eryk sun, <eryksun@gmail.com> wrote:
On Sat, Jul 28, 2018 at 9:17 PM, Jeremy Kloth <jeremy.kloth@gmail.com> wrote:
*PLEASE*, don't use tempfile to create files/directories in tests. It is unfriendly to (Windows) buildbots. The current approach of directory-per-process ensures no test turds are left behind, whereas the tempfile solution slowly fills up my buildbot. Windows doesn't natively clean out the temp directory.
FYI, Windows 10 storage sense (under system->storage) can be configured to delete temporary files on a schedule. Of course that doesn't help with older systems.
If Windows doesn't clean up its temp directory on a regular basis then that doesn't suggest to me not to use tempfile, but instead that the use of tempfile still needs to clean up after itself. And if there is a lacking feature in tempfile then we should add it instead of avoiding the module. -Brett
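For illustration, one way a test can own that cleanup (a unittest-style sketch; the class name is made up):

    import tempfile
    import unittest

    class ExampleTest(unittest.TestCase):
        def setUp(self):
            # The directory is removed even if the test fails or errors out,
            # so nothing accumulates in the system TEMP directory.
            self._tmpdir = tempfile.TemporaryDirectory()
            self.addCleanup(self._tmpdir.cleanup)
            self.workdir = self._tmpdir.name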

On Sat, Jul 28, 2018 at 6:43 PM Brett Cannon <brett@python.org> wrote:
If Windows doesn't clean up its temp directory on a regular basis then that doesn't suggest to me not to use tempfile, but instead that the use of tempfile still needs to clean up after itself. And if there is a lacking feature in tempfile then we should add it instead of avoiding the module.
Mind you, this is mentioned in the confines of the test harness where just about anything can happen (and usually does!). Something that cannot be coded against using just tempfile is cleanup on process abort. The per-process-directory approach handles this case. I would think it is desired to have no leftovers after running the test harness (especially in regards to the buildbots). Now, I'm not sure the exact cause of all of the leftovers in the TEMP directory, but it is definitely something that is currently happening (and shouldn't be). It is not exactly the easiest of tasks to track the file usage of every test in the test suite. It is certainly easier to replace usages of os.unlink with test.support.unlink within the test suite. -- Jeremy Kloth

On 29/07/2018 02:04, Jeremy Kloth wrote:
On Sat, Jul 28, 2018 at 6:43 PM Brett Cannon <brett@python.org> wrote:
If Windows doesn't clean up its temp directory on a regular basis then that doesn't suggest to me not to use tempfile, but instead that the use of tempfile still needs to clean up after itself. And if there is a lacking feature in tempfile then we should add it instead of avoiding the module.
Mind you, this is mentioned in the confines of the test harness where just about anything can happen (and usually does!). Something that cannot be coded against using just tempfile is cleanup on process abort. The per-process-directory approach handles this case.
I would think it is desired to have no leftovers after running the test harness (especially in regards to the buildbots).
Now, I'm not sure the exact cause of all of the leftovers in the TEMP directory, but it is definitely something that is currently happening (and shouldn't be). It is not exactly the easiest of tasks to track the file usage of every test in the test suite. It is certainly easier to replace usages of os.unlink with test.support.unlink within the test suite.
In the interests of trying to keep a focus to the changes I'm making, I propose to start again by, as you suggest, making use of test.support.unlink where it's not currently used. From the evidence I don't believe that will solve every problem I'm seeing but it should certainly reduce them. I do think there's mileage in a wider change to revamp the test suite's naming and cleanup of temporary files but I'm very wary of trying to undertake what would undoubtedly be a sprawling and probably contentious change. TJG

On 29Jul2018 0958, Tim Golden wrote:
In the interests of trying to keep a focus to the changes I'm making, I propose to start again by, as you suggest, making use of test.support.unlink where it's not currently used. From the evidence I don't believe that will solve every problem I'm seeing but it should certainly reduce them.
One additional thing that may help (if support.unlink doesn't already do it) is to rename the file before deleting it. Renames are always possible even with open handles, and then you can create a new file at the original name. Cheers, Steve

On Sun, Jul 29, 2018 at 12:35 PM, Steve Dower <steve.dower@python.org> wrote:
One additional thing that may help (if support.unlink doesn't already do it) is to rename the file before deleting it. Renames are always possible even with open handles, and then you can create a new file at the original name.
Renaming open files typically fails with a sharing violation (32). Most programs open files with read and write sharing but not delete sharing. This applies to Python, except temporary files (i.e. os.O_TEMPORARY) do share delete access. Renaming a file is effectively adding a new link and deleting the old link, so it requires opening the file with delete access. Also, renaming a directory that has open files in the tree fails with access denied (5).
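A quick way to see that, as a Windows-only sketch (the filenames are just demo names):

    import os

    fd = os.open('spam.txt', os.O_CREAT | os.O_WRONLY)   # shares read/write, not delete
    try:
        os.rename('spam.txt', 'eggs.txt')   # needs DELETE access on the open file
    except PermissionError as exc:
        print(exc)   # the sharing violation (winerror 32) surfaces as PermissionError/EACCES
    finally:
        os.close(fd)
        os.unlink('spam.txt')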

If the problem is AV scanners, then they should be opening files properly for this (and since the delete is not failing but is being deferred, I assume they allow deletes). If the problem is elsewhere in our code base then we have a different bug.
Top-posted from my Windows 10 phone
On Sunday, 29 July 2018 15:28, eryk sun <eryksun@gmail.com> wrote:
On Sun, Jul 29, 2018 at 12:35 PM, Steve Dower <steve.dower@python.org> wrote:
One additional thing that may help (if support.unlink doesn't already do it) is to rename the file before deleting it. Renames are always possible even with open handles, and then you can create a new file at the original name.
Renaming open files typically fails with a sharing violation (32). Most programs open files with read and write sharing but not delete sharing. This applies to Python, except temporary files (i.e. os.O_TEMPORARY) do share delete access. Renaming a file is effectively adding a new link and deleting the old link, so it requires opening the file with delete access. Also, renaming a directory that has open files in the tree fails with access denied (5).

On Sat, Jul 28, 2018 at 5:40 PM Brett Cannon <brett@python.org> wrote:
On Sat, Jul 28, 2018, 15:13 eryk sun, <eryksun@gmail.com> wrote:
On Sat, Jul 28, 2018 at 9:17 PM, Jeremy Kloth <jeremy.kloth@gmail.com> wrote:
*PLEASE*, don't use tempfile to create files/directories in tests. It is unfriendly to (Windows) buildbots. The current approach of directory-per-process ensures no test turds are left behind, whereas the tempfile solution slowly fills up my buildbot. Windows doesn't natively clean out the temp directory.
FYI, Windows 10 storage sense (under system->storage) can be configured to delete temporary files on a schedule. Of course that doesn't help with older systems.
If Windows doesn't clean up its temp directory on a regular basis then that doesn't suggest to me not to use tempfile, but instead that the use of tempfile still needs to clean up after itself. And if there is a lacking feature in tempfile then we should add it instead of avoiding the module.
Regardless of whether the tempfile or TESTFN approach is used, I think it would be best for a few reasons if the choice is abstracted behind a uniquely named test function (e.g. make_test_file if not already used). —Chris

On Sat, Jul 28, 2018 at 7:15 PM Chris Jerdonek <chris.jerdonek@gmail.com> wrote:
Regardless of whether the tempfile or TESTFN approach is used, I think it would be best for a few reasons if the choice is abstracted behind a uniquely named test function (e.g. make_test_file if not already used).
+1, although my particular choice of color would be to add a pair of functions, mkstemp and mkdtemp, to match the style of test.support-wrapped library functions for use in the test harness. -- Jeremy Kloth
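One possible shape for those wrappers, as a sketch (the names mirror the suggestion; this is not an existing test.support API):

    import os
    import tempfile

    def mkstemp(suffix='', prefix='@test_'):
        # Create a unique file in the per-process test directory (the cwd under
        # regrtest) and hand back its path; callers clean up with support.unlink().
        fd, path = tempfile.mkstemp(suffix=suffix, prefix=prefix, dir=os.getcwd())
        os.close(fd)
        return path

    def mkdtemp(suffix='', prefix='@test_'):
        # Ditto for directories; callers clean up with support.rmtree().
        return tempfile.mkdtemp(suffix=suffix, prefix=prefix, dir=os.getcwd())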

On 28/07/2018 22:17, Jeremy Kloth wrote:
On Sat, Jul 28, 2018 at 11:20 AM Tim Golden <mail@timgolden.me.uk> wrote:
Although things have moved on since that discussion and test.support.unlink has grown some extra legs, all it's done really is to push the bump along the carpet for a bit. I've got a newly-installed Win10 machine with the typical MS Antivirus & TortoiseGitCache vying for locks with every other process. I've just done a full test run:
python -mtest -j0 -v > test.log
I, for one, would like to see that log. The issues you are having are fairly unique. Just check out the buildbot status page. I know that some of the workers are somewhat limited, but my worker (https://buildbot.python.org/all/#/workers/12) is running on dedicated hardware. Before https://bugs.python.org/issue15496 was applied, the errors you describe were indeed happening, but no longer.
For an example: http://tjg.org.uk/test.log
Thinkpad T420, 4Gb, i5, SSD
Recently rebuilt and reinstalled: Win10, VS2017, TortoiseGit, standard Windows Antimalware, usual developer tools.
That particular run was done with the laptop unattended (ie nothing else going on at the front end). But the problem is certainly not specific to this laptop.
TJG

On Sun, Jul 29, 2018 at 9:13 AM, Tim Golden <mail@timgolden.me.uk> wrote:
For an example:
Thinkpad T420, 4Gb, i5, SSD
Recently rebuilt and reinstalled: Win10, VS2017, TortoiseGit, standard Windows Antimalware, usual developer tools. That particular run was done with the laptop unattended (ie nothing else going on at the front end). But the problem is certainly not specific to this laptop.
On my last run I had one test directory that wasn't removed properly, but nothing like the flood of EACCES and ERROR_ACCESS_DENIED errors you have in that log. Then again, I had Defender disabled by policy. I'll enable it and add exceptions for my source and build directories, and see how it goes.
It would be nice if OSError instances always captured the last Windows error and NT status values when instantiated. We have no guarantees that these values are valid, but in many contexts they are. In the case of a test log, it would certainly help to clarify errors without having to individually investigate each one.
For example, trying to open a directory as a file is a common error, but all Python tells us on Windows is that it failed with EACCES. In this case the last Windows error is ERROR_ACCESS_DENIED, which doesn't help, but the last NT status code is STATUS_FILE_IS_A_DIRECTORY (0xc00000ba).
Here's a file opener that adds last_winerror and last_ntstatus values:

import ctypes
import os

ntdll = ctypes.WinDLL('ntdll')
kernel32 = ctypes.WinDLL('kernel32')

def nt_opener(path, flags):
    try:
        return os.open(path, flags)
    except OSError as e:
        last_ntstatus = ntdll.RtlGetLastNtStatus()
        last_winerror = kernel32.GetLastError()
        e.last_ntstatus = last_ntstatus & 2**32 - 1
        e.last_winerror = (last_winerror if e.winerror is None else e.winerror)
        if e.errno is not None or e.winerror is not None:
            # hack the last error/status into the error message
            e.strerror = '[Last NtStatus {:#08x}] {}'.format(
                e.last_ntstatus, e.strerror or '')
            if e.winerror is None:
                e.strerror = '[Last WinError {}] {}'.format(
                    e.last_winerror, e.strerror or '')
        raise e from None

Opening a directory as a file:

>>> open('C:/Windows', opener=nt_opener)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 17, in nt_opener
  File "<stdin>", line 3, in nt_opener
PermissionError: [Errno 13] [Last WinError 5] [Last NtStatus 0xc00000ba] Permission denied: 'C:/Windows'

On Sun, Jul 29, 2018 at 3:13 AM Tim Golden <mail@timgolden.me.uk> wrote:
For an example:
Thank you! After inspecting all the errors, it does seem that they are ALL caused by "bare" os.unlink/rmdir calls. So it seems that a massive undertaking of ferreting out these locations and replacing them with their support equivalents would be needed to have a passing test suite for you.
Unfortunately, test_mailbox is failing for the same reason, but within the stdlib itself. The fix for that will probably need to be a rename/delete dance.

diff --git a/Lib/mailbox.py b/Lib/mailbox.py
index 056251d..23662f2 100644
--- a/Lib/mailbox.py
+++ b/Lib/mailbox.py
@@ -701,8 +701,10 @@ class _singlefileMailbox(Mailbox):
         try:
             os.rename(new_file.name, self._path)
         except FileExistsError:
-            os.remove(self._path)
+            temp_name = _create_temporary_name(self._path)
+            os.rename(self._path, temp_name)
             os.rename(new_file.name, self._path)
+            os.remove(temp_name)
         self._file = open(self._path, 'rb+')
         self._toc = new_toc
         self._pending = False
@@ -2112,11 +2114,14 @@ def _create_carefully(path):
     finally:
         os.close(fd)
 
+def _create_temporary_name(path):
+    """Create a temp filename based on path."""
+    return '%s.%s.%s.%s' % (path, int(time.time()), socket.gethostname(),
+                            os.getpid())
+
 def _create_temporary(path):
     """Create a temp file based on path and open for reading and writing."""
-    return _create_carefully('%s.%s.%s.%s' % (path, int(time.time()),
-                                               socket.gethostname(),
-                                               os.getpid()))
+    return _create_carefully(_create_temporary_name(path))
 
 def _sync_flush(f):
     """Ensure changes to file f are physically on disk."""

-- Jeremy Kloth

On Sun, Jul 29, 2018 at 2:21 PM, Jeremy Kloth <jeremy.kloth@gmail.com> wrote:
        try:
            os.rename(new_file.name, self._path)
        except FileExistsError:
-            os.remove(self._path)
+            temp_name = _create_temporary_name(self._path)
+            os.rename(self._path, temp_name)
             os.rename(new_file.name, self._path)
+            os.remove(temp_name)
This should call os.replace to allow the file system to replace the existing file.

On 29/07/2018 15:21, Jeremy Kloth wrote:
On Sun, Jul 29, 2018 at 3:13 AM Tim Golden <mail@timgolden.me.uk> wrote:
For an example:
Thank you! After inspecting all the errors, it does seem that they are ALL caused by "bare" os.unlink/rmdir calls. So it seems that a massive undertaking of ferreting out these locations and replacing them with their support equivalents would be needed to have a passing test suite for you.
Thanks for checking. As I mentioned elsewhere in this thread, I propose to go through and apply support.unlink where necessary. It won't make things any worse and will hopefully improve in some areas. For test_mailbox I've experimentally implemented a hybrid tempfile / local directory solution. ie I've created a new file on each run, but only within the python_<pid> folder which already exists. As long as the directory cleans up there should be no leftovers. That's certainly helped although my re-run harness has provoked at least one error. TJG

On Sun, Jul 29, 2018 at 9:34 AM Tim Golden <mail@timgolden.me.uk> wrote:
For test_mailbox I've experimentally implemented a hybrid tempfile / local directory solution. ie I've created a new file on each run, but only within the python_<pid> folder which already exists. As long as the directory cleans up there should be no leftovers. That's certainly helped although my re-run harness has provoked at least one error.
As Eryk noted, the fix for mailbox.py (yes, the stdlib needs fixing in this case) is quite simple:

diff --git a/Lib/mailbox.py b/Lib/mailbox.py
index 056251d..eb85df1 100644
--- a/Lib/mailbox.py
+++ b/Lib/mailbox.py
@@ -701,8 +701,7 @@ class _singlefileMailbox(Mailbox):
         try:
             os.rename(new_file.name, self._path)
         except FileExistsError:
-            os.remove(self._path)
-            os.rename(new_file.name, self._path)
+            os.replace(new_file.name, self._path)
         self._file = open(self._path, 'rb+')
         self._toc = new_toc
         self._pending = False

-- Jeremy Kloth

On Sat, Jul 28, 2018 at 10:20 AM, Tim Golden <mail@timgolden.me.uk> wrote:
Here's the thing. The TESTFN approach creates a directory per process test_python_<pid> and some variant of @test_<pid>_tmp inside that directory.
I filed an issue some years back about this (still open): https://bugs.python.org/issue15305 The pid is unnecessarily being used twice. Once the directory is created for the process, there shouldn't be a need to disambiguate within the directory further.
But the same filename in the same directory is used for every test in that process. Now, leaving aside the particular mechanism by which Windows processes might be holding locks which prevent removal or re-addition, that already seems like an odd choice.
I think that was my starting point: rather than develop increasingly involved and still somewhat brittle mechanisms, why not do what you'd naturally do with a new test and use tempfile? I was expecting someone to come forward to highlight some particular reason why the TESTFN approach is superior, but apart from a reference to the possible cost of creating a new temporary file per test, no-one really has.
I think there is value in having the files used during test runs in a known location. How about a middle ground where, once the process-specific directory is created in a known location, unique files are created within that directory (e.g. using a random suffix, or perhaps a suffix that is a function of the test name / test id)? This would address both the issue you're experiencing, and, if the directory is deleted at the end of the test run, it would address the issue Jeremy mentions of growing leftover files. There could also be a check at the end of each test run making sure that the directory is empty (to check for tests that aren't cleaning up after themselves). --Chris
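One way that middle ground could look, as a sketch (a hypothetical helper; it assumes the test harness has already changed into the per-process directory):

    import os

    def testfn_for(test_case):
        # Derive a per-test filename inside the per-process test directory,
        # so two tests in the same process never share a name.
        return os.path.join(os.getcwd(), '@test_' + test_case.id().replace('.', '_'))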

On Sat, Jul 28, 2018 at 5:20 PM, Tim Golden <mail@timgolden.me.uk> wrote:
I've got a mixture of Permission (winerror 13) & Access errors (winerror 5)
EACCES (13) is a CRT errno value. Python raises PermissionError for EACCES and EPERM (1, not used). It also does the reverse mapping for WinAPI calls, so PermissionError is raised either way.
About 25 WinAPI error codes map to EACCES. Commonly it's due to either ERROR_ACCESS_DENIED (5) or ERROR_SHARING_VIOLATION (32). open() uses read-write sharing but not delete sharing. In this case trying to either delete an already open file or open a file that's already open with delete access (e.g. an O_TEMPORARY open) both fail with a sharing violation.
An access-denied error could be due to a range of causes. Over 20 NTAPI status codes map to ERROR_ACCESS_DENIED. Commonly for a file it's due to one of the following status codes:
STATUS_ACCESS_DENIED (0xc0000022)
    The file security doesn't grant the requested access to the caller.
STATUS_DELETE_PENDING (0xc0000056)
    The file's delete disposition is set, i.e. it's flagged to be deleted when the last handle is closed. Opening a new handle is disallowed for any access.
STATUS_FILE_IS_A_DIRECTORY (0xc00000ba)
    Except when using backup semantics, CreateFile calls NtCreateFile with the flag FILE_NON_DIRECTORY_FILE, so only non-directory files/devices can be opened.
STATUS_CANNOT_DELETE (0xc0000121)
    The file is either readonly or memory mapped.

On 29 July 2018 at 03:20, Tim Golden <mail@timgolden.me.uk> wrote:
I think that was my starting point: rather than develop increasingly involved and still somewhat brittle mechanisms, why not do what you'd naturally do with a new test and use tempfile? I was expecting someone to come forward to highlight some particular reason why the TESTFN approach is superior, but apart from a reference to the possible cost of creating a new temporary file per test, no-one really has.
For higher level modules, "just use tempfile to create a new temporary directory, then unlink it at the end" is typically going to be a good answer (modulo the current cleanup issues that Jeremy is reporting, but ideally those will be fixed rather than avoided, either by improving the way the module is being used, or fixing any underlying defects). For lower level modules though, adding a test suite dependency on tempfile introduces a non-trivial chance of adding an operational dependency from a module's test suite back to the module itself. When that happens, module regressions may show up as secondary failures in tempfile that then need to be debugged, rather than as specific unit test failures that point you towards exactly what you broke. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On 30/07/2018 16:41, Nick Coghlan wrote:
On 29 July 2018 at 03:20, Tim Golden <mail@timgolden.me.uk> wrote:
I think that was my starting point: rather than develop increasingly involved and still somewhat brittle mechanisms, why not do what you'd naturally do with a new test and use tempfile? I was expecting someone to come forward to highlight some particular reason why the TESTFN approach is superior, but apart from a reference to the possible cost of creating a new temporary file per test, no-one really has.
For higher level modules, "just use tempfile to create a new temporary directory, then unlink it at the end" is typically going to be a good answer (modulo the current cleanup issues that Jeremy is reporting, but ideally those will be fixed rather than avoided, either by improving the way the module is being used, or fixing any underlying defects).
For lower level modules though, adding a test suite dependency on tempfile introduces a non-trivial chance of adding an operational dependency from a module's test suite back to the module itself. When that happens, module regressions may show up as secondary failures in tempfile that then need to be debugged, rather than as specific unit test failures that point you towards exactly what you broke.
Cheers, Nick.
Thanks Nick; I hadn't thought about the possible interdependency issue.
I think for the moment my approach will be to switch to support.unlink wherever possible to start with. Before introducing other (eg tempfile) changes, this should at least narrow the issues down. I've made a start on that (before inadvertently blowing away all the changes since my hours-previous commit!)
If further changes are necessary then I'll probably look case-by-case to see whether a tempfile or some other solution would help.
That said, that's potentially quite a lot of change -- at least in terms of files changed if not strictly of functionality. So I'm thinking of trickle-feeding the changes through as people will understandably baulk at a patchbomb (PR-bomb?) hitting the codebase all at once.
TJG

On Mon, Jul 30, 2018 at 8:46 AM, Tim Golden <mail@timgolden.me.uk> wrote:
On 30/07/2018 16:41, Nick Coghlan wrote:
On 29 July 2018 at 03:20, Tim Golden <mail@timgolden.me.uk> wrote:
I think that was my starting point: rather than develop increasingly involved and still somewhat brittle mechanisms, why not do what you'd naturally do with a new test and use tempfile? I was expecting someone to come forward to highlight some particular reason why the TESTFN approach is superior, but apart from a reference to the possible cost of creating a new temporary file per test, no-one really has.
For higher level modules, "just use tempfile to create a new temporary directory, then unlink it at the end" is typically going to be a good answer (modulo the current cleanup issues that Jeremy is reporting, but ideally those will be fixed rather than avoided, either by improving the way the module is being used, or fixing any underlying defects).
If there's a desire to use tempfile, another option is to have tempfile create the temp files inside the temporary directory the test harness creates specifically for testing -- using the "dir" argument to many of tempfile's functions. Here is where the process-specific temp directory is created for testing (inside test.libregrtest.main): https://github.com/python/cpython/blob/9045199c5aaeac9b52537581be127d999b594...
This would also facilitate the clean-up of any leftover temp files.
Again, I think it would be best to use any tempfile functions behind one or more test-support functions, so the choice of location, etc. can be changed centrally without needing to modify code everywhere.
--Chris
For lower level modules though, adding a test suite dependency on tempfile introduces a non-trivial chance of adding an operational dependency from a module's test suite back to the module itself. When that happens, module regressions may show up as secondary failures in tempfile that then need to be debugged, rather than as specific unit test failures that point you towards exactly what you broke.
Cheers, Nick.
Thanks Nick; I hadn't thought about the possible interdependency issue.
I think for the moment my approach will be to switch to support.unlink wherever possible to start with. Before introducing other (eg tempfile) changes, this should at least narrow the issues down. I've made a start on that (before inadvertently blowing away all the changes since my hours-previous commit!)
If further changes are necessary then I'll probably look case-by-case to see whether a tempfile or some other solution would help.
That said, that's potentially quite a lot of change -- at least in terms of files changed if not strictly of functionality. So I'm thinking of trickle-feeding the changes through as people will understandably baulk at a patchbomb (PR-bomb?) hitting the codebase all at once.
TJG
participants (9)
-
Brett Cannon
-
Chris Jerdonek
-
eryk sun
-
Giampaolo Rodola'
-
Jeremy Kloth
-
Nick Coghlan
-
Serhiy Storchaka
-
Steve Dower
-
Tim Golden