New work-in-progress bisection tool for the Python test suite (in regrtest)

Hi, Last weeks, I worked on a new tool to bisect failing tests because it's painful to bisect manually reference leaks (I remove as much code as possible until the code is small enough to be reviewable manually). See the bisect_test.py script attached to this issue: http://bugs.python.org/issue29512 With the help of Louie Lu, I added new --list-cases option to "python -m test", so you can now list all test cases and write it into a text file: ./python -m test --list-cases test_os > tests I also added a new --matchfile option, to filter tests using a text file which contains one pattern per line: ./python -m test --matchfile=tests test_os fnmatch is used to match test names, so "*" joker character can be used in test names. My bisection tool takes a text file with the --matchfile format (one pattern per line) and creates a random subset of tests with half of the tests. If tests still fail, use the subset. Otherwise, create a new random subset. Loop until the subset contains a single test (configurable threshold, -n command line option). The long term plan is to integrate the bisection feature directly into regrtest. Right now, my script is hardcoded to bisect reference leak bugs, but it should be easy to modify it to bisect other test issues like test creating files without removing it ("ENV_CHANGED" failure in regrtest). For example, a core file is dumped when running test_subprocess on FreeBSD buildbots: http://bugs.python.org/issue30448 But I'm unable to reproduce the issue on my FreeBSD. It would be nice to be able to automate the bisection on the buildbot directly. --list-cases and --matchfile options are now available in 2.7, 3.5, 3.6 and master (3.7) branches. TODO: doctest tests are only partially supported, see: http://bugs.python.org/issue30683 Victor

Great idea! I had thought about doing something similar to this years ago to help diagnose when a test passes in isolation but somehow fails due to one of the bazillion of other tests that ran previously (which can be hard to narrow down if the failure is not early on in a test run). On Fri, 16 Jun 2017 at 09:06 Victor Stinner <victor.stinner@gmail.com> wrote:

2017-06-16 19:00 GMT+02:00 Brett Cannon <brett@python.org>:
Yeah, sometimes a test only fails if it is run after other tests. The well known issue of side effects ;-) Each time that my script bisect_test.py script succeed to reduce the number of tests, it writes the test list on disk. So you can interrupt it the script is stuck and you consider that the output is small enough to a manual review. Victor

Hi, I updated my bisect_test.py script to simplify its command line: https://github.com/haypo/misc/blob/master/python/bisect_test.py To debug a reference leak in test_os, now just type: $ ./python bisect_test.py -R 3:3 test_os Example of output: --- (...) Failing tests (1): * test.test_os.ExecTests.test_execve_invalid_env Bisection completed in 24 iterations and 0:02:11 --- In 2 minutes, you get the guilty: test_execve_invalid_env()! It's now fully automated! I also added --fail-env-changed option to regrtest. Using this option, you can find which test leaked a file, and modified any other resource tracked by regrtest. For examle, I used this command to find which test method of test_subprocess created a core dump file: $ ./python bisect_test.py --fail-env-changed test_subprocess Later, I will try to integrate this tiny tool in Python. Maybe somewhere in Tools/. Or better, maybe directly in regrtest? Victor

2017-06-26 21:58 GMT+02:00 Brett Cannon <brett@python.org>:
I don't see why regrtest isn't the right place for this.
The current regrest CLI isn't designed for subcommands, and I don't want to "pollute" regrtest with multiple options for bisect. Currently, my script has already 4 options: haypo@selma$ python3 ~/prog/GIT/misc/python/bisect_test.py --help usage: bisect_test.py [-h] [-i INPUT] [-o OUTPUT] [-n MAX_TESTS] [-N MAX_ITER] optional arguments: -h, --help show this help message and exit -i INPUT, --input INPUT Test names produced by --list-tests written into a file. If not set, run --list-tests -o OUTPUT, --output OUTPUT Result of the bisection -n MAX_TESTS, --max-tests MAX_TESTS Maximum number of tests to stop the bisection (default: 1) -N MAX_ITER, --max-iter MAX_ITER Maximum number of bisection iterations (default: 100) I really like subcommands, it's a nice way to design complex CLI ;-) Victor

2017-06-27 7:33 GMT+02:00 Serhiy Storchaka <storchaka@gmail.com>:
You could make it just a submodule in the test package.
./python -m test.bisect -R 3:3 test_os
I like the idea :-) I proposed a PR which was approved by Yury Selivanov, and so I just merged it! It means that you can now play with "./python -m test.bisect" in the master branch. Enjoy ;-) Example: ./python -m test.bisect -R 3:3 test_multiprocessing_forkserver This command should give you the name of the two failing test methods which are the last known reference leaks! => http://bugs.python.org/issue30775 This specific bisection is very slow since running the 286 tests of test_multiprocessing_forkserver using -R 3:3 takes longer than 3 minutes on my laptop! And the answer is already know, see the bpo ;-) FYI, apart of the bpo-30775, all other reference leaks now seem to be fixed on 2.7, 3.5, 3.6 and master branches! Tested on Windows and Linux. Victor

On 6/27/2017 8:39 PM, Victor Stinner wrote:
Are you testing for refleaks with gui enabled? A few weeks ago I discovered that some had crept in IDLE, but the gui-less leak tests never saw them. I fixed the ones that existed then, but as I add tests, it will be easy enough to add more leaks. Can test.bisect run a test module in idlelib.idle_test? Unittest.main will, but as far as I know, it does not run leak tests. Test.regrtest has a leak test, but will not modules in other directories, at least not that one. I wrote a custom script to narrow a leak in test.test_idle to one or more files in idlelib.idle_test, but from then on it was manual work, commenting out parts of the file until I identified the test function or functions. One of the things I had to do is test setUpClass and tearDownClass by themselves by commenting out all test_xyz functions and inserting test_dummy(self): pass. -- Terry Jan Reedy

On Tue, Jun 27, 2017 at 11:52 PM, Terry Reedy <tjreedy@udel.edu> wrote:
Are you testing for refleaks with gui enabled?
Yes; the refleak builders are running on my Gentoo and Windows workers, both of which (should have, at least) GUI available and enabled. However, I have caught Xvfb not running properly on the Gentoo bot a time or two, so there may be occasional outages in that coverage. -- Zach

Another example to bisect "Running test_warnings twice fails" bug: http://bugs.python.org/issue30812 --- haypo@selma$ ./python -m test.bisect -o bisect test_warnings test_warnings (...) Tests (1): * test.test_warnings._WarningsTests.test_showwarnmsg_missing (...) Bisection completed in 14 iterations and 0:00:14 --- So the bisection also works to find a failing test method when running a test file twice ;-) Victor 2017-06-28 2:39 GMT+02:00 Victor Stinner <victor.stinner@gmail.com>:

On Python 2, the addition of Lib/test/bisect.py caused conflict with Lib/bisect.py when running the Python test suite :-( I chose to rename Lib/test/bisect.py to Lib/test/bisectcmd.py to reduce changes caused by this new debug tool. So only in Python 2.7, you have to run: ./python -m test.bisectcmd ... instead of ./python -m tes.bisect ... See http://bugs.python.org/issue30843 for more information. Victor 2017-06-16 18:05 GMT+02:00 Victor Stinner <victor.stinner@gmail.com>:

On 4 July 2017 at 19:55, Victor Stinner <victor.stinner@gmail.com> wrote:
I know it's longer, but perhaps it would make sense to put the bisection helper under "python -m test.support.bisect" in both Python 2 & 3? Even in Python 3 "test.bisect" looks a bit like the test suite for the bisect module to me - you have to "just know" that the latter actually lives at "test.test_bisect". Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

2017-07-04 12:52 GMT+02:00 Nick Coghlan <ncoghlan@gmail.com>:
For me, test.support is a toolkit to *write* tests, not really to run tests. I don't really care where my bisect tool lives. Serhiy proposed test.bisect, I like because it's short and easy to remind. Technically it is possible to get test.bisect on Python 2, it just requires to modify 4 .py files which import Lib/bisect.py to add "from __future__ import absolute_import": Lib/urllib2.py:import bisect Lib/mhlib.py:from bisect import bisect Lib/test/test_bisect.py:import bisect as py_bisect Lib/multiprocessing/heap.py:import bisect I modified Lib/test/test_bisect.py, but I missed these other ones in my first commit. And then I got a failure in multiprocessing. I chose the conservative approach: rename the new Lib/test/bisect.py file. Do you prefer to get test.bisect, and so modify the 4 files to add "from __future__ import absolute_import"? I didn't recall the subtle details of "relative import" in Python 2. Since I'm now used to Python 3, the Python 2 behaviour now really looks weird to me :-) Victor

On 4 July 2017 at 21:03, Victor Stinner <victor.stinner@gmail.com> wrote:
That doesn't sound right, as implicit relative imports in Python 2 are relative to the *package* (they're akin to writing "from . import name"). That means if test.bisect is shadowing the top level bisect module when backported, it suggests that the test.regrtest directory is ending up on sys.path for the affected test run (e.g. because the tests were run as "python Lib/test/regrtest.py" rather than via the -m switch). Checking test.regrtest in 2.7, it looks to me like it's missing the stanza added to Py3 that makes sure "Lib/test/" isn't present on sys.path - if you add a similar operation to Py2.7, I'd expect the test.bisect name for the command to work there as well. Cheers, Nick. P.S. As far the multiprocessing failure you saw goes, my guess would be that the 2.7 version actually is relying on implicit relative imports to find peer modules, and would need some "from . import ..." adjustments to handle "from __future__ import absolute_import" at the top of the file. However, as noted above, I don't think that's actually what's happening - I think the "Lib/test/" directory is still sometimes ending up on sys.path while running the tests in 2.7. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

2017-07-04 13:22 GMT+02:00 Nick Coghlan <ncoghlan@gmail.com>:
I don't think that Lib/test/ is in sys.path. It's more subtle than test. When you run "./python -m test test_bisect", Lib/test/regrtest.py imports "test.test_bisect", and so test_bisect is imported with __package__=['test']. With test_bisect.__package__=['test'], "import bisect" in Lib/test/test_bisect.py imports Lib/test/bisect.py. The question is more when Lib/multiprocessing/heap.py got Lib/test/bisect.py instead of Lib/bisect.py. I didn't dig into this issue. The Python 2 import machinery blows my mind :-) Victor

On 4 July 2017 at 22:10, Victor Stinner <victor.stinner@gmail.com> wrote:
Right, for test_bisect specifically, the implicit relative import problem applies, and "from __future__ import absolute_import" is the relevant fix. That concern just doesn't apply to the *stdlib* modules doing a normal top-level "import bisect".
*This* is the case that I think is due to "Lib/test" being on sys.path when the tests are run: ``` $ ./python -i Lib/test/regrtest.py --help [snip output]
Using test_urllib2 as the example:
$ touch Lib/test/bisect.py $ ./python -m test.regrtest test_urllib2 Run tests sequentially 0:00:00 [1/1] test_urllib2 1 test OK. Total duration: 800 ms Tests result: SUCCESS $ ./python Lib/test/regrtest.py test_urllib2 Run tests sequentially 0:00:00 [1/1] test_urllib2 Warning -- os.environ was modified by test_urllib2 [snip output] test test_urllib2 failed -- multiple errors occurred; run in verbose mode for details 1 test failed: test_urllib2 Total duration: 107 ms Tests result: FAILURE ``` Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

2017-07-04 16:27 GMT+02:00 Nick Coghlan <ncoghlan@gmail.com>:
That concern just doesn't apply to the *stdlib* modules doing a normal top-level "import bisect".
Hum, ok. I created a PR which removes '' and Lib/test/ from sys.path, and rename again test.bisectcmd to test.bisect. Would you mind to review it? https://github.com/python/cpython/pull/2567 Victor

Nick Coghlan <ncoghlan@gmail.com> writes:
Or test.tool.bisect, similar to json.tool. ciao, lele. -- nickname: Lele Gaifax | Quando vivrò di quello che ho pensato ieri real: Emanuele Gaifas | comincerò ad aver paura di chi mi copia. lele@metapensiero.it | -- Fortunato Depero, 1929.

Great idea! I had thought about doing something similar to this years ago to help diagnose when a test passes in isolation but somehow fails due to one of the bazillion of other tests that ran previously (which can be hard to narrow down if the failure is not early on in a test run). On Fri, 16 Jun 2017 at 09:06 Victor Stinner <victor.stinner@gmail.com> wrote:

2017-06-16 19:00 GMT+02:00 Brett Cannon <brett@python.org>:
Yeah, sometimes a test only fails if it is run after other tests. The well known issue of side effects ;-) Each time that my script bisect_test.py script succeed to reduce the number of tests, it writes the test list on disk. So you can interrupt it the script is stuck and you consider that the output is small enough to a manual review. Victor

Hi, I updated my bisect_test.py script to simplify its command line: https://github.com/haypo/misc/blob/master/python/bisect_test.py To debug a reference leak in test_os, now just type: $ ./python bisect_test.py -R 3:3 test_os Example of output: --- (...) Failing tests (1): * test.test_os.ExecTests.test_execve_invalid_env Bisection completed in 24 iterations and 0:02:11 --- In 2 minutes, you get the guilty: test_execve_invalid_env()! It's now fully automated! I also added --fail-env-changed option to regrtest. Using this option, you can find which test leaked a file, and modified any other resource tracked by regrtest. For examle, I used this command to find which test method of test_subprocess created a core dump file: $ ./python bisect_test.py --fail-env-changed test_subprocess Later, I will try to integrate this tiny tool in Python. Maybe somewhere in Tools/. Or better, maybe directly in regrtest? Victor

2017-06-26 21:58 GMT+02:00 Brett Cannon <brett@python.org>:
I don't see why regrtest isn't the right place for this.
The current regrest CLI isn't designed for subcommands, and I don't want to "pollute" regrtest with multiple options for bisect. Currently, my script has already 4 options: haypo@selma$ python3 ~/prog/GIT/misc/python/bisect_test.py --help usage: bisect_test.py [-h] [-i INPUT] [-o OUTPUT] [-n MAX_TESTS] [-N MAX_ITER] optional arguments: -h, --help show this help message and exit -i INPUT, --input INPUT Test names produced by --list-tests written into a file. If not set, run --list-tests -o OUTPUT, --output OUTPUT Result of the bisection -n MAX_TESTS, --max-tests MAX_TESTS Maximum number of tests to stop the bisection (default: 1) -N MAX_ITER, --max-iter MAX_ITER Maximum number of bisection iterations (default: 100) I really like subcommands, it's a nice way to design complex CLI ;-) Victor

2017-06-27 7:33 GMT+02:00 Serhiy Storchaka <storchaka@gmail.com>:
You could make it just a submodule in the test package.
./python -m test.bisect -R 3:3 test_os
I like the idea :-) I proposed a PR which was approved by Yury Selivanov, and so I just merged it! It means that you can now play with "./python -m test.bisect" in the master branch. Enjoy ;-) Example: ./python -m test.bisect -R 3:3 test_multiprocessing_forkserver This command should give you the name of the two failing test methods which are the last known reference leaks! => http://bugs.python.org/issue30775 This specific bisection is very slow since running the 286 tests of test_multiprocessing_forkserver using -R 3:3 takes longer than 3 minutes on my laptop! And the answer is already know, see the bpo ;-) FYI, apart of the bpo-30775, all other reference leaks now seem to be fixed on 2.7, 3.5, 3.6 and master branches! Tested on Windows and Linux. Victor

On 6/27/2017 8:39 PM, Victor Stinner wrote:
Are you testing for refleaks with gui enabled? A few weeks ago I discovered that some had crept in IDLE, but the gui-less leak tests never saw them. I fixed the ones that existed then, but as I add tests, it will be easy enough to add more leaks. Can test.bisect run a test module in idlelib.idle_test? Unittest.main will, but as far as I know, it does not run leak tests. Test.regrtest has a leak test, but will not modules in other directories, at least not that one. I wrote a custom script to narrow a leak in test.test_idle to one or more files in idlelib.idle_test, but from then on it was manual work, commenting out parts of the file until I identified the test function or functions. One of the things I had to do is test setUpClass and tearDownClass by themselves by commenting out all test_xyz functions and inserting test_dummy(self): pass. -- Terry Jan Reedy

On Tue, Jun 27, 2017 at 11:52 PM, Terry Reedy <tjreedy@udel.edu> wrote:
Are you testing for refleaks with gui enabled?
Yes; the refleak builders are running on my Gentoo and Windows workers, both of which (should have, at least) GUI available and enabled. However, I have caught Xvfb not running properly on the Gentoo bot a time or two, so there may be occasional outages in that coverage. -- Zach

Another example to bisect "Running test_warnings twice fails" bug: http://bugs.python.org/issue30812 --- haypo@selma$ ./python -m test.bisect -o bisect test_warnings test_warnings (...) Tests (1): * test.test_warnings._WarningsTests.test_showwarnmsg_missing (...) Bisection completed in 14 iterations and 0:00:14 --- So the bisection also works to find a failing test method when running a test file twice ;-) Victor 2017-06-28 2:39 GMT+02:00 Victor Stinner <victor.stinner@gmail.com>:

On Python 2, the addition of Lib/test/bisect.py caused conflict with Lib/bisect.py when running the Python test suite :-( I chose to rename Lib/test/bisect.py to Lib/test/bisectcmd.py to reduce changes caused by this new debug tool. So only in Python 2.7, you have to run: ./python -m test.bisectcmd ... instead of ./python -m tes.bisect ... See http://bugs.python.org/issue30843 for more information. Victor 2017-06-16 18:05 GMT+02:00 Victor Stinner <victor.stinner@gmail.com>:

On 4 July 2017 at 19:55, Victor Stinner <victor.stinner@gmail.com> wrote:
I know it's longer, but perhaps it would make sense to put the bisection helper under "python -m test.support.bisect" in both Python 2 & 3? Even in Python 3 "test.bisect" looks a bit like the test suite for the bisect module to me - you have to "just know" that the latter actually lives at "test.test_bisect". Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

2017-07-04 12:52 GMT+02:00 Nick Coghlan <ncoghlan@gmail.com>:
For me, test.support is a toolkit to *write* tests, not really to run tests. I don't really care where my bisect tool lives. Serhiy proposed test.bisect, I like because it's short and easy to remind. Technically it is possible to get test.bisect on Python 2, it just requires to modify 4 .py files which import Lib/bisect.py to add "from __future__ import absolute_import": Lib/urllib2.py:import bisect Lib/mhlib.py:from bisect import bisect Lib/test/test_bisect.py:import bisect as py_bisect Lib/multiprocessing/heap.py:import bisect I modified Lib/test/test_bisect.py, but I missed these other ones in my first commit. And then I got a failure in multiprocessing. I chose the conservative approach: rename the new Lib/test/bisect.py file. Do you prefer to get test.bisect, and so modify the 4 files to add "from __future__ import absolute_import"? I didn't recall the subtle details of "relative import" in Python 2. Since I'm now used to Python 3, the Python 2 behaviour now really looks weird to me :-) Victor

On 4 July 2017 at 21:03, Victor Stinner <victor.stinner@gmail.com> wrote:
That doesn't sound right, as implicit relative imports in Python 2 are relative to the *package* (they're akin to writing "from . import name"). That means if test.bisect is shadowing the top level bisect module when backported, it suggests that the test.regrtest directory is ending up on sys.path for the affected test run (e.g. because the tests were run as "python Lib/test/regrtest.py" rather than via the -m switch). Checking test.regrtest in 2.7, it looks to me like it's missing the stanza added to Py3 that makes sure "Lib/test/" isn't present on sys.path - if you add a similar operation to Py2.7, I'd expect the test.bisect name for the command to work there as well. Cheers, Nick. P.S. As far the multiprocessing failure you saw goes, my guess would be that the 2.7 version actually is relying on implicit relative imports to find peer modules, and would need some "from . import ..." adjustments to handle "from __future__ import absolute_import" at the top of the file. However, as noted above, I don't think that's actually what's happening - I think the "Lib/test/" directory is still sometimes ending up on sys.path while running the tests in 2.7. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

2017-07-04 13:22 GMT+02:00 Nick Coghlan <ncoghlan@gmail.com>:
I don't think that Lib/test/ is in sys.path. It's more subtle than test. When you run "./python -m test test_bisect", Lib/test/regrtest.py imports "test.test_bisect", and so test_bisect is imported with __package__=['test']. With test_bisect.__package__=['test'], "import bisect" in Lib/test/test_bisect.py imports Lib/test/bisect.py. The question is more when Lib/multiprocessing/heap.py got Lib/test/bisect.py instead of Lib/bisect.py. I didn't dig into this issue. The Python 2 import machinery blows my mind :-) Victor

On 4 July 2017 at 22:10, Victor Stinner <victor.stinner@gmail.com> wrote:
Right, for test_bisect specifically, the implicit relative import problem applies, and "from __future__ import absolute_import" is the relevant fix. That concern just doesn't apply to the *stdlib* modules doing a normal top-level "import bisect".
*This* is the case that I think is due to "Lib/test" being on sys.path when the tests are run: ``` $ ./python -i Lib/test/regrtest.py --help [snip output]
Using test_urllib2 as the example:
$ touch Lib/test/bisect.py $ ./python -m test.regrtest test_urllib2 Run tests sequentially 0:00:00 [1/1] test_urllib2 1 test OK. Total duration: 800 ms Tests result: SUCCESS $ ./python Lib/test/regrtest.py test_urllib2 Run tests sequentially 0:00:00 [1/1] test_urllib2 Warning -- os.environ was modified by test_urllib2 [snip output] test test_urllib2 failed -- multiple errors occurred; run in verbose mode for details 1 test failed: test_urllib2 Total duration: 107 ms Tests result: FAILURE ``` Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

2017-07-04 16:27 GMT+02:00 Nick Coghlan <ncoghlan@gmail.com>:
That concern just doesn't apply to the *stdlib* modules doing a normal top-level "import bisect".
Hum, ok. I created a PR which removes '' and Lib/test/ from sys.path, and rename again test.bisectcmd to test.bisect. Would you mind to review it? https://github.com/python/cpython/pull/2567 Victor

Nick Coghlan <ncoghlan@gmail.com> writes:
Or test.tool.bisect, similar to json.tool. ciao, lele. -- nickname: Lele Gaifax | Quando vivrò di quello che ho pensato ieri real: Emanuele Gaifas | comincerò ad aver paura di chi mi copia. lele@metapensiero.it | -- Fortunato Depero, 1929.
participants (8)
-
Brett Cannon
-
Eric V. Smith
-
Lele Gaifax
-
Nick Coghlan
-
Serhiy Storchaka
-
Terry Reedy
-
Victor Stinner
-
Zachary Ware