py.test bisection algorithm
Hi,
we are using py.test in SymPy and unfortunately we have some bugs in SymPy that occur when certain tests are run and don't occur when the tests are run in a different order. It's very time consuming to determine which tests cause the problem (I did that several times by hand). It occurred to me that it should be possible to enhance py.test with a facility to do it automatically.
Example:
1) This passes:
$ py.test sympy/series/tests/test_series.py -k issue409
2) This doesn't:
$ py.test
One problem is that 2) runs for several minutes; another problem is that I need to specify all tests on the command line and then delete some of them to see if it still fails, until I narrow the issue down, usually quite nicely. Then I need to play with the "-k" parameter to try different test cases in the file, until I determine the minimal set of tests that, when executed in order, produce the error.
This can be done automatically - py.test will be given a set of files (or just tests) that pass and another set that fails, and it will narrow the problem down by bisecting.
I would like to implement this in py.test. So I'll try my best and send you a patch. If you have some ideas that could help me, I am interested.
Thanks a lot,
Ondrej
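The narrowing step described in this message can be sketched roughly as follows. This is only an illustration of the bisecting idea, not code from py.test or from any patch: the function name `bisect_polluter`, the predicate `victim_fails_after`, and the test names are all hypothetical. It assumes you can supply a predicate that runs a given list of tests in order, followed by the failing test, and reports whether that test still fails (in practice that would probably mean launching py.test in a subprocess).

```python
from typing import Callable, List, Sequence


def bisect_polluter(
    candidates: Sequence[str],
    victim_fails_after: Callable[[List[str]], bool],
) -> List[str]:
    """Shrink `candidates` to a smaller list that still makes the victim fail.

    `victim_fails_after(tests)` is assumed to run `tests` in order, then the
    victim test, and return True if the victim fails.
    """
    current = list(candidates)
    if not victim_fails_after(current):
        raise ValueError("the victim does not fail after the full candidate list")

    while len(current) > 1:
        mid = len(current) // 2
        first, second = current[:mid], current[mid:]
        if victim_fails_after(first):
            current = first      # the culprit is in the first half
        elif victim_fails_after(second):
            current = second     # the culprit is in the second half
        else:
            # The failure needs tests from both halves; plain bisection
            # cannot narrow further, so report what we have.
            break
    return current


if __name__ == "__main__":
    # Simulated runner: pretend the victim fails whenever "test_c" ran first.
    tests = ["test_a", "test_b", "test_c", "test_d"]
    print(bisect_polluter(tests, lambda ran: "test_c" in ran))
```

Each round costs at most two runs of the predicate, so a single culprit among n tests is found in roughly 2*log2(n) runs instead of n. When the failure depends on a combination of tests, the sketch simply stops early and returns the smallest list it could confirm.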
Ondrej Certik wrote:
<snip>
Hi Ondrej.
Sorry for the late reply. This mail should go to py-dev instead, or at least also; that would get you a faster reply (still my fault, I was offline for a while :)
Anyway - very nice idea. The long-term goal for py.test is to support this algorithm, i.e. with respect to revisions (give me the revision that broke test X) etc. So if you go for implementing this, it would be very cool for many things. Lately I didn't invest too much time in py.test development, but I would be really happy to review your patches and/or help you. You can find help most of the time on #pylib (#pypy might work as well in case no one hangs around #pylib).
Cheers, fijal
:.
Ondrej Certik wrote:
<snip>
Oh, and by the way. We've got a boxed version of py.test (--box or sth, read --help), which forks for every test, so you're sure that every test is run in a separate environment. (That doesn't solve your problem, but helps the other way around, when tests run OK only when they're together).
Cheers, fijal
:.
Awesome, thanks a lot. Let's do it soon. There are more things - Kirill (another developer of SymPy) implemented a new feature:
http://code.google.com/p/sympy/issues/detail?id=389
the relevant file is here:
http://sympy.googlecode.com/svn/trunk/sympy/utilities/pytest.py
and we would like to get it integrated into py.test. It works really well for us in serial mode, but it doesn't yet work in the "py.test -d" mode. And now I found it doesn't work in boxed mode either, so it still needs some work.
Another problem: "py.test -w" doesn't work in Debian. I reported a bug a long time ago:
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=434226
but got no response. I am CCing the Debian maintainer - if you don't have time to maintain the package, I can take it over; I have several Python-related packages in Debian already and I need py.test to work correctly if possible.
Another problem with "py.test -d" is that it fails for sympy, while plain "py.test" works. This is related to my last email, so implementing bisect, at least in serial mode, should help me track all those nasty bugs in sympy down. One nice feature would be for py.test to remember the order of tests in "py.test -d" - and if it fails, but succeeds in serial mode, it should automatically bisect and tell me - hey, this test works fine, but if executed just after that test, it fails.
Ondrej
Ondrej Certik wrote:
<snip>
Ok, I'll take a look. Would be cool to have a nice and working Debian package, but indeed we're not good at packaging. Also, we would like to do a 1.0 release at some point. The main blocker is some refactorings to be done (some internal unification) and eventually unittest support and/or cross-platform testing (like -d, but run every test on every platform).
Cheers, fijal
:.
Guilherme wrote me that he orphaned the package, which means I can adopt it. So I'll try the svn and see what works and what doesn't, and then either I can package the svn version in Debian, or (preferably) you make a new release and I'll package that.
Ondrej
Ondrej Certik wrote:
<snip>
Not sure what is in the package, but the last release is 0.9 (I think the package is like 0.7), which means most of the stuff should work. I would not like to have an svn package; we can always do 0.9.1 if you find nice stuff which works in svn but doesn't in 0.9. But having 0.9 is still an improvement :-) Also we need to wait for holger to come back (should be soon).
Cheers, fijal
:.
On 10/19/07, Maciek Fijalkowski <fijal@genesilico.pl> wrote:
Not sure what is in the package, but the last release is 0.9 (I think the package is like 0.7), which means most of the stuff should work. I would not like to have an svn package; we can always do 0.9.1 if you find nice stuff which works in svn but doesn't in 0.9.
Yes, that's the ideal solution.
But having 0.9 is still an improvement :-) Also we need to wait for holger to come back (should be soon).
There is 0.9 in Debian, but as I wrote, for example "py.test -w" isn't working:
$ wajig show python-codespeak-lib
Package: python-codespeak-lib
Priority: optional
Section: python
Installed-Size: 1680
Maintainer: Guilherme Salgado <salgado@async.com.br>
Architecture: i386
Source: codespeak-lib
Version: 0.9.0-3.1
Replaces: python-pylib, python2.3-codespeak-lib, python2.4-codespeak-lib
Provides: python2.4-codespeak-lib, python2.5-codespeak-lib
Depends: python-central (>= 0.5.8), python (<< 2.6), python (>= 2.4), libc6 (>= 2.6.1-1)
Recommends: python-tkinter, python-docutils
Suggests: screen, rsync, graphviz, tetex-bin, gs-gpl | gs-esp, ps2eps, subversion
Conflicts: python-pylib, python2.3-codespeak-lib, python2.4-codespeak-lib
Filename: pool/main/c/codespeak-lib/python-codespeak-lib_0.9.0-3.1_i386.deb
Size: 352534
MD5sum: c26760c4b5faf909755b00039206f083
SHA1: d3068c24477aa9b923decdc86d815f960355628b
SHA256: ff742dd60a6de29ef2aa80473f16102c932a23f1d9d0da517862853a318147fa
Description: The pylib library containing py.test, greenlets and other niceties
 It includes py.test, whose focus is to get a test environment that's easier to use than the existing ones, py.xml ("a fast'n'easy way to generate xml/html documents"), py.magic.greenlet ("Lightweight in-process concurrent programming") and many more features.
 .
 homepage: http://codespeak.net/py/
participants (2)
- Maciek Fijalkowski
- Ondrej Certik