[py-dev] utest thoughts
Here's some things I'd like to do with utest. Maybe some of them are possible now. This is kind of a brainstorming list of features, I guess.
* Specify tests to run within a module. The only way to select a module that I see now is by filename. Package name would also be nice. Wildcards could also be useful, e.g., utest modulename.'*transaction*'. I think regular expressions are unnecessarily complex. Maybe a wildcard character other than * would be nice, to keep it from conflicting with shell expansion. A setting, or an optional alternative character? Maybe % (like in SQL).
* Data-driven tests, where the same code is tested with many different sets of data. Naturally this is often done in a for loop, but it's better if the data turns into multiple tests, each of which are addressable. There's something called a "unit" in there, I think, that relates to this...? But not the same thing as unittest; I think I saw unittest compatibility code as well.
Anyway, with unittest I could provide values to the __init__, creating multiple tests that differed only according to data, but then the runner became fairly useless. I'm hoping that will be easier with utest.
* Specifying an option to the runner that gets passed through to the tests. It seems like the options are fixed now. I'd like to do something like -Ddatabase=mysql. I can do this with environmental variables now, but that's a little crude. It's easiest if it's just generic, like -D for compilers, but of course it would be nicer if there were specific options. Maybe this could be best achieved by specializing utest and distributing my own runner with the project.
* I'm not clear how doctest would fit in. I guess I could turn the doctest into a unit test TestCase, then test that. Of course, it would be nice if this was streamlined. I also have fiddled with doctest to use my own comparison functions when testing if we get the expected output. That's not really in the scope of utest -- that should really go in doctest. Anyway, I thought I'd note its existence.
* Code coverage tracking. This should be fairly straight-forward to add.
The last time I looked around at test runners, Zope3's seemed the best. Well, it would have been better if I could have gotten it to do something. But it *seemed* best. Mining it for features:
* Different levels of tests (-a --at-level or --all; default level is 1, which doesn't run all tests). They have lots of tests, so I'm guessing they like to avoid running tests which are unlikely to fail.
* A distinction between unit and functional tests (as in acceptance or system tests). This doesn't seem very generic -- these definitions are very loose and not well agreed upon. There's not even any common language for them. I'm not sure how this fits in with level, but some sort of internal categorization of tests seems useful.
* A whole build process. I think they run out of the build/ directory that distutils generates. It is a little unclear how paths work out with utest, depending on where you run it from. Setting PYTHONPATH to include your development code seems the easiest way to resolve these issues with utest. I don't have anything with complicated builds, so maybe there's issues I'm unaware of.
* A pychecker option. (-c --pychecker)
* A pdb option (-D --debug). I was able to add this to utest with fairly small modifications (at least, if I did it correctly).
* An option to control garbage collection (-g --gc-threshold). I guess they encounter GC bugs sometimes.
* Run tests in a loop (-L --loop). Also for checking memory leaks. I've thought that running loops of tests in separate threads could also be a useful test, for code that actually was supposed to be used with threads. That might be another place for specializing the runner.
* Keep bytecode (-k --keepbytecode). Not interesting in itself, but it implies that they don't normally keep bytecode. I expect this is to deal with code where the .py file has been deleted, but the .pyc file is still around. I've wasted time because of that before, so I can imagine its usefulness.
* Profiling (-P --profile). Displays top 50 items, by time and # of calls.
* Report only first doctest failure (-1 --report-only-first-doctest-failure).
* Time the tests and show the slowest 50 tests (-t --top-fifty). I first thought this was just a bad way of doing profiling, but now that I think about it this is to diagnose problems with the tests running slowly.
That's all the interesting options, I think. There's also options to select which tests you display, but these seem too complex, while still not all that powerful.
Hi Ian, thanks a lot for your input! I will get back to each of your suggestions but this week is very busy for me so it might take a while. This shouldn't keep anyone else from replying just so you know. cheers, holger
_______________________________________________ py-dev mailing list py-dev@codespeak.net http://codespeak.net/mailman/listinfo/py-dev
Hi Ian, hi everybody else, i have still not much time but i can't wait to reply anymore :-) please Armin, Jens-Uwe, everybody: if you feel anything is mis-represented or you have completely different ideas, or you just want to comment on it, go ahead! We may want to split the mail into different threads, though, if discussing it further. [Ian Bicking Mon, Sep 27, 2004 at 06:58:44PM -0500]
* Specify tests to run within a module. The only way to select a module that I see now is by filename. Package name would also be nice. Wildcards could also be useful, e.g., utest modulename.'*transaction*'. I think regular expressions are unnecessarily complex. Maybe a wildcard character other than * would be nice, to keep it from conflicting with shell expansion. A setting, or an optional alternative character? Maybe % (like in SQL).
Yes, selecting test methods by (maybe wildcarded) expressions seems reasonable. However, i would like to first implement a "stateful testing" approach which would basically work like this:
py.test --session ... runs all tests, some of which fail ...
py.test --session ... runs only the previously failed tests ...
repeat the last step until all tests pass, then run all tests --> go to step 1
I somehow think that this might satisfy many use cases which otherwise would require some wildcard based selection scheme.
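The --session loop described above could be sketched roughly as follows. This is only an illustration of the idea, not how py.test implements it: `run_session`, the JSON state file, and the name-to-callable mapping are all assumptions made for the example.

```python
import json
import os
import tempfile


def run_session(tests, state_path):
    """Run the previously failed tests if any are recorded, else run all.

    `tests` maps a test name to a zero-argument callable; a test counts
    as failed when it raises AssertionError. The state file format is a
    hypothetical choice for this sketch.
    """
    previously_failed = []
    if os.path.exists(state_path):
        with open(state_path) as f:
            previously_failed = json.load(f)
    # With recorded failures, re-run only those; otherwise run everything.
    selected = [n for n in previously_failed if n in tests] or sorted(tests)
    failed = []
    for name in selected:
        try:
            tests[name]()
        except AssertionError:
            failed.append(name)
    with open(state_path, "w") as f:
        json.dump(failed, f)
    return selected, failed


# Example: test_b fails until `fixed` is switched on.
fixed = {"ok": False}


def test_a():
    pass


def test_b():
    assert fixed["ok"]


tests = {"test_a": test_a, "test_b": test_b}
state_path = os.path.join(tempfile.mkdtemp(), "failed.json")

first = run_session(tests, state_path)   # runs everything
second = run_session(tests, state_path)  # runs only the failed test
fixed["ok"] = True
third = run_session(tests, state_path)   # the failed test passes now
fourth = run_session(tests, state_path)  # back to the full run
```

Each call returns the names that were run and the names that failed, so the "repeat until all pass, then run all" cycle falls out of the state file being empty.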
* Data-driven tests, where the same code is tested with many different sets of data. Naturally this is often done in a for loop, but it's better if the data turns into multiple tests, each of which are addressable. There's something called a "unit" in there, I think, that relates to this...? But not the same thing as unittest; I think I saw unittest compatibility code as well.
Anyway, with unittest I could provide values to the __init__, creating multiple tests that differed only according to data, but then the runner became fairly useless. I'm hoping that will be easier with utest.
py.test consists of three main elements when "running" tests: the collector(s), the reporter (for producing all output), and the runner, driving the collector and the reporter. Collectors are completely decoupled from running tests. They are implemented (std/utest/collect.py) via a concept called "Restartable Iterators", thus the runner does not need to know all the tests when starting to run them.
I know that unittest.py approaches often just put all these three elements together and call it a "runner" but i think it blurs the lines and makes it very difficult to keep it extensible and customizable.
Having said this, for having multiple data sets it's probably best to introduce a mechanism into the Module Collector to look for custom module-defined collectors which would seamlessly integrate into the collection process of Units. By implementing your own collection function you can yield many Units with different test data sets which are then executed/accounted for individually.
* Specifying an option to the runner that gets passed through to the tests. It seems like the options are fixed now. I'd like to do something like -Ddatabase=mysql. I can do this with environmental variables now, but that's a little crude. It's easiest if it's just generic, like -D for compilers, but of course it would be nicer if there were specific options. Maybe this could be best achieved by specializing utest and distributing my own runner with the project.
I see the point. Doing this should be done by reexamining the current 'config' mechanism. Or better: rewriting it altogether as it is very much ad hoc/preliminary. There is a provision to have utest configuration files, currently called utest.conf, where you can at the moment only set some default values for command line options.
with unittest.py there is a custom habit of simply replacing the "runner" while with py.test it's most often better to write just a custom reporter or a custom collector.
Eventually, the py.test-config file should allow to have specific collectors, reporters and maybe even runners for subdirectories.
Architecting such a federated scheme of collectors/runners/reporters is not easy but i think very much worth it.
Note, that it should always be possible to run tests of any application with py-tests by invoking 'py.test APP-DIRECTORY' or simply 'py.test' while being in the app directory.
At some point we may look into providing direct support for unittest.py style tests to allow a seamless "upgrade". But this may be extremely messy with all those unittest.py hacks around.
* I'm not clear how doctest would fit in. I guess I could turn the doctest into a unit test TestCase, then test that. Of course, it would be nice if this was streamlined. I also have fiddled with doctest to use my own comparison functions when testing if we get the expected output. That's not really in the scope of utest -- that should really go in doctest. Anyway, I thought I'd note its existence.
Yes, well noted. I think everyone agrees that integrating doctest is a good idea. I am just not using them currently but i like the idea. It's open how to integrate doctests into py.test. I guess the rough idea is to extend the current collectors to look for docstrings and they would generate specific Units whose execute() method is invoked by our runner. The DoctestUnit.execute method would run a doctest. Probably, this also requires extending the TextReporter to support some nice kind of failure output.
* Code coverage tracking. This should be fairly straight-forward to add.
yes. I guess everybody uses sys.settrace which is unfortunately rather expensive. Anyway, it would be nice to come up with some real life use cases and see what is really useful. I am not clear on that.
The last time I looked around at test runners, Zope3's seemed the best. Well, it would have been better if I could have gotten it to do something. But it *seemed* best. Mining it for features:
* Different levels of tests (-a --at-level or --all; default level is 1, which doesn't run all tests). They have lots of tests, so I'm guessing they like to avoid running tests which are unlikely to fail.
having different levels of tests seems interesting. I'd like more of a keyword based approach where all tests are associated with some keywords and you can select tests by providing keywords. Keywords could be automatically associated from filename and python name components e.g. ['std', 'path', 'local', 'test_listdir']. You could additionally associate a 'slow' keyword to some tests (somehow) and start 'py.test --exclude="slow"' or put this as a default in the configuration file.
* A distinction between unit and functional tests (as in acceptance or system tests). This doesn't seem very generic -- these definitions are very loose and not well agreed upon. There's not even any common language for them. I'm not sure how this fits in with level, but some sort of internal categorization of tests seems useful.
maybe use the keyword approach for this, too?
* A whole build process. I think they run out of the build/ directory that distutils generates. It is a little unclear how paths work out with utest, depending on where you run it from. Setting PYTHONPATH to include your development code seems the easiest way to resolve these issues with utest. I don't have anything with complicated builds, so maybe there's issues I'm unaware of.
Simply providing a distutils-install should pose no problem and we should do it, however there are a couple of interesting issues here:
- armin and me want a mechanism by which to include '.c' files in the library and seamlessly compile (and possibly cache) them via the distutils mechanism. This should allow working with a svn-checkout containing c-coded modules without any explicit user interaction and especially without distutils-installing it.
- managing the py library versions: in the long run, i'd like to have an easy automated way for users of the py lib to get/install a new version, preferably via a 'py' binary which also allows to ask 'py --version', 'py --selftest' to let the py-lib tests run, and 'py --test' which would iterate into all modules/packages to find tests and run them. This is a kind of integrity test for your system. Also 'py someprogram.py' could start an (interactive) python interpreter allowing the script to choose/restrict its version (on all platforms).
- eventually i'd like to think some more about the notion of a 'py' application which would be installable/manageable, probably connecting to the PyPI project but including downloads.
* A pychecker option. (-c --pychecker)
makes sense, although i am not using it. I have the suspicion that it would yell at py.magic :-)
* A pdb option (-D --debug). I was able to add this to utest with fairly small modifications (at least, if I did it correctly).
yes, nice, thanks.
* An option to control garbage collection (-g --gc-threshold). I guess they encounter GC bugs sometimes.
i'll let Armin comment on this :-)
* Run tests in a loop (-L --loop). Also for checking memory leaks. I've thought that running loops of tests in separate threads could also be a useful test, for code that actually was supposed to be used with threads. That might be another place for specializing the runner.
Yes, see above. Maybe 'py.test --session' could actually not return until all tests pass and wait for files to change in order to try again. This is really nice because you can just save from your editor and see the tests running (at least if you are working in multiple windows or on Xinerama like i do :-)
* Keep bytecode (-k --keepbytecode). Not interesting in itself, but it implies that they don't normally keep bytecode. I expect this is to deal with code where the .py file has been deleted, but the .pyc file is still around. I've wasted time because of that before, so I can imagine its usefulness.
yes, there are some weird problems with py.test sometimes i haven't really looked into yet. Dealing with the compiled code will get even more of an issue when we let the tests run via "execnet" (see my other post). In this case the runner might invoke multiple python interpreters at once and run tests in parallel and it would be good to not simply write .pyc files at the same places where the .py files live. Wasn't there some option to python proposed to explicitly control the location where .pyc files are created?
* Profiling (-P --profile). Displays top 50 items, by time and # of calls.
yip. Especially since the hotshot API is slightly verbose to use.
* Report only first doctest failure (-1 --report-only-first-doctest-failure).
yip.
* Time the tests and show the slowest 50 tests (-t --top-fifty). I first thought this was just a bad way of doing profiling, but now that I think about it this is to diagnose problems with the tests running slowly.
Yes!
That's all the interesting options, I think. There's also options to select which tests you display, but these seem too complex, while still not all that powerful.
See my idea about federated collectors / reporters / runners. If we get this right then such interesting options become very viable. OK, enough for now, i guess. Ian, thanks for coming to us and helping to move the py lib along! While i currently am the main driver i am very happy to share decisions, ideas and work, especially with knowledgeable python developers. cheers, holger
I thought I'd split this up, but most of it comes down to the same subject -- how to find tests, how to annotate tests, how to select tests, and those are all kind of the same problem. Well, that and some more minor details... holger krekel wrote:
Yes, selecting test methods by (maybe wildcarded) expressions seems reasonable. However, i would like to first implement a "stateful testing" approach which would basically work like this:
py.test --session ... runs all tests, some of which fail ...
py.test --session ... runs only the previously failed tests ...
repeat the last step until all tests pass, then run all tests --> go to step 1
I somehow think that this might satisfy many use cases which otherwise would require some wildcard based selection scheme.
That would definitely be a nice feature. There are still some use cases for wildcards. One would be when you are in a large package, and make a localized change, and you don't want to spend the time to run all the tests before you get to the code you just changed; maybe you'd run all the tests later, but you want to start with specific tests and then expand once those pass. Another is TDD; in that case you may be writing many tests that are expected to fail. You probably want to address the tests in a specific order, and not filter through all the other tests at the same time.
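To make the wildcard idea concrete, here is a minimal sketch of name selection with % as the wildcard, so the pattern needs no shell quoting. `select_tests` and the dotted test names are made up for the example; nothing like this is claimed to exist in utest.

```python
import re


def select_tests(names, pattern, wildcard="%"):
    """Return the names that match `pattern` in full.

    `wildcard` stands for any run of characters (like % in SQL LIKE).
    Everything else in the pattern is matched literally.
    """
    regex = ".*".join(re.escape(part) for part in pattern.split(wildcard))
    return [name for name in names if re.fullmatch(regex, name)]


# Hypothetical test names, for illustration only.
names = [
    "test_db.test_transaction_commit",
    "test_db.test_connect",
    "test_web.test_transaction_rollback",
]

in_module = select_tests(names, "test_db.%transaction%")
anywhere = select_tests(names, "%transaction%")
with_star = select_tests(names, "test_db.*", wildcard="*")
```

The wildcard character is a parameter, which covers the "setting, or an optional alternative character" suggestion without involving regular expressions on the user's side.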
py.test consists of three main elements when "running" tests: the collector(s), the reporter (for producing all output), and the runner, driving the collector and the reporter. Collectors are completely decoupled from running tests. They are implemented (std/utest/collect.py) via a concept called "Restartable Iterators", thus the runner does not need to know all the tests when starting to run them.
I know that unittest.py approaches often just put all these three elements together and call it a "runner" but i think it blurs the lines and makes it very difficult to keep it extensible and customizable.
Having said this, for having multiple data sets it's probably best to introduce a mechanism into the Module Collector to look for custom module-defined collectors which would seamlessly integrate into the collection process of Units. By implementing your own collection function you can yield many Units with different test data sets which are then executed/accounted for individually.
Could this be like:

    data = [(1, 'one'), (2, 'two'), ...]
    def test_collector():
        for args in data:
            # Depending on when tests are run, I think leaving out args=args
            # could silently make all your tests use the same data :(
            yield lambda args=args: test_converter(*args)
    # Should we then do test_collector = test_collector() ?
    # Maybe test_converter is a bad name, because the runner will find it...
    # or if test_collector is present, maybe the runner won't look any
    # further in this module; but sometimes we will want it to look through
    # the module...
    def test_converter(number, english):
        assert make_english(number) == english

That would be pleasingly simple. I'm hoping we can avoid exposing complex interfaces to the test code. It gets more complicated if you can address the tests individually, e.g., via wildcards or keywords. How to add that information to the individual tests?
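A self-contained variant of the generator idea, for trying it out: it expands the yielded callables into individually named units and demonstrates the args=args concern. The names `converter_cases`, `check_converter`, `collect`, the sample data, and the stand-in `make_english` are all invented here; this is not the utest collector API.

```python
def make_english(number):
    # Stand-in for the code under test (hypothetical).
    return {1: "one", 2: "two", 3: "three"}[number]


data = [(1, "one"), (2, "two"), (3, "three")]


def check_converter(number, english):
    assert make_english(number) == english


def converter_cases():
    for args in data:
        # args=args binds the current tuple immediately; a bare closure
        # would see whatever `args` holds when it is finally called.
        yield lambda args=args: check_converter(*args)


def collect(generator_func):
    """Expand a generator of callables into (name, callable) pairs."""
    return [
        ("%s[%d]" % (generator_func.__name__, i), test)
        for i, test in enumerate(generator_func())
    ]


units = collect(converter_cases)
names = [name for name, _ in units]
for _, test in units:
    test()  # each data set runs, and can fail, on its own

# The pitfall: closures called after the loop all see the last tuple.
late_values = [f() for f in [lambda: args for args in data]]
early_values = [f() for f in [lambda args=args: args for args in data]]
```

Giving each unit an indexed name is one possible answer to the addressing question: a wildcard or keyword selector would then have something per-data-set to match against.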
* Specifying an option to the runner that gets passed through to the tests. It seems like the options are fixed now. I'd like to do something like -Ddatabase=mysql. I can do this with environmental variables now, but that's a little crude. It's easiest if it's just generic, like -D for compilers, but of course it would be nicer if there were specific options. Maybe this could be best achieved by specializing utest and distributing my own runner with the project.
I see the point. Doing this should be done by reexamining the current 'config' mechanism. Or better: rewriting it altogether as it is very much ad hoc/preliminary. There is a provision to have utest configuration files, currently called utest.conf, where you can at the moment only set some default values for command line options.
with unittest.py there is a custom habit of simply replacing the "runner" while with py.test it's most often better to write just a custom reporter or a custom collector.
Eventually, the py.test-config file should allow to have specific collectors, reporters and maybe even runners for subdirectories.
Hmm... this would address certain issues. For instance, if you're doing functional tests on a web app, you might configure what URL the app is locally installed at. In the case I'm thinking of, where I run the identical set of tests on different backends, it should be available as a command-line argument. But if there's a generic command-line argument (like -D) then that could be used to set arbitrary options (assuming the config file can accept arbitrary options).
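A generic -D option is easy to sketch with the standard argparse module. The function name `parse_defines`, the `prog` name, and the convention that a bare -Dname means True are assumptions for this example, not existing utest behaviour.

```python
import argparse


def parse_defines(argv):
    """Parse repeated -D NAME=VALUE options into a dict, plus test paths."""
    parser = argparse.ArgumentParser(prog="utest")
    parser.add_argument(
        "-D",
        dest="defines",
        action="append",
        default=[],
        metavar="NAME=VALUE",
        help="pass an arbitrary option through to the tests",
    )
    parser.add_argument("paths", nargs="*")
    args = parser.parse_args(argv)

    options = {}
    for item in args.defines:
        name, sep, value = item.partition("=")
        # A bare -Dname is treated as a true flag, like a compiler define.
        options[name] = value if sep else True
    return options, args.paths


options, paths = parse_defines(["-Ddatabase=mysql", "-D", "verbose", "tests/"])
```

The resulting dict could be merged over values read from a config file, so that the same option is settable in either place.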
Architecting such a federated scheme of collectors/runners/reporters is not easy but i think very much worth it.
Note, that it should always be possible to run tests of any application with py-tests by invoking 'py.test APP-DIRECTORY' or simply 'py.test' while being in the app directory.
What about "python setup.py test" ? This would allow for a common way to invoke tests, regardless of runner; people could use this for their unittest-based tests as well, or whatever they are using. I think I tried this at one point, but then got bored of trying to figure out the distutils just to add this one little command.
At some point we may look into providing direct support for unittest.py style tests to allow a seamless "upgrade". But this may be extremely messy with all those unittest.py hacks around.
* I'm not clear how doctest would fit in. I guess I could turn the doctest into a unit test TestCase, then test that. Of course, it would be nice if this was streamlined. I also have fiddled with doctest to use my own comparison functions when testing if we get the expected output. That's not really in the scope of utest -- that should really go in doctest. Anyway, I thought I'd note its existence.
Yes, well noted. I think everyone agrees that integrating doctest is a good idea. I am just not using them currently but i like the idea. It's open how to integrate doctests into py.test. I guess the rough idea is to extend the current collectors to look for docstrings and they would generate specific Units whose execute() method is invoked by our runner. The DoctestUnit.execute method would run a doctest. Probably, this also requires extending the TextReporter to support some nice kind of failure output.
In Zope3 they explicitly add doctests to the testing, it isn't just automatically picked up. In part because the doctests are typically in modules that aren't otherwise inspected for tests (they are inline with the normal code, not in separate test_* modules). I think there may be a performance issue with inspecting all modules for doctests, and potentially an issue of finding things that look like tests but aren't (though that probably isn't a big problem, since there's ways to exclude docstrings from doctest).
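As a rough illustration of the DoctestUnit idea with explicit registration (as in Zope3), the standard doctest module already provides a finder and a runner that a unit could wrap. The `DoctestUnit` class and `collect_doctests` below are hypothetical shapes for this sketch, and `double` and `broken` are sample functions invented for it.

```python
import doctest


def double(x):
    """Return twice x.

    >>> double(2)
    4
    >>> double("ab")
    'abab'
    """
    return 2 * x


def broken(x):
    """A deliberately wrong example, to show a failing unit.

    >>> broken(1)
    2
    """
    return x


class DoctestUnit:
    """Hypothetical unit wrapping one doctest.DocTest for a runner."""

    def __init__(self, test):
        self.test = test
        self.name = test.name

    def execute(self):
        runner = doctest.DocTestRunner(verbose=False)
        # Swallow the failure report here; a reporter would format it.
        result = runner.run(self.test, out=lambda text: None)
        return result.failed == 0


def collect_doctests(obj, name):
    """Explicitly collect the doctests of one object."""
    finder = doctest.DocTestFinder()
    # Pass globs explicitly so the examples can see the object by name.
    return [DoctestUnit(t) for t in finder.find(obj, name, globs={name: obj})]


units = collect_doctests(double, "double") + collect_doctests(broken, "broken")
results = {unit.name: unit.execute() for unit in units}
```

Because collection is per object and on request, modules that are never registered are never scanned, which sidesteps the performance concern.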
* Code coverage tracking. This should be fairly straight-forward to add.
yes. I guess everybody uses sys.settrace which is unfortunately rather expensive. Anyway, it would be nice to come up with some real life use cases and see what is really useful. I am not clear on that.
I think the "50% code coverage" is mostly a feel-good measure, so you can be pleased with your increasing score as you add tests. It would be awesome to allow for leveling up with your tests. "20% code coverage; you have begun your travels" or "95% code coverage; you are approaching enlightenment". The actual file-by-file reports are more useful, I think. Coverage should only be tracked when explicitly asked for.
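For reference, the sys.settrace approach mentioned above amounts to something like the following sketch, which records the lines executed inside a single function and is switched on only for the duration of one call. `executed_lines` and `classify` are invented names; real coverage tools do considerably more.

```python
import sys


def executed_lines(func, *args):
    """Run func(*args) and return the line numbers executed inside func."""
    code = func.__code__
    lines = set()

    def tracer(frame, event, arg):
        if frame.f_code is not code:
            return None  # no line-by-line tracing for other functions
        if event == "line":
            lines.add(frame.f_lineno)
        return tracer

    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        func(*args)
    finally:
        sys.settrace(previous)  # tracing is only on when asked for
    return lines


def classify(n):
    if n < 0:
        return "negative"
    return "non-negative"


positive_run = executed_lines(classify, 5)
negative_run = executed_lines(classify, -5)
# Only the combination of both runs touches every line of the body.
combined = positive_run | negative_run
```

Comparing the combined set with the per-run sets gives the kind of file-by-file detail that is more useful than a single percentage.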
* Different levels of tests (-a --at-level or --all; default level is 1, which doesn't run all tests). They have lots of tests, so I'm guessing they like to avoid running tests which are unlikely to fail.
having different levels of tests seems interesting. I'd like more of a keyword based approach where all tests are associated with some keywords and you can select tests by providing keywords. Keywords could be automatically associated from filename and python name components e.g. ['std', 'path', 'local', 'test_listdir']. You could additionally associate a 'slow' keyword to some tests (somehow) and start 'py.test --exclude="slow"' or put this as a default in the configuration file.
That would work well, I think. How might tests be annotated? On a module-by-module basis, using some particular symbol (__test_keywords__)? Function attributes? On an ad hoc basis by customizing the collector? It's actually the kind of place where adaptation would be interesting; objects would be adapted to test cases, where part of the test case API was a set of keywords. That would allow for a lot of customization, while the actual tests could remain fairly simple. Part of the base of py.test would be adapters for packages, modules, and functions; the module adapter looks for the test_* functions, function adapters might look for function attributes, etc. There'd be another adapter for unittest.TestCase and unittest.TestSuite, and so on. Packages could create their own adapters for further customization.
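One of the annotation options above, function attributes combined with keywords derived from the file and function names, might look like this. The `keywords` attribute, `keywords_for`, and `select` are conventions made up for the sketch; the path and test names are illustrative.

```python
def keywords_for(path, func):
    """Derive keywords from the file path, the function name, and an
    optional `keywords` attribute on the function."""
    parts = path.replace("\\", "/").split("/")
    parts[-1] = parts[-1].rsplit(".", 1)[0]  # strip the .py extension
    derived = set(p for p in parts if p)
    derived.add(func.__name__)
    derived.update(getattr(func, "keywords", ()))
    return derived


def select(tests, include=(), exclude=()):
    """Return names of tests having all `include` and no `exclude` keywords."""
    selected = []
    for path, func in tests:
        kw = keywords_for(path, func)
        if set(include) <= kw and not (set(exclude) & kw):
            selected.append(func.__name__)
    return selected


def test_listdir():
    pass


def test_copy_big_tree():
    pass


test_copy_big_tree.keywords = ["slow"]

tests = [
    ("std/path/local.py", test_listdir),
    ("std/path/local.py", test_copy_big_tree),
]

derived = sorted(keywords_for("std/path/local.py", test_listdir))
fast_only = select(tests, exclude=["slow"])
slow_only = select(tests, include=["slow"])
```

The same mechanism would cover the unit versus functional distinction: a 'functional' keyword is just another entry in the attribute list.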
* A distinction between unit and functional tests (as in acceptance or system tests). This doesn't seem very generic -- these definitions are very loose and not well agreed upon. There's not even any common language for them. I'm not sure how this fits in with level, but some sort of internal categorization of tests seems useful.
maybe use the keyword approach for this, too?
Seems like a good use case.
* A whole build process. I think they run out of the build/ directory that distutils generates. It is a little unclear how paths work out with utest, depending on where you run it from. Setting PYTHONPATH to include your development code seems the easiest way to resolve these issues with utest. I don't have anything with complicated builds, so maybe there's issues I'm unaware of.
Simply providing a distutils-install should pose no problem and we should do it, however there are a couple of interesting issues here:
- armin and me want a mechanism by which to include '.c' files in the library and seamlessly compile (and possibly cache) them via the distutils mechanism. This should allow working with a svn-checkout containing c-coded modules without any explicit user interaction and especially without distutils-installing it.
Right, this is what Zope is doing. It builds the package (but does not install it) before running the tests (python setup.py build). Then it runs the tests out of the build/ directory. The build is, I think, relatively fast (after the first time it is run); it only updates things according to timestamps.
- managing the py library versions: in the long run, i'd like to have an easy automated way for users of the py lib to get/install a new version, preferably via a 'py' binary which also allows to ask 'py --version', 'py --selftest' to let the py-lib tests run, and 'py --test' which would iterate into all modules/packages to find tests and run them. This is a kind of integrity test for your system. Also 'py someprogram.py' could start an (interactive) python interpreter allowing the script to choose/restrict its version (on all platforms).
- eventually i'd like to think some more about the notion of a 'py' application which would be installable/manageable probably connecting to the PyPI project but including downloads.
Is there a reason these separate concerns go together? The last seems like a distutils enhancement. Handling multiple versions... well, that's another issue that is pretty much unaddressed at this point, but I'm not sure
* Run tests in a loop (-L --loop). Also for checking memory leaks. I've thought that running loops of tests in separate threads could also be a useful test, for code that actually was supposed to be used with threads. That might be another place for specializing the runner.
Yes, see above. Maybe 'py.test --session' could actually not return until all tests pass and wait for files to change in order to try again. This is really nice because you can just save from your editor and see the tests running (at least if you are working in multiple windows or on Xinerama like i do :-)
I think for a text reporter this would lead to information overload. In Zope I assume they'd only use this once all tests passed, as a way of exercising the C code.
* Keep bytecode (-k --keepbytecode). Not interesting in itself, but it implies that they don't normally keep bytecode. I expect this is to deal with code where the .py file has been deleted, but the .pyc file is still around. I've wasted time because of that before, so I can imagine its usefulness.
yes, there are some weird problems with py.test sometimes i haven't really looked into yet. Dealing with the compiled code will become even more of an issue when we let the tests run via "execnet" (see my other post). In this case the runner might invoke multiple python interpreters at once and run tests in parallel and it would be good to not simply write .pyc files at the same places where the .py files live. Wasn't there some option to python proposed to explicitly control the location where .pyc files are created?
There was a PEP, and some (mostly positive) discussion, but it lost momentum and got lost. PEP 304, I think: http://www.python.org/peps/pep-0304.html
-- Ian Bicking / ianb@colorstudy.com / http://blog.ianbicking.org
Hi Ian, hi all, [Ian Bicking]
holger krekel wrote:
Having said this, for having multiple data sets it's probably best to introduce a mechanism into the Module Collector to look for custom module-defined collectors which would seamlessly integrate into the collection process of Units. By implementing your own collection function you can yield many Units with different test data sets which are then executed/accounted for individually.
Could this be like:
    data = [(1, 'one'), (2, 'two'), ...]

    def test_collector():
        for args in data:
            # Depending on when tests are run, I think leaving out args=args
            # could silently make all your tests use the same data :(
            yield lambda args=args: test_converter(*args)

    # Should we then do test_collector = test_collector() ?
yes, though
a) the collection function should not have a 'test' pre- or postfix i think.
b) collectors always yield collectors or Units, not plain functions so you would actually do

    yield py.test.Unit(my_custom_test_function, *args)

In the current py code (see my other postings) you'll find in py/test/test/data/Collector.py the current way to do custom collectors. This Collector class is instantiated with the module's location and takes over the collecting process for the module. No further attempt is made to collect anything from a module which contains a custom "Collector" object.
In the case I'm thinking of, where I run the identical set of tests on different backends, it should be available as a command-line argument. But if there's a generic command-line argument (like -D) then that could be used to set arbitrary options (assuming the config file can accept arbitrary options).
you already took '-D' for the pdb debugging but I guess we don't need "--pdb" as a short option. I took the freedom to rename it from "--usedpdb" by the way. Introducing "-D" for passing options/backends to the tests still requires more careful thoughts about test configuration (files) in general. I think it should be possible to have test-configuration files per directory, which might modify the collection process and deal with configuration data ("-D") issues. They _may_ also provide a different runner if it turns out to make sense. Again please note, that py.test's "runner" has simpler responsibilities than unittest.py - based runners. We have the runner, the collectors and the reporter.
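For illustration only, a generic KEY=VALUE option could be parsed along these lines with the standard argparse module. This is a standalone sketch, not py.test's option handling; the flag name -D, the function parse_defines and the program name are assumptions made for the example (and, as noted above, -D is already taken in the current code):

```python
import argparse

def parse_defines(argv):
    """Parse repeated -D KEY=VALUE options into a dict, plus positional paths."""
    parser = argparse.ArgumentParser(prog="runner-sketch")  # hypothetical name
    parser.add_argument("-D", dest="defines", action="append", default=[],
                        metavar="KEY=VALUE")
    parser.add_argument("paths", nargs="*")
    namespace = parser.parse_args(argv)

    options = {}
    for item in namespace.defines:
        key, sep, value = item.partition("=")
        # A bare "-D flag" without "=" is treated as a boolean switch.
        options[key] = value if sep else True
    return options, namespace.paths

options, paths = parse_defines(["-Ddatabase=mysql", "-D", "verbose", "tests/"])
```

The resulting dict could then be handed to whatever per-directory configuration mechanism ends up existing; the sketch deliberately says nothing about that part.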
Note, that it should always be possible to run tests of any application with py-tests by invoking 'py.test APP-DIRECTORY' or simply 'py.test' while being in the app directory.
What about "python setup.py test" ? This would allow for a common way to invoke tests, regardless of runner; people could use this for their unittest-based tests as well, or whatever they are using.
I think it's easier to say py.test APP-DIRECTORY and i don't want to deal with distutils-hacks if it can be avoided ...
I think I tried this at one point, but then got bored of trying to figure out the distutils just to add this one little command.
... we seem to share the same view here :-)
In Zope3 they explicitly add doctests to the testing, it isn't just automatically picked up.
Argh! I have to say i dislike writing repetitive unnecessary "manually-synced" boilerplate code for tests, be they unit, doc or any other kind of test.
In part because the doctests are typically in modules that aren't otherwise inspected for tests (they are inline with the normal code, not in separate test_* modules). I think there may be a performance issue with inspecting all modules for doctests, and potentially an issue of finding things that look like tests but aren't (though that probably isn't a big problem, since there are ways to exclude docstrings from doctest).
Hum, performance problem. I think importing all test and implementation code of the py lib requires 2 seconds on a file system. As the collection happens iteratively these two seconds are distributed across the whole testing time. With "Test-Session" modes it will be less. But if there is a performance problem we may think of ways to exclude files from looking for doctests by means of the per-directory py.test file.
* Code coverage tracking. This should be fairly straight-forward to add.
yes. I guess everybody uses sys.settrace which is unfortunately rather expensive. Anyway, it would be nice to come up with some real life use cases and see what is really useful. I am not clear on that.
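To make the sys.settrace approach concrete, here is a minimal sketch of line tracking for a single call. The helper trace_lines and the sample function sign are made up for the example; a real coverage tool would also have to map the executed lines back to the lines that could have been executed:

```python
import sys

def trace_lines(func, *args, **kwargs):
    """Call func and return (result, set of (filename, lineno) executed)."""
    executed = set()

    def tracer(frame, event, arg):
        if event == "line":
            executed.add((frame.f_code.co_filename, frame.f_lineno))
        return tracer  # keep receiving line events for this frame

    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        result = func(*args, **kwargs)
    finally:
        sys.settrace(previous)  # always restore whatever was installed before
    return result, executed

def sign(x):
    if x < 0:
        return "negative"
    return "non-negative"

first = sign.__code__.co_firstlineno
_, hit = trace_lines(sign, 5)
# Line offsets relative to the "def" line of sign().
lines_hit = sorted(lineno - first for _, lineno in hit)
```

The trace function is called for every line of every traced frame, which is where the cost mentioned above comes from.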
I think the "50% code coverage" is mostly a feel-good measure, so you can be pleased with your increasing score as you add tests. It would be awesome to allow for leveling up with your tests. "20% code coverage; you have begun your travels" or "95% code coverage; you are approaching enlightenment".
yes, sounds nice.
The actual file-by-file reports are more useful, I think. Coverage should only be tracked when explicitly asked for.
Yes, i guess so. The same is true for profiling.
* Different levels of tests (-a --at-level or --all; default level is 1, which doesn't run all tests). They have lots of tests, so I'm guessing they like to avoid running tests which are unlikely to fail.
having different levels of tests seems interesting. I'd like more of a keyword based approach where all tests are associated with some keywords and you can select tests by providing keywords. Keywords could be automatically associated from filename and python name components e.g. ['std', 'path', 'local', 'test_listdir']. You could additionally associate a 'slow' keyword to some tests (somehow) and start 'py.test --exclude="slow"' or put this as a default in the configuration file.
That would work well, I think. How might tests be annotated? On a module-by-module basis, using some particular symbol (__test_keywords__)? Function attributes? On an ad hoc basis by customizing the collector?
It has to be possible to add keywords (i think we should only ever deal with adding keywords) on the module, class and method level. Finding syntax for the module level and the class level boils down to thinking about a good name which lists the additional keywords. For the method level we would have to worry about syntax so i'd like to avoid it altogether. I think that with "automatic keywords", derived from the name of the test module, class (if any) and method name we are all set. You can always "add" a keyword by extending the testname, avoiding any redundancy. For example, we could introduce "bug" tests:

    def test_bug001_pypath_sucks():
        # ...

and then you could run a specific "bug001" test by something like

    py.test --include="bug001"

note, btw, that i plan to distribute ".svn" directories along with every copy of 'py'. This allows _everyone_ commit access to test_*.py and *_test.py files in the repository! So if someone finds a problem or wants a guarantee from the py lib he can simply contribute it! Committing to the implementation tree will require registration for an account, though.
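A rough sketch of how such automatic keywords might be derived and matched. The function names auto_keywords and selected are invented here, and splitting the test name on underscores is an assumption (the thread does not say how "bug001" would be extracted from the name):

```python
def auto_keywords(module_path, class_name, func_name):
    """Derive keywords from a dotted module path, an optional class and a test name."""
    parts = module_path.split(".")
    if class_name:
        parts.append(class_name)
    parts.append(func_name)
    # Assumption: name components separated by "_" are keywords as well,
    # so that test_bug001_... can be selected with "bug001".
    parts.extend(piece for piece in func_name.split("_") if piece)
    return set(parts)

def selected(keywords, include=(), exclude=()):
    """True if no excluded keyword is present and every included one is."""
    if any(word in keywords for word in exclude):
        return False
    return all(word in keywords for word in include)

listdir_kw = auto_keywords("std.path.local", None, "test_listdir")
bug_kw = auto_keywords("py.path", None, "test_bug001_pypath_sucks")
```

With this splitting rule every component of the name becomes a keyword, including ones like "sucks" that were never meant as one.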
It's actually the kind of place where adaptation would be interesting; objects would be adapted to test cases, where part of the test case API was a set of keywords. That would allow for a lot of customization, while the actual tests could remain fairly simple. Part of the base of py.test would be adapters for packages, modules, and functions; the module adapter looks for the test_* functions, function adapters might look for function attributes, etc. There'd be another adapter for unittest.TestCase and unittest.TestSuite, and so on. Packages could create their own adapters for further customization.
I am not sure i am following. What would be the distinct advantages of introducing this machinery?
- armin and me want a mechanism by which to include '.c' files in the library and seamlessly compile (and possibly cache) them via a distutils mechanism. This should allow working with an svn checkout containing C-coded modules without any explicit user interaction and especially without distutils-installing it.
Right, this is what Zope is doing. It builds the package (but does not install it) before running the tests (python setup.py build).
We mean it a lot more automatic. You should not need to run _any_ intermediate command, no matter how simple. A checkout of the py lib should be enough. And you can simply modify your '.c' file somewhere under the implementation tree and nicely expose any objects implemented via the "package export" runtime mechanism. The motto of the py lib is "avoiding APIs" :-)
- managing the py library versions in the long run, i'd like to have an easy automated way for users of the py lib to get/install a new version. preferably via a 'py' binary which also allows to ask 'py --version', 'py --selftest' to let the py-lib tests run, 'py --test' which would iterate into all modules/packages to find tests and run them. This is a kind of integrity test for your system. Also 'py someprogram.py' could start an (interactive) python interpreter allowing the script to choose/restrict its version (on all platforms).
- eventually i'd like to think some more about the notion of a 'py' application which would be installable/manageable probably connecting to the PyPI project but including downloads.
Is there a reason these separate concerns go together? The last seems like a distutils enhancement. Handling multiple versions... well, that's another issue that is pretty much unaddressed at this point, but I'm not sure
Well, we need not discuss this further right now, i guess. I just wanted to present some of the ideas i am having. I do think that the notion of a "py app" which works well with the py-commandline utilities, is installable and upgradeable seamlessly from remote places is something to go for. It would be more than distutils and probably sitting on top of it and PyPI.
* Run tests in a loop (-L --loop). Also for checking memory leaks. I've thought that running loops of tests in separate threads could also be a useful test, for code that actually was supposed to be used with threads. That might be another place for specializing the runner.
Yes, see above. Maybe 'py.test --session' could actually not return until all tests pass and wait for files to change in order to try again. This is really nice because you can just save from your editor and see the tests running (at least if you are working in multiple windows or on Xinerama like i do :-)
I think for a text reporter this would lead to information overload.
Well, the tests would only run (and fail) when implementation files got modified. So most of the time it would be sitting idly and print nothing. I have used this scheme, it's extremely nice, and doesn't overload with information. It just avoids typing the same commands over and over. cheers, holger
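The "sit idle until a file changes" scheme described above can be sketched by polling modification times. Everything here (snapshot, changed_files, session and their parameters) is invented for illustration and is not how 'py.test --session' is implemented:

```python
import os
import time

def snapshot(paths):
    """Map each existing path to its modification time in nanoseconds."""
    return {p: os.stat(p).st_mtime_ns for p in paths if os.path.exists(p)}

def changed_files(before, after):
    """Paths that were added, removed or modified between two snapshots."""
    return sorted(p for p in set(before) | set(after)
                  if before.get(p) != after.get(p))

def session(paths, run_tests, interval=1.0, sleep=time.sleep,
            take_snapshot=snapshot):
    """Rerun run_tests() after every file change until it returns True."""
    seen = take_snapshot(paths)
    passed = run_tests()
    while not passed:
        sleep(interval)                  # stay quiet while nothing changes
        current = take_snapshot(paths)
        if changed_files(seen, current):
            seen = current
            passed = run_tests()         # only rerun after an actual change
    return passed
```

The sleep and take_snapshot parameters exist only so the loop can be exercised without touching the file system or waiting.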
holger krekel wrote:
Could this be like:
    data = [(1, 'one'), (2, 'two'), ...]

    def test_collector():
        for args in data:
            # Depending on when tests are run, I think leaving out args=args
            # could silently make all your tests use the same data :(
            yield lambda args=args: test_converter(*args)

    # Should we then do test_collector = test_collector() ?
yes, though
a) the collection function should not have a 'test' pre- or postfix i think.
Sure.
b) collectors always yield collectors or Units, not plain functions so you would actually do
yield py.test.Unit(my_custom_test_function, *args)
In the current py code (see my other postings) you'll find in py/test/test/data/Collector.py the current way to do custom collectors. This Collector class is instantiated with the module's location and takes over the collecting process for the module. No further attempt is made to collect anything from a module which contains a custom "Collector" object.
I have a problem with APIs that jump in complexity. In this case you can forget about Unit until you start needing to generate multiple tests programmatically -- your functions *were* your units, until that point. If functions are units in one place, they should be units everyplace.

This is where adaptation is a potentially useful idea. Instead of asking for a specific interface for any particular object, you are only asking for an object that can be adapted to a particular interface. You do this adaptation at the last possible moment, so that something like a custom collector doesn't produce Units, but merely produces something that can be turned into a Unit (or rather, turned into something that supports the IUnit interface). Doing it this way, functions really *are* units, for all purposes.

Also, you've introduced a decoupled abstraction. The tests can be any kind of object, whatever makes sense for the project -- maybe functions, maybe doctests, maybe classes, and so on. The runner iterates over units. The collector is an adapter from packages to units, and recursively it adapts from modules to units, from functions to units, etc.

But with adaptation the collector isn't something that the runner starts up, rather it is a registered service. So customization might simply mean that the runner starts by loading __init__, or test_init, or something like that. That module can register adapters which are more specialized than the standard adapters, or override the standard adapters for a subset of the package. (How to get them to apply to a subset? I'm not sure, that's not a metaphor I've seen with adaptation, though maybe multiple-value adapters apply somehow -- I'm still new at using adaptation.)
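A very small sketch of what such a registry might look like. The Unit class, register_adapter and adapt_to_units are all hypothetical names for this example; it is neither the py.test API nor a real adaptation framework, just a list of (predicate, adapter) pairs where later registrations win:

```python
import unittest

class Unit:
    """Hypothetical minimal unit: a name plus a zero-argument callable."""
    def __init__(self, name, run):
        self.name = name
        self.run = run

_adapters = []  # (predicate, adapter) pairs; most recently registered first

def register_adapter(predicate, adapter):
    _adapters.insert(0, (predicate, adapter))

def adapt_to_units(obj):
    """Return the Units for obj, using the first adapter whose predicate matches."""
    if isinstance(obj, Unit):
        return [obj]
    for predicate, adapter in _adapters:
        if predicate(obj):
            return list(adapter(obj))
    raise TypeError("cannot adapt %r to a Unit" % (obj,))

def _run_case(case):
    result = unittest.TestResult()
    case.run(result)
    if not result.wasSuccessful():
        raise AssertionError(result.failures or result.errors)

def _adapt_testcase_class(cls):
    for name in unittest.TestLoader().getTestCaseNames(cls):
        case = cls(name)
        yield Unit("%s.%s" % (cls.__name__, name),
                   lambda case=case: _run_case(case))

# Standard adapter: any plain callable is a unit as it stands.
register_adapter(callable, lambda f: [Unit(getattr(f, "__name__", repr(f)), f)])
# More specialized adapter, registered later so it is consulted first.
register_adapter(lambda o: isinstance(o, type) and issubclass(o, unittest.TestCase),
                 _adapt_testcase_class)

def test_plain():
    assert 1 + 1 == 2

class ArithmeticTest(unittest.TestCase):
    def test_add(self):
        self.assertEqual(1 + 1, 2)

units = adapt_to_units(test_plain) + adapt_to_units(ArithmeticTest)
for unit in units:
    unit.run()
```

A package could call register_adapter from its own initialization module to add or override adapters. How to limit an adapter to a subset of the package is left open here, as it is in the text.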
In the case I'm thinking of, where I run the identical set of tests on different backends, it should be available as a command-line argument. But if there's a generic command-line argument (like -D) then that could be used to set arbitrary options (assuming the config file can accept arbitrary options).
you already took '-D' for the pdb debugging but I guess we don't need "--pdb" as a short option. I took the freedom to rename it from "--usedpdb" by the way.
Sure. I just took the option name from Zope, but I don't really care what it is.
Introducing "-D" for passing options/backends to the tests still requires more careful thoughts about test configuration (files) in general. I think it should be possible to have test-configuration files per directory, which might modify the collection process and deal with configuration data ("-D") issues. They _may_ also provide a different runner if it turns out to make sense. Again please note, that py.test's "runner" has simpler responsibilities than unittest.py - based runners. We have the runner, the collectors and the reporter.
I know, I think I say "runner" to mean "the whole process". If we are assuming that the tests are run in serial, configuration could be done through an initialization hook. So you just put something in like:

    # in test_init.py...
    import py.test

    def test_setup():
        py.test.options.oldusepdb = py.test.options.usepdb
        py.test.options.usepdb = True

    def test_teardown():
        py.test.options.usepdb = py.test.options.oldusepdb
        del py.test.options.oldusepdb

Kind of lame, but maybe functional.
Note, that it should always be possible to run tests of any application with py-tests by invoking 'py.test APP-DIRECTORY' or simply 'py.test' while being in the app directory.
What about "python setup.py test" ? This would allow for a common way to invoke tests, regardless of runner; people could use this for their unittest-based tests as well, or whatever they are using.
I think it's easier to say
py.test APP-DIRECTORY
and i don't want to deal with distutils-hacks if it can be avoided ...
My only concern is that py.test not be too novel; making test creation accessible is a big motivation for me. I like your idea of encouraging bug reports to be done by committing a failing test. Acknowledging that other people are going to be running their tests using unittest, or unittest with custom runners, using setup.py is a way to provide a common entry point across all Python projects.
I think I tried this at one point, but then got bored of trying to figure out the distutils just to add this one little command.
... we seem to share the same view here :-)
In Zope3 they explicitly add doctests to the testing, it isn't just automatically picked up.
Argh! I have to say i dislike writing repetitive unnecessary "manually-synced" boilerplate code for tests, be they unit, doc or any other kind of test.
Yeah... but it's not *that* bad. It would probably look like:

    from py.test import make_doctest  # <-- not a great name
    import somemodule
    doctest_collector = make_doctest(somemodule)
    # or...
    doctest_collector = [make_doctest(somemodule.SomeClass),
                         make_doctest(somemodule.SomeClass2)]

Maybe it's not that big a deal, because practically every module is going to be imported anyway. I don't know how much time it takes to parse docstrings; heck, you could cache that parsing too, like .pyc caches the compilation. Or you could look for the __test__ variable that doctest uses; then you'd have to use __test__ = {} when you didn't have any additional tests, but that's relatively little boilerplate... though that's probably not a good compromise.
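For comparison, the standard doctest module can already find and run the examples of a single object, which is roughly what a make_doctest helper would have to wrap. A sketch, where collect_doctests and run_doctest are names invented here and only DocTestFinder and DocTestRunner come from the standard library:

```python
import doctest

def collect_doctests(obj, globs=None):
    """Return the DocTest objects found in obj's docstrings."""
    finder = doctest.DocTestFinder()
    return finder.find(obj, globs=globs)

def run_doctest(test):
    """Run one DocTest quietly and return (failed, attempted)."""
    runner = doctest.DocTestRunner(verbose=False)
    result = runner.run(test, out=lambda text: None)  # swallow failure reports
    return result.failed, result.attempted

def double(x):
    """Return twice x.

    >>> double(21)
    42
    """
    return 2 * x

# Explicit globals, so the sketch does not depend on how this code was loaded.
tests = collect_doctests(double, globs={"double": double})
outcome = run_doctest(tests[0])
```

Passing a module instead of a function makes the finder walk the module's functions and classes, which is where the parsing cost discussed above would show up.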
In part because the doctests are typically in modules that aren't otherwise inspected for tests (they are inline with the normal code, not in separate test_* modules). I think there may be a performance issue with inspecting all modules for doctests, and potentially an issue of finding things that look like tests but aren't (though that probably isn't a big problem, since there are ways to exclude docstrings from doctest).
Hum, performance problem. I think importing all test and implementation code of the py lib requires 2 seconds on a file system. As the collection happens iteratively these two seconds are distributed across the whole testing time. With "Test-Session" modes it will be less.
But if there is a performance problem we may think of ways to exclude files from looking for doctests by means of the per-directory py.test file.
* Different levels of tests (-a --at-level or --all; default level is 1, which doesn't run all tests). They have lots of tests, so I'm guessing they like to avoid running tests which are unlikely to fail.
having different levels of tests seems interesting. I'd like more of a keyword based approach where all tests are associated with some keywords and you can select tests by providing keywords. Keywords could be automatically associated from filename and python name components e.g. ['std', 'path', 'local', 'test_listdir']. You could additionally associate a 'slow' keyword to some tests (somehow) and start 'py.test --exclude="slow"' or put this as a default in the configuration file.
That would work well, I think. How might tests be annotated? On a module-by-module basis, using some particular symbol (__test_keywords__)? Function attributes? On an ad hoc basis by customizing the collector?
It has to be possible to add keywords (i think we should only ever deal with adding keywords) on the module, class and method level. Finding syntax for the module level and the class level boils down to thinking about a good name which lists the additional keywords.
For the method level we would have to worry about syntax so i'd like to avoid it altogether.
Why not just attributes for all of these, e.g.:

    # module
    __test_keywords__ = ['bug001', 'pkg1']

    def test_speed():
        ...
    test_speed.__test_keywords__ = ['profile']

    class SpeedTester:
        __test_keywords__ = ['profile']

        def test_remote(self):
            ...
        test_remote.__test_keywords__ = ['urllib']

And so on. Since attributes can be added to anything, it provides a pretty easy interface.
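Reading such attributes back is short. A sketch of how a collector might merge them, assuming the __test_keywords__ name from the example above; the helper keywords_for is hypothetical:

```python
def keywords_for(func, owner=None, module_keywords=()):
    """Union of the keywords declared on the module, the owning class and the function."""
    found = set(module_keywords)
    for obj in (owner, func):
        # getattr with a default also copes with owner=None and unannotated objects.
        found.update(getattr(obj, "__test_keywords__", ()))
    return found

class SpeedTester:
    __test_keywords__ = ['profile']

    def test_remote(self):
        pass
    test_remote.__test_keywords__ = ['urllib']

def test_unannotated():
    pass

remote_kw = keywords_for(SpeedTester.test_remote, owner=SpeedTester,
                         module_keywords=['bug001', 'pkg1'])
plain_kw = keywords_for(test_unannotated)
```

Keywords only ever accumulate from module to class to function in this sketch, which matches the "only ever adding keywords" idea.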
I think that with "automatic keywords", derived from the name of the test module, class (if any) and method name we are all set. You can always "add" a keyword by extending the testname, avoiding any redundancy.
For example, we could introduce "bug" tests:
    def test_bug001_pypath_sucks():
        # ...
and then you could run a specific "bug001" test by something like
py.test --include="bug001"
In some ways this seems simple, but I'd expect the keywords to be fairly volatile. I'm not sure that I like that the function names would have to be as volatile as the keywords; e.g., if you add a new keyword, will you rename half your functions? And the names also have to have other parts to describe them that aren't intended to be keywords, and these could be mistaken as keywords by the collector.
note, btw, that i plan to distribute ".svn" directories along with every copy of 'py'. This allows _everyone_ commit access to test_*.py and *_test.py files in the repository! So if someone finds a problem or wants a guarantee from the py lib he can simply contribute it! Committing to the implementation tree will require registration for an account, though.
That would be pretty neat. I wonder, could you allow anonymous commit access to a subset of the tree? I guess that's not too hard with Apache. I'd rather not let anyone willy-nilly commit test cases to the core; many perceived bugs aren't real bugs, and the tests might not fit project standards. I think a bugreport/ directory, where each bug is reported as a module containing a test case and perhaps a docstring explaining expected behavior, would be perfect. They can be moved into the main tests if appropriate, or just hang out there until they are reviewed, and you would allow anonymous commits in that directory.
It's actually the kind of place where adaptation would be interesting; objects would be adapted to test cases, where part of the test case API was a set of keywords. That would allow for a lot of customization, while the actual tests could remain fairly simple. Part of the base of py.test would be adapters for packages, modules, and functions; the module adapter looks for the test_* functions, function adapters might look for function attributes, etc. There'd be another adapter for unittest.TestCase and unittest.TestSuite, and so on. Packages could create their own adapters for further customization.
I am not sure i am following. What would be the distinct advantages of introducing this machinery?
I wrote about it more above, but the advantages would be a unified and reasonably transparent way of converting ad hoc or package-specific test cases into py.test test cases, and one that was decoupled from the tests themselves. E.g., supporting unittest.TestCase would just be a matter of providing an adapter.
- armin and me want a mechanism by which to include '.c' files in the library and seamlessly compile (and possibly cache) them via a distutils mechanism. This should allow working with an svn checkout containing C-coded modules without any explicit user interaction and especially without distutils-installing it.
Right, this is what Zope is doing. It builds the package (but does not install it) before running the tests (python setup.py build).
We mean it a lot more automatic. You should not need to run _any_ intermediate command, no matter how simple. A checkout of the py lib should be enough. And you can simply modify your '.c' file somewhere under the implementation tree and nicely expose any objects implemented via the "package export" runtime mechanism.
It would be automatic -- Zope's test.py automatically runs "setup.py build" (or whatever the internal method call is to do that) before every test run, then adjusts the path accordingly so code is loaded out of build/. It should be configurable, since for pure-python packages it's not necessary or desired -- it's just a bunch of copying that doesn't serve much purpose.
-- Ian Bicking / ianb@colorstudy.com / http://blog.ianbicking.org
Hi Ian, i have a second thought regarding one of your suggestions/comment regarding custom test collectors ... [Ian]
[me]
b) collectors always yield collectors or Units, not plain functions so you would actually do
yield py.test.Unit(my_custom_test_function, *args)
In the current py code (see my other postings) you'll find in py/test/test/data/Collector.py the current way to do custom collectors. This Collector class is instantiated with the module's location and takes over the collecting process for the module. No further attempt is made to collect anything from a module which contains a custom "Collector" object.
I have a problem with APIs that jump in complexity. In this case you can forget about Unit until you start needing to generate multiple tests programmatically -- your functions *were* your units, until that point. If functions are units in one place, they should be units everyplace.
I agree that a jump in complexity needs to be justified and for the discussed case it's arguable: Custom collectors or generating custom tests should be easy and if possible a "no-API" thing. Today i thought that maybe it isn't such a bad idea to allow generative tests which would come close to what collectors provide.

    def test_generating(self):
        for x in someiter:
            yield x

would yield new test methods/objects which would be wrapped in an appropriate Item-class in order to allow the reporter to uniformly access information about test items. The py.test.Driver could detect "generating tests" by checking if ``test_obj()`` returns a generator or if ``iter(test_obj)`` succeeds. This, however, would defer the decision whether something is a collector aka a "generative test" or a plain test item to a very late point, which leads to the question if the distinction between collectors and test items can't be removed altogether. I think i am going to give this a try soon. Comments welcome. cheers, holger
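One way the detection could look, sketched with the standard library's inspect.isgeneratorfunction so that the decision can be made without calling the test first. The expand helper and the naming scheme for generated items are assumptions, not the py.test.Driver design:

```python
import inspect

def expand(test_obj):
    """Return (name, callable) items: several for a generative test, one otherwise."""
    if inspect.isgeneratorfunction(test_obj):
        return [("%s[%d]" % (test_obj.__name__, index), item)
                for index, item in enumerate(test_obj())]
    return [(test_obj.__name__, test_obj)]

def check_even(n):
    assert n % 2 == 0, n

def test_generating():
    for n in (2, 4, 6):
        # n=n binds the current value, so each item tests its own number.
        yield lambda n=n: check_even(n)

def test_plain():
    assert True

items = expand(test_generating) + expand(test_plain)
names = [name for name, _ in items]
for _, run in items:
    run()
```

Checking the function object only works for generator functions. Detecting any callable that happens to return an iterator would still require calling it, which is the late decision point mentioned above.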
participants (3)
- holger krekel
- hpk@trillke.net
- Ian Bicking