Trimming the fat from "make quicktest" (was Re: I am now lost - committed, pulled, merged, what is "collapse"?)
On Wed, Mar 23, 2011 at 9:45 PM, John Arbash Meinel <john@arbash-meinel.com> wrote:
I don't specifically know what is in those 340 tests, but 18min/340 = 3.2s for each test. Which is *much* longer than simple smoke tests would have to be.
The counts Barry is referring to there are actually counting test *files*, rather than individual tests. We only have 359 of those in total though (not counting those in subdirectories), so a "quicktest" that omits less than 6% of them doesn't sound particularly quick (even if it does leave out the slowest ones). We should probably do another pass and add a few more tests to the blacklist in the Makefile template (starting with test_concurrent_futures). Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
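For reference, the file count Nick cites is easy to reproduce (a quick sketch; it assumes you run it from the root of a CPython checkout):

    # Count top-level test files in Lib/test, as Nick does above.
    import glob
    print(len(glob.glob("Lib/test/test_*.py")))  # ~359 at the time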
On Wed, 23 Mar 2011 22:49:39 +1000 Nick Coghlan <ncoghlan@gmail.com> wrote:
The counts Barry is referring to there are actually counting test *files*, rather than individual tests. We only have 359 of those in total though (not counting those in subdirectories), so a "quicktest" that omits less than 6% of them doesn't sound particularly quick (even if it does leave out the slowest ones).
We should probably do another pass and add a few more tests to the blacklist in the Makefile template (starting with test_concurrent_futures).
Does anyone use "make quicktest" for something useful? There is a reason the regression test suite has many tests... "Blacklisting" some of them sounds like a bad thing to do. Regards Antoine.
On Wed, Mar 23, 2011 at 11:31 PM, Antoine Pitrou <solipsis@pitrou.net> wrote:
Does anyone use "make quicktest" for something useful? There is a reason the regression test suite has many tests... "Blacklisting" some of them sounds like a bad thing to do.
Oops, lost a bit too much context when I changed the thread title. This discussion started with Barry looking for a "smoke test" that would be quick enough to run that more people would be willing to use it to pick up gratuitous breakage due to a bad merge rather than leaving it for the buildbots to discover. Currently even "make quicktest" takes too long to run to be suitable for that task. Leaving out a couple more egregiously slow tests and possibly updating it to use the "-j" switch might make for a usable option. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
Nick Coghlan <ncoghlan@gmail.com> wrote:
Oops, lost a bit too much context when I changed the thread title.
This discussion started with Barry looking for a "smoke test" that would be quick enough to run that more people would be willing to use it to pick up gratuitous breakage due to a bad merge rather than leaving it for the buildbots to discover.
Then many people will start running the "smoke test" rather than the whole suite, which will create new kinds of problems. It's IMO a bad idea. Let Barry learn about "-j" :)
Currently even "make quicktest" takes too long to run to be suitable for that task. Leaving out a couple more egregiously slow tests and possibly updating it to use the "-j" switch might make for a usable option.
"-j" will precisely help cover the duration of these long tests. By the way, you should use a higher "-j" number than you have CPUs, since some tests spend most of their time sleeping and waiting. "make quicktest" already skips test_io and test_socket, which test fundamental parts of Python. I would vote for removing "make quicktest" rather than promote such a questionable command. Regards Antoine.
On Wed, Mar 23, 2011 at 11:52 PM, Antoine Pitrou <solipsis@pitrou.net> wrote:
Currently even "make quicktest" takes too long to run to be suitable for that task. Leaving out a couple more egregiously slow tests and possibly updating it to use the "-j" switch might make for a usable option.
"-j" will precisely help cover the duration of these long tests. By the way, you should use a higher "-j" number than you have CPUs, since some tests spend most of their time sleeping and waiting.
"make quicktest" already skips test_io and test_socket, which test fundamental parts of Python. I would vote for removing "make quicktest" rather than promote such a questionable command.
I'd be fine with that if we change the -j default to something other than "1" (e.g. as I suggested elsewhere, the number of cores in the machine). Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
On Thu, 24 Mar 2011 00:31:46 +1000 Nick Coghlan <ncoghlan@gmail.com> wrote:
I'd be fine with that if we change the -j default to something other than "1" (e.g. as I suggested elsewhere, the number of cores in the machine).
You mean in the "-j" option itself or in "make test"?
On Thu, Mar 24, 2011 at 12:36 AM, Antoine Pitrou <solipsis@pitrou.net> wrote:
You mean in the "-j" option itself or in "make test"?
I was actually suggesting that -j be the *default* in regrtest itself, with an option to turn it off or force a particular number of processes. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
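The default Nick is suggesting could be implemented with a few lines in regrtest's option handling (a minimal sketch, not the actual regrtest code; multiprocessing.cpu_count() was already available at the time):

    # Sketch: default the -j process count to the machine's core count,
    # falling back to a single process where it can't be determined.
    import multiprocessing

    def default_job_count():
        try:
            return multiprocessing.cpu_count()
        except NotImplementedError:
            return 1

Antoine's earlier point still applies: since many tests sleep and wait rather than burn CPU, a value somewhat above the core count may finish sooner.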
Nick Coghlan <ncoghlan@gmail.com> writes:
I was actually suggesting that -j be the *default* in regrtest itself, with an option to turn it off or force a particular number of processes.
Just one request - if there are changes in this direction (e.g., trying to make regrtest use all cores by default), please include the ability to configure/override this for individual builders (or at least slaves) since otherwise I won't be able to disable it. I, for example, have cases where I may not automatically want all cores that regrtest happens to "see" get used, as the slave is doing other things too. Command line options to regrtest won't help since that's not something I, as a slave owner, can override for test builds. -- David
On Wed, 23 Mar 2011 14:29:22 -0400 David Bolen <db3l.net@gmail.com> wrote:
Just one request - if there are changes in this direction (e.g., trying to make regrtest use all cores by default), please include the ability to configure/override this for individual builders (or at least slaves) since otherwise I won't be able to disable it.
I think "-j" should remain a manual setting. I've posted a patch to enable it automatically in "make test" for convenience, but it would be enabled for neither "-m test" nor "make buildbottest". Regards Antoine.
On 23/03/2011 18:42, Antoine Pitrou wrote:
I think "-j" should remain a manual setting. I've posted a patch to enable it automatically in "make test" for convenience, but it would be enabled for neither "-m test" nor "make buildbottest".
-j doesn't pass on several of the flags to its subprocesses (e.g. warning settings I believe), so it shouldn't be the default. It's still very useful though. All the best, Michael
On Wed, 23 Mar 2011 18:51 +0000, Michael Foord wrote:
-j doesn't pass on several of the flags to its subprocesses (e.g. warning settings I believe)
It does (should): http://hg.python.org/cpython/file/2f4865834695/Lib/test/support.py#l1375
On Wed, Mar 23, 2011 at 11:56, Antoine Pitrou <solipsis@pitrou.net> wrote:
-j doesn't pass on several of the flags to its subprocesses (e.g. warning settings I believe)
It does (should): http://hg.python.org/cpython/file/2f4865834695/Lib/test/support.py#l1375
I fixed that at the sprints so yes, it works as expected (at least for the flags one will care about). -Brett
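For readers wondering what "passing the flags on" involves, here is a minimal sketch of the idea (this is not the test.support code linked above, and run_test_in_subprocess is a hypothetical helper):

    import subprocess, sys

    def run_test_in_subprocess(test_name):
        # Rebuild warning options on the child's command line so the
        # parent's -W settings survive into the subprocess.
        cmd = [sys.executable]
        for opt in sys.warnoptions:
            cmd.append("-W" + opt)
        cmd += ["-m", "test", test_name]
        return subprocess.call(cmd)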
On Mar 23, 2011, at 02:52 PM, Antoine Pitrou wrote:
Then many people will start running the "smoke test" rather than the whole suite, which will create new kinds of problems. It's IMO a bad idea. Let Barry learn about "-j" :)
Well, that's a social problem, not a technical problem. (See other messages in the thread regarding -j.)
Currently even "make quicktest" takes too long to run to be suitable for that task. Leaving out a couple more egregiously slow tests and possibly updating it to use the "-j" switch might make for a usable option.
"-j" will precisely help cover the duration of these long tests. By the way, you should use a higher "-j" number than you have CPUs, since some tests spend most of their time sleeping and waiting.
"make quicktest" already skips test_io and test_socket, which test fundamental parts of Python. I would vote for removing "make quicktest" rather than promote such a questionable command.
Better to rename it than remove it. If 'quicktest' is misleading people into running it rather than 'test' (which frankly, I doubt), then rename it 'smoketest' which seems entirely appropriate to its use case and indicates its value in the spectrum of tests: http://en.wikipedia.org/wiki/Smoketest Because this also rebuilds Python if needed, I think it's entirely appropriate for the push-race use case, where you've already extensively tested your change with a mostly up-to-date tree and now just need to quickly verify that Python won't crash and burn after your local merge. -Barry
On Wed, 23 Mar 2011 11:26:13 -0400 Barry Warsaw <barry@python.org> wrote:
Well, that's a social problem, not a technical problem.
Isn't this whole thread about a social problem? You are complaining that the test suite is too slow, which *is* a social problem (the buildbots (mostly) don't care about runtime, for example). If we start promoting a "quicker" way of running tests, then nobody will use the normal way. I'm sorry, I'm -1 on that. There are regressions often enough on the buildbots. If you insist on that, I suggest that you also vow to take care of the buildbot fleet and individually track regressions and notify people who are responsible for them. Regards Antoine.
On Mar 23, 2011, at 05:16 PM, Antoine Pitrou wrote:
If we start promoting a "quicker" way of running tests, then nobody will use the normal way. I'm sorry, I'm -1 on that. There are regressions often enough on the buildbots.
I'm not sure it's worth continuing this thread. I've explained that I'm not promoting a quicker way of running the tests in lieu of the more thorough test suite. I'm looking to fill a very specific use case. Anyway, there's issue 11651 now too. -Barry
Antoine> If we start promoting a "quicker" way of running tests, then
Antoine> nobody will use the normal way. I'm sorry, I'm -1 on
Antoine> that. There are regressions often enough on the buildbots.

It seems I frequently disagree with Antoine about various things, but on this I am definitely in agreement with him.

Skip
On Mar 23, 2011, at 02:31 PM, Antoine Pitrou wrote:
Does anyone use "make quicktest" for something useful?
Not currently. Can it be made useful? Should it be removed?
There is a reason the regression test suite has many tests... "Blacklisting" some of them sounds like a bad thing to do.
If 'make quicktest' were actually quick - say could run in 1/10 the current time, it could be used as a smoke test for merge-dance cases. OTOH, running some localized test for the feature or bug you're trying to land might be enough. In any case 'make quicktest' isn't really being honest with us <wink>. We should fix it or remove it. -Barry
On Wed, 23 Mar 2011 09:53:37 -0400 Barry Warsaw <barry@python.org> wrote:
OTOH, running some localized test for the feature or bug you're trying to land might be enough.
Might indeed. Quite often, though, some change in a library affects another one (especially when we're talking about things like socket or threading). Really, people already don't run the test suite enough before committing/pushing (and ironically these same people often don't check the buildbots afterwards). I don't think we want to promote more laxism. Regards Antoine.
On Thu, Mar 24, 2011 at 12:08 AM, Antoine Pitrou <solipsis@pitrou.net> wrote:
Really, people already don't run the test suite enough before committing/pushing (and ironically these same people often don't check the buildbots afterwards). I don't think we want to promote more laxism.
Encouraging a step up from "none" to "some" in a merge-dance case would still be an improvement. And if it encourages more pre-push testing when people aren't currently taking the time to run the full test suite (even though they're meant to), so much the better. And the quick test does exercise quite a few significant things like threading, sockets and threaded import.

Entirely independent of the "make quicktest" question, it would be nice if the default behaviour of regrtest was updated to check the number of cores a machine has and default to using that many processes (leaving people to turn it down if they don't want to dedicate the whole machine to the run). I keep forgetting to include the -j4 when I run the tests manually, so the tests take nearly 4 times as long as they need to (and of course, the test targets in the make file don't use it at all, either).

Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
On Mar 24, 2011, at 12:29 AM, Nick Coghlan wrote:
Entirely independent of the "make quicktest" question, it would be nice if the default behaviour of regrtest was updated to check the number of cores a machine has and default to using that many processes (leaving people to turn it down if they don't want to dedicate the whole machine to the run). I keep forgetting to include the -j4 when I run the tests manually, so the tests take nearly 4 times as long as they need to (and of course, the test targets in the make file don't use it at all, either).
It can't without some refactoring, and you can't set EXTRATESTOPTS=-j4 either because TESTOPTS includes -l and regrtest complains that -l and -j are incompatible. But I agree it would be nice if the test suite automatically took advantage of more cores where available. -Barry
In IRC Antoine suggested -j5 (note that -j is not compatible with -l so you have to override TESTOPTS, not just EXTRATESTOPTS). Adding --slow, here's what I get:

    $ make TESTOPTS="-j5 --slow" quicktest
    ...
    10 slowest tests:
    test_mmap: 221.6s
    test_shelve: 184.4s
    test_posix: 156.3s
    test_largefile: 150.0s
    test_concurrent_futures: 105.0s
    test_fork1: 12.0s
    test_threading: 8.8s
    test_signal: 8.4s
    test_warnings: 8.0s
    test_threaded_import: 6.1s

If I disable down to and including test_concurrent_futures, quicktest runs in 3m20s wall clock. *That's* what I'm talkin' 'bout! And the run time is totally reasonable without halving the test run. I don't think those 5 slowest tests would be missed in a smoke test.

Any objections to adding those slowest 5 tests to the blacklist, and -j5 to quicktest for Python 3.3?

-Barry
On Wed, 23 Mar 2011 10:44:30 -0400 Barry Warsaw <barry@python.org> wrote:
In IRC Antoine suggested -j5 (note that -j is not compatible with -l so you have to override TESTOPTS not just EXTRATESTOPTS). Adding --slow here's what I get:
    $ make TESTOPTS="-j5 --slow" quicktest
    ...
    10 slowest tests:
    test_mmap: 221.6s
    test_shelve: 184.4s
    test_posix: 156.3s
    test_largefile: 150.0s
    test_concurrent_futures: 105.0s
    test_fork1: 12.0s
    test_threading: 8.8s
    test_signal: 8.4s
    test_warnings: 8.0s
    test_threaded_import: 6.1s
If I disable down to and including test_concurrent_futures, quicktest runs in 3m20s wall clock. *That's* what I'm talkin' 'bout! And the run time is totally reasonable without halving the test run. I don't think those 5 slowest tests would be missed in a smoke test.
Any objections to adding those slowest 5 tests to the blacklist, and -j5 to quicktest for Python 3.3?
For me, the same objections as to blacklisting any tests at all. If some tests are too slow, individual issues about them should be opened. Also, there may be some issues with your system. test_mmap, test_shelve, test_posix all take 1-2 seconds each here. Again, please open issues on the tracker. Regards Antoine.
Barry> If I disable down to and including test_concurrent_futures,
Barry> quicktest runs in 3m20s wall clock. *That's* what I'm talkin'
Barry> 'bout!

How do you know you didn't eliminate the most important tests? (That is, the stuff which tests the code which is currently the most flaky.)

Barry> Any objections to adding those slowest 5 tests to the blacklist,
Barry> and -j5 to quicktest for Python 3.3?

Convince me that you haven't so horribly skewed the coverage that the result is no longer meaningful.

Skip
On Mar 23, 2011, at 03:08 PM, Antoine Pitrou wrote:
Might indeed. Quite often, though, some change in a library affects another one (especially when we're talking about things like socket or threading).
Really, people already don't run the test suite enough before committing/pushing (and ironically these same people often don't check the buildbots afterwards). I don't think we want to promote more laxism.
This is just the opposite. I'm not saying people shouldn't run the full(-ish) test suite before committing, I'm saying we should have a really fast minimal set of tests as a smoke test when dealing with push-races. -Barry
On Wed, 23 Mar 2011 11:18:50 -0400 Barry Warsaw <barry@python.org> wrote:
This is just the opposite. I'm not saying people shouldn't run the full(-ish) test suite before committing, I'm saying we should have a really fast minimal set of tests as a smoke test when dealing with push-races.
That's completely bogus. There's no reason to believe that a push race would favour certain regressions over certain others. Again, you need the full test suite to assert that no regressions occurred (or you might as well run 10 tests at random and call it done).

If you think that some tests are more significant than others (why?) then perhaps we can devise a limited test suite with these tests. But these tests should be chosen on the basis of their nature, *not* of their runtime.

Regards Antoine.
On Mar 23, 2011, at 05:09 PM, Antoine Pitrou wrote:
That's completely bogus. There's no reason to believe that a push race would favour certain regressions over certain others. Again, you need the full test suite to assert that no regressions occurred (or you might as well run 10 tests at random and call it done).
If you promote the full test suite as the thing to run when resolving merge races, then I predict no one will run them, because doing so increases your chances of hitting *another* push race. This whole thread came up in the context of trying to find a quick test you could run in that case which didn't increase that race window. I think the practical effect of not having a simple, fast smoke test will be to do *less* testing when you hit the merge race, and just let the buildbots sort it all out. You'll probably win most of the time anyway. -Barry
On 23.03.2011 18:10, Barry Warsaw wrote:
If you promote the full test suite as the thing to run when resolving merge races, then I predict no one will run them, because doing so increases your chances of hitting *another* push race. This whole thread came up in the context of trying to find a quick test you could run in that case which didn't increase that race window. I think the practical effect of not having a simple, fast smoke test will be to do *less* testing when you hit the merge race, and just let the buildbots sort it all out. You'll probably win most of the time anyway.
FWIW, +1 to this. Georg
On Thu, Mar 24, 2011 at 7:07 AM, Georg Brandl <g.brandl@gmx.net> wrote:
FWIW, +1 to this.
To make it clear as to the use case Barry and I are trying to cover here, when you get into a full push race for a bug fix, the current work flow (in practice) is to just merge/commit/push. When you multiply it by 3 branches, a useful smoke test needs to be *damn* fast (i.e. less than a minute) because you're going to be running it three times in quick succession (perhaps four if it applies to 2.7 as well). The alternative is *not* a full test run, but at best simply running the specific tests for whatever I'm fixing (or, more likely, not running any tests at all). Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
Nick Coghlan writes:
(i.e. less than a minute) because you're going to be running it three times in quick succession (perhaps four if it applies to 2.7 as well).
Nobody says it's got to be *you* that runs the tests, just that they need to be run before pushing to public repo. Here's a simple way to deal with this: have two repo URLs, "pull-source" and "push-target". Under (currently) normal commit pace, "pull-source" and "push-target" resolve to the same physical repo, the other repo(s) sync(s) to that one in the background (or they could be cloned at need, in advance of the sprint). Sprinters clone from "push-target", non-sprinters from "pull-source".

When sprinting, change "pull-source" to resolve to a mirror, and close it to commits, protecting the non-sprinters from instability. Let the sprinters have at it, and start a test process walking the commits in "push-target", testing each one. As each commit is validated, merge it to a third repo "validated" (could be a named branch, but I suspect named branches would suck for this purpose, because eventually you want these commits on the default branch or the appropriate named branch).

So far, so good. Sprinters don't need to reconfigure their workspaces, they don't need to run the full test suite before merging. The main issue remaining is what if the tests fail? Now you have to fix the commit and do a merge dance. I'm not sure what the best way to do this is. My thinking if merges are used is that you *need* to:

1. rewind the workspace to the busted commit or just before it (ie, keeping the changes for the busted commit)
2. fix it
3. commit, creating a new head
4. rebase children of the busted commit on the new head (*without* destroying the original branch)
5. merge the original branch into the rebased branch
6. point the test process at the new tip

Rationale:

1. The test process has not tested children of the busted commit, you can't just commit the fix on top -- it will most likely fail the same way for each child until reaching the fix.
3. See 1.
4. See 1 for "rebase". You want to keep the original branch because other sprinters' workspace configs know about it, and will get very confused if it disappears from the push-target repo.
5. This forces other sprinters to update before pushing, thus incorporating the fix in their work, and getting them back on the same page as the test process.
6. Commits don't know their children, so you'll have to reinitialize the test process to walk the branch backward.

I think the process 1-6 is excessively complex for most contributors, and the rebase itself risks conflicts. So I'm not entirely happy with this. Another possibility would be to cherrypick commits into the "validated" repo. This might be best as the test process could do it automatically, and simply delete anything that causes a test failure or merge conflict. Sprinters whose commits don't make it will have to come back and fix them later. The test bot could tag the busted commits "FAILED". This has the disadvantage that commits could fall on the floor if sprinters are inattentive. It also loses branch structure, and if (say) the 3rd commit in a series is busted, you want to back out 1 and 2 as well. (It's not always necessary, but I don't see how the 'bot can know which is which.)

And of course, step 7: when all the problems are resolved, reset both push-target and pull-source to resolve to the validated repo. Sprinters will have to re-clone.

Step 8: convert the old push-target repo to hex as ASCII armor, and mail it to Barry Warsaw and Ben Finney because they might want to look at branch history.<wink>

I suspect this is more or less what the Bazaar project's PQM does.
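The test process Stephen describes could be as small as the following loop (entirely hypothetical: the repo paths, the "FAILED" tag format, and the use of "make test" are assumptions, and it sidesteps all of the merge-dance handling above):

    import subprocess

    def validate_new_commits(push_target, validated):
        # List commits present in push-target but not yet validated
        # (-q suppresses hg's "comparing with..." chatter).
        try:
            out = subprocess.check_output(
                ["hg", "-R", validated, "incoming", "-q",
                 "--template", "{node}\n", push_target])
        except subprocess.CalledProcessError:
            return  # hg incoming exits nonzero when there is nothing new
        for rev in out.decode().split():
            # Check out the candidate commit and run the full test suite.
            subprocess.check_call(
                ["hg", "-R", push_target, "update", "-r", rev])
            if subprocess.call(["make", "-C", push_target, "test"]) == 0:
                # Validated: pull it (ancestors were already validated,
                # since we walk the commits in order).
                subprocess.check_call(
                    ["hg", "-R", validated, "pull", "-r", rev, push_target])
            else:
                # Busted: tag it so its author can find and fix it later.
                subprocess.check_call(
                    ["hg", "-R", push_target, "tag", "-f", "-r", rev,
                     "FAILED-" + rev[:12]])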
On Wed, Mar 23, 2011 at 11:53 PM, Barry Warsaw <barry@python.org> wrote:
In any case 'make quicktest' isn't really being honest with us <wink>. We should fix it or remove it.
It took about 11 minutes wall clock time for me. One thing I noticed is that it does the "run it twice to ensure the .pyc files are there the second time" trick, so we could halve the run time right there. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
On 23/03/11 14:53, Barry Warsaw wrote:
If 'make quicktest' were actually quick - say could run in 1/10 the current time, it could be used as a smoke test for merge-dance cases. OTOH, running some localized test for the feature or bug you're trying to land might be enough.
Would be amazing if the test system could detect which files were changed and only do the tests that cover them. Not a 100% safety net, provided by the full test suite, but not a bad start...

-- Jesus Cea Avion, jcea@jcea.es - http://www.jcea.es/
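Jesus's idea could start from the stdlib's naming convention alone (a sketch under that assumption; the helper is made up, and real change-based selection would need actual coverage data):

    import os

    def tests_for_changed_files(changed_paths):
        # Lib/foo.py is conventionally exercised by Lib/test/test_foo.py.
        tests = set()
        for path in changed_paths:
            name, ext = os.path.splitext(os.path.basename(path))
            if ext == ".py" and not name.startswith("test_"):
                tests.add("test_" + name)
        return sorted(tests)

    # e.g. tests_for_changed_files(["Lib/shelve.py"]) -> ["test_shelve"]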
Antoine> Does anyone use "make quicktest" for something useful?

I don't use it at all.

Antoine> There is a reason the regression test suite has many tests...
Antoine> "Blacklisting" some of them sounds like a bad thing to do.

+1. Eliminating tests based on the time it takes to run them suggests that the resulting smaller test run may have considerably different overall coverage quality than you might desire. Some tests (syntax, basic arithmetic, etc) probably run blazingly fast and will be fully covered by a "make nanotest", while some really important stuff (anything which forks or creates sockets) will have very poor nanotest coverage because many of its test cases won't be run. The odds that someone breaks syntax or basic arithmetic functionality (or even changes those parts of the system) are pretty low, so repeatedly running those tests simply because they run fast gives a false sense of security.

Skip
On Mar 23, 2011, at 10:02 AM, skip@pobox.com wrote:
Eliminating tests based on the time it takes to run them suggests that the resulting smaller test run may have considerably different overall coverage quality than you might desire. Some tests (syntax, basic arithmetic, etc) probably run blazingly fast and will be fully covered by a "make nanotest", while some really important stuff (anything which forks or creates sockets) will have very poor nanotest coverage because many of its tests cases won't be run. The odds that someone breaks syntax or basic arithmetic functionality (or even changes those parts of the system) are pretty low, so repeatedly running those tests simply because they run fast gives a false sense of security.
Not if you keep in mind the appropriate use case for each of the separate make test targets. -Barry
Barry> Not if you keep in mind the appropriate use case for each of the
Barry> separate make test targets.

Programmers are lazy. They will often take the shortest path. Fix a small bug in module X which seems innocent enough, fail to recognize that it breaks module Y. Run "make smoketest" and see that all the test_X stuff passes. Don't notice that key test_Y tests are not even run, push, then head out to lunch. Come back to (hopefully) a bunch of red buildbots. Still, it would have been good to catch that problem before heading out to Buffalo Wild Wings to watch football players trip over sprinkler heads.

How many of us really and truly can't wait a few minutes for the test suite to complete? Especially once Antoine (or whoever) gets -j working properly. There are plenty of things we can do:

* Hang out on IRC
* Update your Facebook status
* Grab a cup of coffee
* Read python-dev
* Try out a few new bass lines you heard on Pinetop Perkins' last album (may he RIP).

Skip
skip@pobox.com wrote:
Programmers are lazy. They will often take the shortest path. Fix a small bug in module X which seems innocent enough, fail to recognize that it breaks module Y. Run "make smoketest" and see that all the test_X stuff passes. Don't notice that key test_Y tests are not even run, push, then head out to lunch. Come back to (hopefully) a bunch of red buildbots. Still, it would have been good to catch that problem before heading out to Buffalo Wild Wings to watch football players trip over sprinkler heads.
How many of us really and truly can't wait a few minutes for the test suite to complete? Especially once Antoine (or whoever) gets -j working properly.
I think the use-case has been lost. Think sprints and multiple push races. No one is arguing that the smoke-test should be the default, but seriously, are you willing to spend an hour or more re-running the complete suite of tests six, eight, or 12 times because of push races in a sprint? I can see losing a good portion of your sprinting day. Which tests are included in the smoketest definitely needs careful reviewing. Perhaps a better solution for sprints is to clone, have the sprinters clone from that clone, have one person responsible for running full tests, have the others push to the sub-sub-clone. ~Ethan~
On Wed, 23 Mar 2011 10:25:01 -0700 Ethan Furman <ethan@stoneleaf.us> wrote:
I think the use-case has been lost. Think sprints and multiple push races. No one is arguing that the smoke-test should be the default, but seriously, are you willing to spend an hour or more re-running the complete suite of tests six, eight, or 12 times because of push races in a sprint? I can see losing a good portion of your sprinting day.
Well, keep in mind hg is a *distributed* version control system. You don't have to push your changes right now. You can keep your changesets for yourself, make several of them (different bug fixes, for example), and push them (after a single merge) at the end of the day. Regards Antoine.
On 03/23/2011 01:24 PM, Antoine Pitrou wrote:
Well, keep in mind hg is a *distributed* version control system. You don't have to push your changes right now. You can keep your changesets for yourself, make several of them (different bug fixes, for example), and push them (after a single merge) at the end of the day.
That doesn't work so well at a sprint, where the point is to maximize the value of precious face-time to get stuff done *now*. Long test latencies and nearly-real-time collaboration are not friendly, as the agile folks document: http://c2.com/cgi/wiki?TestSpeed

Maybe we need to chop the problem up as:

- Pure documentation changes never require running any test suite (this includes true comments in code, as well as docstrings which are not used to drive doctests or other tested output).
- "core" language changes always require running the full test suite.
- We compute an import-dependency map for the stdlib (maybe during build?), and add support for running tests of a named module and its dependents. Any non-documentation change to a stdlib module requires running this new kind of test against that module. (A sketch of such a map follows below.)

Tres.
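The import-dependency map in Tres's last item might be bootstrapped along these lines (a sketch only -- no such infrastructure existed in the tree, and a real version would also have to handle packages and dynamic imports):

    import ast, os
    from collections import defaultdict

    def stdlib_dependents(lib_dir):
        # Map each module name to the set of stdlib modules importing it,
        # so a change to module X can trigger the tests of X's dependents.
        dependents = defaultdict(set)
        for fname in os.listdir(lib_dir):
            if not fname.endswith(".py"):
                continue
            mod = fname[:-3]
            with open(os.path.join(lib_dir, fname)) as f:
                try:
                    tree = ast.parse(f.read())
                except SyntaxError:
                    continue  # skip anything that doesn't parse
            for node in ast.walk(tree):
                if isinstance(node, ast.Import):
                    for alias in node.names:
                        dependents[alias.name.split(".")[0]].add(mod)
                elif isinstance(node, ast.ImportFrom) and node.module:
                    dependents[node.module.split(".")[0]].add(mod)
        return dependents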
On Wed, 23 Mar 2011 13:58:31 -0400 Tres Seaver <tseaver@palladion.com> wrote:
That doesn't work so well at a sprint, where the point is to maximize the value of precious face-time to get stuff done *now*. Long test latencies and nearly-real-time collaboration are not friendly, as the agile folks document:
http://c2.com/cgi/wiki?TestSpeed
Maybe we need to chop the problem up as:
- Pure documentation changes never require running any test suite (this includes true comments in code, as well as docstrings which are not used to drive doctests or other tested output).
- "core" language changes always require running the full test suite.
- We compute an import-dependency map for the stdlib (maybe during build?), and add support for running tests of a named module and its dependents. Any non-documentation change to a stdlib module requires running this new kind of test against that module.
I agree that finding ways to speedup running tests *without* reducing the necessary coverage is the right way to go. Part of that is obviously about optimizing the tests themselves (something which I and others have been doing repeatedly, including the addition of the "-j" switch). The dependency map is another idea. All this needs work and is therefore more difficult than blacklisting some "slow" tests, but is much more productive in the long run. Regards Antoine.
>>> I think the use-case has been lost. Think sprints and multiple push
>>> races.

Tres> That doesn't work so well at a sprint, where the point is to
Tres> maximize the value of precious face-time to get stuff done *now*.

How about everybody pushes (without testing, or with, at most, Barry's smoke test) to a sprint-specific local repository? One or more buildbots (or similar) can just churn away running unit tests from that repository. When a problem is encountered, the people responsible are going to be right there. They don't have to slow down their mad hacking until there is a problem. Since there will be a fair amount of communication between sprinters, the odds of something bad and horribly hard to fix should be low.

Pushes to the global repository from that sprint repository can happen at a more leisurely pace and be coordinated manually. Say, everybody breaks for {morning snack, lunch, dinner}. When they return from the break if the local buildbots are green you push out to cpython, then everyone starts banging on their keyboards again.

Skip
Tres Seaver writes:
On 03/23/2011 01:24 PM, Antoine Pitrou wrote:
On Wed, 23 Mar 2011 10:25:01 -0700 Ethan Furman <ethan@stoneleaf.us> wrote:
I think the use-case has been lost. Think sprints and multiple push races.
I do, can't speak for others. So what? *sigh* ... read on.
Well, keep in ming hg is a *distributed* version control system. You don't have to push your changes right now.
s/push your changes right now/push your changes to the public repo/
That doesn't work so well at a sprint, where the point is to maximize the value of precious face-time to get stuff done *now*.
That's where the D in DVCS comes in. It's a new world, friends. All you need to do is bring a $50 wireless router to the sprint, and have some volunteer set up a shared repo for the sprinters. Then some volunteer *later* runs the tests and pilots the patches into the public repo. Where's the latency? N.B. The repo admin and test-running volunteers can be non-coders. In fact, the tests can be running concurrently (gives those non-coders an excuse to attend sprints!), but nobody need wait for the results.
Maybe we need to chop the problem up as:
"Violence is the last refuge of the incompetent." ObRef Asimov.<wink>
On 3/23/2011 9:36 PM, Stephen J. Turnbull wrote:
That's where the D in DVCS comes in. It's a new world, friends. All you need to do is bring a $50 wireless router to the sprint, and have some volunteer set up a shared repo for the sprinters. Then some volunteer *later* runs the tests and pilots the patches into the public repo. Where's the latency?
N.B. The repo admin and test-running volunteers can be non-coders. In fact, the tests can be running concurrently (gives those non-coders an excuse to attend sprints!), but nobody need wait for the results.
If the push-target is a clone at, for instance, hg.python.org, the tester does not even need to be at the sprint site. Skype can be used for occasional feedback to authors. -- Terry Jan Reedy
On 03/23/2011 09:36 PM, Stephen J. Turnbull wrote:
That's where the D in DVCS comes in. It's a new world, friends. All you need to do is bring a $50 wireless router to the sprint, and have some volunteer set up a shared repo for the sprinters. Then some volunteer *later* runs the tests and pilots the patches into the public repo. Where's the latency?
The current full test suite is punishingly expensive to run, sprint or not. Because of that fact, people will defer running it, and sometimes forget. Trying to require that people run it repeatedly during a push race is just Canute lashing the waves.
N.B. The repo admin and test-running volunteers can be non-coders. In fact, the tests can be running concurrently (gives those non-coders an excuse to attend sprints!), but nobody need wait for the results.
The rhythm is still broken if developers don't run the tests and see them pass. Async is an enemy to the process here, because it encourages poor practices.
Maybe we need to chop the problem up as:
"Violence is the last refuge of the incompetent." ObRef Asimov.<wink>
Hmm, that hardly seems appropriate, even with the wink. "Chopping" isn't violence in any normal sense of the word when applied to non-persons: Chopping up a problem is no more violent than chopping wood or onions (i.e., not at all).

Tres.
On Thu, 24 Mar 2011 08:46:37 -0400 Tres Seaver <tseaver@palladion.com> wrote:
The current full test suite is punishingly expensive to run, sprint or not. Because of that fact, people will defer running it, and sometimes forget. Trying to require that people run it repeatedly during a push race is just Canute lashing the waves.
Punishingly expensive? You have to remember that Python is an entire programming language with its standard library, used by millions of people. That its test suite can run in 4 minutes on a modern computer actually makes it rather "fast" IMO (and, perhaps, incomplete...).

If you have a "push race", then after merging you could just re-run the tests that are affected by your changes (of course, if you did a change in the interpreter core, you probably should run the whole suite again). That's both faster and better focussed than a hypothetical "smoke test". (that assumes you did run the test suite before doing the original commit, of course)

Regards Antoine.
On Thu, Mar 24, 2011 at 16:33, Antoine Pitrou <solipsis@pitrou.net> wrote:
Punishingly expensive? You have to remember that Python is an entire programming language with its standard library, used by millions of people. That its test suite can run in 4 minutes on a modern computer actually makes it rather "fast" IMO (and, perhaps, incomplete...).
+1 Having experience running [= suffering from] multiple-hour (and sometimes weekend-long) tests for some systems, Python's test suite feels slender. Even surprisingly so. I often wonder how such a relatively short set of tests can exercise a project as big and full of functionality as Python with its whole standard library. Eli
Tres Seaver writes:
The current full test suite is punishingly expensive to run, sprint or not. Because of that fact, people will defer running it, and sometimes forget. Trying to require that people run it repeatedly during a push race is just Canute lashing the waves.
"Defer" is precisely the point of the proposal you are apparently responding to, and "people forget" is why I propose a volunteer (or if possible an automatic process, see my other post and Jesus Cea's) to run the deferred tests. I don't understand your point, unless it is this:
The rhythm is still broken if developers don't run the tests and see them pass. Async is an enemy to the process here, because it encourages poor practices.
Yes, but you can't have it both ways. Either "the" tests (in fact, I think people are referring to several different sets of tests by the phrase "the tests", sometimes even in the same sentence) are valuable and it's desirable that they always be run (I'm deliberately ignoring cost here), or they're not, in which case they should never be run. Now costs come back in: if the tests are valuable but too costly to run during sprints, better late than never, IMO. What am I missing? Also, there's nothing here that says that developers can't run the tests they think are relevant themselves. But shouldn't that be their choice if we are going to relax the requirement from the full test suite to some subset? After all, they're the ones closest to the problem, they should be in the best position to decide which tests are relevant.
Maybe we need to chop the problem up as:
"Violence is the last refuge of the incompetent." ObRef Asimov.<wink>
Hmm, that hardly seems appropriate, even with the wink. "Chopping" isn't violence in any normal sense of the word when applied to non-persons: Chopping up a problem is no more violent than chopping wood or onions (i.e., not at all).
Well, the referent of "violence" is the global nature of the proposal. I don't think that one size fits all, here. If you are going to argue for running some tests but not others after making changes, shouldn't there be a notion of relevance involved? IMO "the" tests for modules with dependents should include the tests for their dependents, for example. Modules that are leaves in the dependency tree presumably can be unit tested and leave it at that.

Eg, much as I normally respect Barry's intuitions, his proposal (to remove costly tests, without reference to the possibility of missing something important) is IMHO absolutely the wrong criterion. I don't really know about Python's test suite, but in XEmacs the time-expensive tests are mostly the ones that involve timeouts and lots of system calls because they interact with the OS -- and those are precisely the finicky areas where a small change can occasion an unexpected bug. For XEmacs, those bugs also are more likely than average to be showstoppers for dependents, in the sense that until they're fixed, the dependents can't be tested at all. So the cost of *not* running those tests is relatively high, too.

OTOH, there are a couple of expensive "sledgehammer" tests, eg, the M17N tests that iterate over *all* the characters XEmacs knows about. (Seriously, once you've seen one of the 11,000 precomposed Hangul, you've seen them all, and it's almost that bad for the 21,000 kanji.) These could easily be omitted without harming anything else.

Yes, it would be a fair amount of work to do this kind of analysis. That's why I propose some sort of deferred testing as an alternative to a cost-based smoke test. Another alternative would be to require unit testing of changed modules only, which should be a pretty accurate heuristic for relevance, except for modules with lots of dependencies.
On Fri, Mar 25, 2011 at 12:51 PM, Stephen J. Turnbull <stephen@xemacs.org> wrote:
Eg, much as I normally respect Barry's intuitions, his proposal (to remove costly tests, without reference to the possibility of missing something important) is IMHO absolutely the wrong criterion. I don't really know about Python's test suite, but in XEmacs the time- expensive tests are mostly the ones that involve timeouts and lots of system calls because they interact with the OS -- and those are precisely the finicky areas where a small change can occasion an unexpected bug. For XEmacs, those bugs also are more likely than average to be showstoppers for dependents, in the sense that until they're fixed, the dependents can't be tested at all. So the cost of *not* running those tests is relatively high, too.
For Python, our most expensive, slow tests are generally process related or IO related (over time the threading related ones have been largely fixed to use Event based signalling rather than relying on timeouts, so they're significantly faster than they once were). These slow tests are *also* typically the most platform dependent tests, so a "green light" from a local test run is generally pretty inconclusive - you don't really find out whether you borked something until you get green lights on the buildbots for platforms other than those the patch was developed on. So I still see value in having a standard "smoke test" that runs through the platform independent stuff, to reduce the number of minor errors that needlessly cause the buildbots to go red.

The idea would be to create a tiered test suite along the following lines:

1. The buildbots: run the entire (-uall) test suite across a variety of platforms. Performed for every commit pushed to hg.python.org/cpython.
2. Complete local test: run the entire (-uall) test suite on a local working copy. Recommended before first committing a fix or change to a branch.
3. Basic local test: run the test suite with no additional resources enabled on a local working copy. Current closest equivalent to a "smoke test".
4. Proposed "smoke test": quick test of platform independent code for use when merging heads after a push race.
5. Specific tests: run specified tests for modules being worked on. Used during development to check fix validity and feature degree of completion.

With the volume of platform dependent code we have in CPython and the standard library, the only way we're ever likely to achieve consistently green buildbots is to move to a "staging repo" model, where commits only get made to the central repo after they have already passed muster on the buildbots at least once. I think that's actually a good way for us to go in the long run, but I also think separating out a fast set of platform independent tests is a decent idea.

Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
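Concretely, the invocations behind those tiers would look roughly like this (illustrative: "make smoketest" is the proposed target, not an existing one, and the buildbots drive regrtest through their own harness):

    # Rough command per tier; -uall enables the optional resource-hungry
    # tests, and tiers 1 and 2 differ only in where they run.
    TIERS = {
        "1. buildbots":           "./python -m test -uall",
        "2. complete local test": "./python -m test -uall",
        "3. basic local test":    "./python -m test",
        "4. proposed smoke test": "make smoketest  # hypothetical target",
        "5. specific tests":      "./python -m test test_foo test_bar",
    }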
On 03/24/2011 10:51 PM, Stephen J. Turnbull wrote:
If you are going to argue for running some tests but not others after making changes, shouldn't there be a notion of relevance involved? IMO "the" tests for modules with dependents should include the tests for their dependents, for example. Modules that are leaves in the dependency tree presumably can be unit tested and leave it at that.
That was precisely my proposal: when trying to check in changes to a stdlib module, we require that developers ensure that the module's tests, *and* those of its dependents, pass. We would need to add new testing infrastructure to support this expectation by computing (and saving) the dependency graph of the stdlib. I originally suggested build time for this, but now think that it would be better built during an initial full run of the suite.

Tres.
On Mar 25, 2011, at 09:51 AM, Tres Seaver wrote:
That was precisely my proposal: when trying to check in changes to a stdlib module, we require that developers ensure that the module's tests, *and* those of its dependents, pass. We would need to add new testing infrastructure to support this expectation by computing (and saving) the dependency graph of the stdlib. I originally suggested build time for this, but now think that it would be better built during an initial full run of the suite.
That does seem to be a more fruitful avenue for improvement. I'm also doing more investigation into exactly why certain tests are much slower for me than for other people. The main culprit appears to be the fact that my $HOME is on an ecryptfs, so some performance hit is expected. But 600 to 25000 times slower? Hmm... Also, something seems to be not working quite right with regrtest's cd'ing to /tmp. When I build and run the tests out of /tmp (i.e. a non-ecryptfs) with -j100 it completes in under 3 minutes. Hopefully I can investigate more later today. -Barry
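One quick way to chase a filesystem-dependent slowdown like the one Barry describes is to time a single test from different working directories (a sketch; test_shelve is just an example, chosen because it appears in Barry's slow list above):

    import subprocess, sys, time

    # Run this once under $HOME (ecryptfs) and once under /tmp
    # to compare per-test wall clock time.
    t0 = time.time()
    subprocess.call([sys.executable, "-m", "test", "test_shelve"])
    print("test_shelve took %.1fs" % (time.time() - t0))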
On 03/26/2011 02:56 AM, Stephen J. Turnbull wrote:
Sorry about that. I live in a disaster area, and was limited to GMail until two days ago, and lost a fair amount of context in the switch back.
I'm sorry to hear that! I hope all is well for you and yours -- we are still waiting to hear about my sister-in-law's family in the northeast.

Tres.
On Wed, Mar 23, 2011 at 7:24 PM, Antoine Pitrou <solipsis@pitrou.net> wrote:
On Wed, 23 Mar 2011 10:25:01 -0700 Ethan Furman <ethan@stoneleaf.us> wrote:
I think the use-case has been lost. Think sprints and multiple push races. No one is arguing that the smoke-test should be the default, but seriously, are you willing to spend an hour or more re-running the complete suite of tests six, eight, or 12 times because of push races in a sprint? I can see losing a good portion of your sprinting day.
Our sprint model has been to set up a throw-away sprint repository somewhere accessible (github, bitbucket, wherever) and have everyone commit madly to it however they want. Afterwards a few brave souls take the result and commit it to the master repository in a more orderly fashion.
On Wed, Mar 23, 2011 at 16:27, Simon Cross <hodgestar+pythondev@gmail.com> wrote:
Our sprint model has been to set up a throw-away sprint repository somewhere accessible (github, bitbucket, wherever) and have everyone commit madly to it however they want. Afterwards a few brace souls take the result and commit it to the master repository in a more orderly fashion.
While we're talking about sprints, I just wanted to put out a reminder that the PSF wants to support more of them. See www.pythonsprints.com and/or email sprints@python.org (sorry for the OT)
participants (17)

- Antoine Pitrou
- Barry Warsaw
- Brett Cannon
- Brian Curtin
- David Bolen
- Eli Bendersky
- Ethan Furman
- Georg Brandl
- Jesus Cea
- Michael Foord
- Nick Coghlan
- Simon Cross
- skip@pobox.com
- Stephen J. Turnbull
- Stephen J. Turnbull
- Terry Reedy
- Tres Seaver