Disabling cyclic GC in timeit module
Hi,
Can we stop disabling the cyclic GC by default in the timeit module? Posts on pypy-dev and PyPy bug reports often use timeit, and disabling the GC can change the measured performance significantly. A good example is JSON benchmarks: you would not disable the cyclic GC when running a web app, so benchmarking JSON encoding/decoding with the cyclic GC disabled does not make sense.
What do you think?
Cheers,
fijal
On Fri, Oct 7, 2011 at 4:50 PM, Maciej Fijalkowski wrote:
> Can we stop disabling the cyclic GC by default in the timeit module? [...]
> What do you think?
No, it's disabled by default for a reason (to avoid irrelevant noise in microbenchmarks), and other cases don't trump those original use cases. A command line switch to leave it enabled would probably be reasonable, though.
Cheers,
Nick.
--
Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
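For reference, the default Nick describes can be observed directly, and the timeit documentation describes re-enabling collection from the setup statement. What follows is a minimal sketch, assuming CPython; the helper name gc_state_during_timing is made up for illustration and is not part of timeit.

```python
import gc
import timeit


def gc_state_during_timing(setup="pass"):
    """Return gc.isenabled() as seen from inside a timed statement.

    Hypothetical helper for illustration, not part of timeit.
    """
    seen = []
    # The lambda is the timed statement; it records the collector state.
    timeit.timeit(lambda: seen.append(gc.isenabled()), setup=setup, number=1)
    return seen[0]


if __name__ == "__main__":
    print("default:", gc_state_during_timing())
    print("gc re-enabled in setup:",
          gc_state_during_timing("import gc; gc.enable()"))
```

The same setup string can be passed on the command line with the -s option of python -m timeit, which leaves the collector running without any change to the module's default.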
On Fri, Oct 7, 2011 at 11:47 PM, Nick Coghlan wrote:
> No, it's disabled by default for a reason (to avoid irrelevant noise in microbenchmarks), and other cases don't trump those original use cases.
People don't use it only for microbenchmarks, though. Also, I don't think you can call it noise when it adds something every now and then.
Er. How is disabling the GC for microbenchmarks any good, by the way?
Cheers,
fijal
On Sat, 8 Oct 2011 00:13:40 +0200, Maciej Fijalkowski wrote:
> People don't use it only for microbenchmarks though. Also, you can't call noise a thing that adds something every now and then I think.
> Er. How is disabling the GC for microbenchmarks any good by the way?
In CPython, looking for reference cycles is a parasitic task that interferes with what you are trying to measure. It is not critical in any way, and you can schedule it much less often if it takes too much CPU, without any really adverse consequences. timeit takes the safe way and disables it completely.
In PyPy, it doesn't seem gc.disable() should do anything, since you'd lose all automatic memory management if the GC were disabled.
Regards
Antoine.
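Antoine's remark about scheduling cycle detection less often corresponds to gc.set_threshold() in CPython. A minimal sketch follows; the context manager name relaxed_gc and the value 100_000 are arbitrary choices for illustration, not recommendations from the thread.

```python
import gc
from contextlib import contextmanager


@contextmanager
def relaxed_gc(threshold0=100_000):
    """Temporarily raise the youngest-generation threshold.

    Hypothetical helper: a higher threshold0 means more container
    allocations are needed before CPython starts a collection.
    """
    old = gc.get_threshold()
    gc.set_threshold(threshold0, *old[1:])
    try:
        yield
    finally:
        # Restore the thresholds that were active before the block.
        gc.set_threshold(*old)


if __name__ == "__main__":
    print("before:", gc.get_threshold())
    with relaxed_gc():
        print("inside:", gc.get_threshold())
    print("after:", gc.get_threshold())
```

This keeps the collector enabled, so cycles are still reclaimed eventually, which is a middle ground between the default behaviour and what timeit does.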
On Sat, Oct 8, 2011 at 1:47 AM, Antoine Pitrou wrote:
> In CPython, looking for reference cycles is a parasitic task that interferes with what you are trying to measure. It is not critical in any way, and you can schedule it much less often if it takes too much CPU, without any really adverse consequences. timeit takes the safe way and disables it completely.
> In PyPy, it doesn't seem gc.disable() should do anything, since you'd lose all automatic memory management if the GC was disabled.
It disables finalizers, but this is beside the point. The point is that people use the timeit module to compute the absolute time it takes CPython to do things, among other things to compare it with PyPy. While I do agree that in microbenchmarks you don't lose much by just disabling it, it does affect larger applications. So answering a question like "how much time will JSON encoding take in my application" should take cyclic GC time into account.
Cheers,
fijal
Maciej Fijalkowski wrote:
> [...] While I do agree that in microbenchmarks you don't lose much by just disabling it, it does affect larger applications. So answering a question like "how much time will JSON encoding take in my application" should take cyclic GC time into account.
If you are only measuring json encoding of a few select pieces of data then it's a microbenchmark. If you are measuring the whole application (or a significant part of it) then I'm not sure timeit is the right tool for that.
Regards
Antoine.
On Sat, Oct 8, 2011 at 2:18 AM, Antoine Pitrou wrote:
> If you are only measuring json encoding of a few select pieces of data then it's a microbenchmark. If you are measuring the whole application (or a significant part of it) then I'm not sure timeit is the right tool for that.
When you're measuring how much time it takes to encode JSON, this is a microbenchmark, and yet the time that timeit gives you is misleading, because it will take a different amount of time in your application. I guess my proposal would be to leave the GC enabled by default and disable it only when requested, but well, I guess I'll give up given the strong push against it.
Cheers,
fijal
Maciej Fijalkowski wrote:
> When you're measuring how much time it takes to encode json, this is a microbenchmark and yet the time that timeit gives you is misleading, because it'll take different amount of time in your application. I guess my proposition would be to not disable gc by default and disable it when requested, but well, I guess I'll give up given the strong push against it.
True, but it is also heavily dependent on how much other data your application has in memory at the time. If your application has 1M objects in memory and then goes to encode/decode a JSON string when the gc kicks in, it will take a lot longer because of all the stuff that isn't JSON related.
I don't think it can be suggested that timeit should grow a flag for "put garbage into memory, and then run this microbenchmark with gc enabled".
If you really want to know how fast something is in your application, you sort of have to do the timing in your application, at scale and at load.
John
=:->
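John's point can be tried out by hand. The sketch below is only an illustration under assumptions: the function time_decode and its ballast_size parameter are hypothetical, the ballast is a crude stand-in for "other data in memory", and whether any difference shows up depends on the interpreter version and the workload.

```python
import json
import timeit


def time_decode(text, ballast_size=0, number=200):
    """Time json.loads(text) with the cyclic GC left enabled.

    ballast_size is a hypothetical knob: it keeps that many unrelated
    container objects alive while the benchmark runs.
    """
    ballast = [[i] for i in range(ballast_size)]  # GC-tracked lists
    elapsed = timeit.timeit(
        lambda: json.loads(text),
        setup="import gc; gc.enable()",  # undo timeit's gc.disable()
        number=number,
    )
    del ballast  # release the unrelated objects after timing
    return elapsed


if __name__ == "__main__":
    doc = json.dumps([{"id": i, "tags": ["a", "b"]} for i in range(1000)])
    print("small heap:", time_decode(doc))
    print("large heap:", time_decode(doc, ballast_size=1_000_000))
```

Even if the two numbers differ, neither is the number for a real application, which is the argument for timing inside the application itself.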
Antoine Pitrou wrote:
> If you are only measuring json encoding of a few select pieces of data then it's a microbenchmark. If you are measuring the whole application (or a significant part of it) then I'm not sure timeit is the right tool for that.
Perhaps timeit should grow a macro-benchmark tool too? I find myself often using timeit to time macro-benchmarks simply because it's more convenient at the interactive interpreter than the alternatives.
Something like this idea perhaps?
http://preshing.com/20110924/timing-your-code-using-pythons-with-statement
--
Steven
Steven D'Aprano wrote:
> Perhaps timeit should grow a macro-benchmark tool too? I find myself often using timeit to time macro-benchmarks simply because it's more convenient at the interactive interpreter than the alternatives.
> Something like this idea perhaps?
> http://preshing.com/20110924/timing-your-code-using-pythons-with-statement
I have had essentially the same snippet lying in my toolbox for a long time now (with the addition of being able to name timers, so that several can run in the code and you know which is which), and I find it very useful.
There's also an alternative approach: a decorator that marks a function for benchmarking. David Beazley has one good example of this here: http://www.dabeaz.com/python3io/timethis.py
Eli
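The two ideas mentioned here, a named with-statement timer and a benchmarking decorator, can be sketched together. This is not the code from either linked page; the names timer, timings and build_list are made up for illustration.

```python
import time
from collections import defaultdict
from contextlib import contextmanager

# name -> list of elapsed wall-clock seconds, one entry per timed run
timings = defaultdict(list)


@contextmanager
def timer(name):
    """Record how long the enclosed block (or decorated function) takes."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[name].append(time.perf_counter() - start)


@timer("build_list")  # contextmanager objects also work as decorators
def build_list(n):
    return [i * i for i in range(n)]


if __name__ == "__main__":
    with timer("sum"):
        total = sum(range(1_000_000))
    build_list(100_000)
    build_list(100_000)
    for name, runs in timings.items():
        print(f"{name}: {len(runs)} run(s), best {min(runs):.6f}s")
```

Unlike timeit, this leaves the cyclic GC in whatever state the program already has it, so the measured time includes any collections that happen to run.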
participants (6)
- Antoine Pitrou
- Eli Bendersky
- John Arbash Meinel
- Maciej Fijalkowski
- Nick Coghlan
- Steven D'Aprano