Dynamic code NOPing

For the logging module it would be extremely useful if Python included a way to deactivate processing of certain blocks, to avoid having to trade an extensive logging harness against performance. For example, instead of writing:

    if log.DEBUG == True:
        log(factorial(2**15))

it should be possible to just write:

    log(factorial(2**15))

If log() is an instance of some Nopable class, the statement in log's braces is not executed. -- anatoly t.

It is possible now; you just have to move resource-consuming operations into the __str__ or __repr__ methods of a class, and use the logging.log feature that it does not format the string with the specified format and arguments if the logging level is greater than the message level. For example:

    class Factorial:
        def __init__(self, n):
            self.n = n

        def calculate(self):
            return factorial(self.n)

        def __str__(self):
            return str(self.calculate())

    logging.debug("Factorial of %d is %s", 2**15, Factorial(2**15))

25.12.2012 12:28, anatoly techtonik wrote:
-- Andrew

    if __debug__:
        log(factorial(2**15))

Running python with -O will squash this statement. To have something inline, you could also abuse assert statements to do the job:

    def debug_log(x):
        log(x)
        return True

    assert debug_log(factorial(2**15))

In optimized builds, the statement will be removed entirely. On Tue, Dec 25, 2012 at 1:28 AM, anatoly techtonik <techtonik@gmail.com> wrote:
-- Jasper

Interesting alternatives, but they do not quite come across as flexible/useful enough… Often debug statements have a lot of text and variables, like:

    log.debug( "The value of X, Y, Z is now: %d %s %d" % (x, lookup(y), factorial(2**15)) )

It would be nice if the args in log.debug() were only evaluated if debug was on, but I don't think this is possible with the current Python evaluation rules. But if debug() was indeed NOP'able, maybe it could be done?

/Rene

On Dec 25, 2012, at 10:35 AM, Jakob Bowyer <jkbbwr@gmail.com> wrote:

On Tue, Dec 25, 2012 at 9:11 PM, Rene Nejsum <rene@stranden.com> wrote:
But if debug() was indeed NOP'able, maybe it could be done ?
If someone *really* wants to do this, they can abuse assert statements (which will be optimised out under "-O", just like code guarded by "if __debug__"). That doesn't make it a good idea - it is in production systems, investigating faults you can't (or are still trying to) reproduce in development and integration environments, that you need log messages most. Compiling them out, instead of deactivating them with runtime configuration settings, means you can't switch them on without restarting the system with different options.

This does mean you have to factor the cost of logging into your performance targets and hardware requirements, but the payoff is an increased ability to correctly diagnose system faults (as well as improving your ability to extract interesting metrics from log messages). Excessive logging calls certainly *can* cause performance problems due to the function call overhead, as can careless calculation of expensive values that aren't needed.

One alternative occasionally noted is that you could design a logging API that accepts lazily evaluated callables instead of ordinary parameters. However, one danger of such expensive logging is that enabling that logging level becomes infeasible in practice, because the performance hit is too significant. The typical aim for logging is that the overhead of enabling it in production should mean your servers run a little hotter, or your task takes a little longer - not that your application grinds to a halt.

One good way to achieve this is to decouple the expensive calculations from the main application: you instead log the necessary pieces of information, which can be picked up by an external service, and the calculation performed in a separate process (or even on a separate machine) where it won't affect the main application, and where you only calculate it if you actually need it for some reason.

Cheers, Nick.

-- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

I understand and agree with all your arguments on debugging. At my company we typically make some kind of backend/server control software, with a LOT of debugging lines across many modules. We have 20+ debugging flags, and in different situations we enable a few of those; if we were to enable all at once it would definitely have an impact on production, but hopefully just a hotter CPU and a lot of disk space being used. Debug statements in our code are probably one per 10-20 lines of code.

I think my main issue (and what I therefore read into the original suggestion) was the extra "if" statement at every log statement. So doing:

    if log.debug.enabled():
        log.debug( bla. bla. )

adds 5-10% extra code lines, whereas if we could do:

    log.debug( bla. bla )

at the same cost, it would save a lot of lines. And when you have 43 lines in your editor, it will give you 3-5 lines more of real code to look at :-)

/Rene

On Dec 25, 2012, at 1:28 PM, Nick Coghlan <ncoghlan@gmail.com> wrote:

On Tue, Dec 25, 2012 at 10:42 PM, Rene Nejsum <rene@stranden.com> wrote:
Right, that's where the lazy evaluation API idea comes in: where there's no choice except to do the expensive calculation in process and you want to factor out the logging level check, it's possible to replace it with 7 characters embedded in the call:

    debug_lazy(lambda: bla. bla.)

You can also do much more sophisticated things with the logging event handling system that only trigger if an event passes the initial priority level check and gets submitted to the rest of the logging machinery.

There's no magic wand we can wave to say "evaluate this immediately sometimes, but lazily other times based on some unknown global state". An API has to choose one or the other. The standard logging API chooses lazy evaluation of formatting calls, but eager evaluation of the interpolated values, in order to speed up the typical case of readily accessible data - that's why the active level query API is exposed. Another logging API could certainly make the other choice, adapting to the standard APIs via the level query API.

I don't know if such an alternative API exists - my rule of thumb for logging calls is that if something is too expensive to calculate all the time, find a way to instead pass the necessary pieces for external reconstruction to a lazy formatting call, rather than making a given level of logging prohibitively expensive.

Cheers, Nick.

-- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
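Nick's seven-character variant can be sketched as a thin wrapper (the name debug_lazy and its exact placement are assumptions for illustration) that invokes the callable only after the level check passes:

```python
import logging

logging.basicConfig(level=logging.WARNING)
log = logging.getLogger("demo")

def debug_lazy(make_msg):
    # make_msg is called only if DEBUG records would actually be emitted
    if log.isEnabledFor(logging.DEBUG):
        log.debug("%s", make_msg())

evaluated = []
debug_lazy(lambda: evaluated.append(1) or "expensive text")
# DEBUG is disabled above, so the lambda body never ran
```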

Thanks, appreciate your answers and comments…

OT: Being brought up with a C/Java background, the depth of the Python language itself still amazes me. I wonder if there is a correlation to the Python community, when guys like Guido, Nick and all the others take time to answer questions in a friendly, informative and educating way…

Merry Christmas to all on the list…

/Rene

On Dec 25, 2012, at 2:00 PM, Nick Coghlan <ncoghlan@gmail.com> wrote:

Rene Nejsum <rene@...> writes:
Bearing in mind that the first statement in debug() (and the analogous methods) is a check for the level, the only thing you gain by having the same check outside the call is the cost of evaluating arguments. But you can also do this by passing an arbitrary class as the message object, which lazily evaluates only when needed. Contrived example:

    class Message(object):
        def __init__(self, func, x, y):
            # params should be cheap to evaluate
            self.func = func
            self.x = x
            self.y = y

        def __str__(self):
            # expense is incurred here
            return str(self.func(self.x ** self.y))

    logger.debug(Message(factorial, 2, 15))

With this setup, no if statements are needed in your code, and the expensive computations only occur when required.

Regards, Vinay Sajip

On Tue, Dec 25, 2012 at 5:45 PM, Vinay Sajip <vinay_sajip@yahoo.co.uk> wrote:
That's still two function calls and three assignments per logging call - too expensive, and the syntax is unwieldy. I think everybody agrees now that for the existing CPython implementation there is really no solution to the problem of expensive logging calls vs code clarity: you have to implement an optimization workaround at the cost of readability.

The idea is to fix the interpreter by introducing a "feature block" - an execution block that runs only if it is enabled. The execution block for the logging example below is defined by the function name "debug" and braces ():

    debug(
        <block contents>
    )

debug is an object of 'feature' type, which is only executed/evaluated if the feature is enabled in a table of features. It might be possible to implement this as a custom version of PyPy; then hardcoding logic for treating a logging call as 'featured' should give an immediate performance boost to any project. Still, it would be nice if logging was built with a supported layout for easy optimization, or for 'from __future__ import features.logging'.

I think we all agree that it cannot be done in Python right now… But I doubt there will be support for a solution just for debugging, and I am having a hard time coming up with other examples… Quick thought (very quick), and I am no expert, but maybe an acceptable/compatible solution could be:

    def do_debug(*args):
        print 'DEBUG: ', args

    def nop_debug(*args):
        pass  # Empty function

    debug = do_debug
    debug( "Some evaluated text %d %d %d" % (1, 2, fact(22)) )

    debug = nop_debug
    debug( "Will not be evaluated, since Python is clever enough to optimise out" )

At least some kind of -O option could optimise this out? Then again, there are probably lots of reasons for this not to work :-)

/Rene

On Dec 26, 2012, at 12:04 AM, anatoly techtonik <techtonik@gmail.com> wrote:
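One caveat with the nop-swap idea: Python evaluates argument expressions before the call is made, so the expensive formatting happens even when debug is bound to the no-op. A small sketch demonstrating this (the names are illustrative):

```python
calls = []

def expensive():
    calls.append("evaluated")  # side effect proves evaluation happened
    return 42

def nop_debug(*args):
    pass  # discards its arguments - but they were already computed

nop_debug("value = %d" % expensive())
# calls == ["evaluated"]: the argument ran despite the no-op

# Deferring behind a lambda is what actually avoids the cost:
calls.clear()
nop_debug(lambda: "value = %d" % expensive())
# calls == []: nothing was evaluated
```

This is why swapping in nop_debug saves only the function body, not the argument evaluation, and why the thread keeps circling back to lambdas and level checks.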

On 12/25/2012 6:04 PM, anatoly techtonik wrote:
Anatoly, do you have some measurements to justify the "too expensive" claim? Also, do you have an actual example of expensive logging? I doubt your real code is logging the factorial of 2**15. What is actually in your debug log that is expensive? It will be much easier to discuss solutions if we are talking about actual problems.
This feels both sketchy and strange, and not at all integrated with existing Python semantics. --Ned.

On Tue, Dec 25, 2012 at 11:12 PM, Ned Batchelder <ned@nedbatchelder.com> wrote:
I was thinking the same thing as I read through this thread. I'm typically logging the result of a calculation, not doing a calculation only because I'm logging. On the other hand, I have used a homegrown logging system (which existed well before Python's logging module) that allowed the following:
logger.warn('factorial = %s', lambda: factorial(2**15))
Instead of just outputting the string representation of the lambda, the logger would evaluate the function and str() the return value. Something like this would be trivial to implement on top of Python's logging module. -- David blog: http://www.traceback.org twitter: http://twitter.com/dstanek www: http://dstanek.com

On Wed, Dec 26, 2012 at 3:12 PM, Ned Batchelder <ned@nedbatchelder.com> wrote:
Not specifically a Python logging issue, but what I periodically find in my code is that there's an "internal representation" and an "external representation" that have some sort of direct relationship. I could easily log the internal form at many points, but that's not particularly useful; logging the external form involves either some hefty calculations, or perhaps a linear search of some list of possibilities (e.g. a reverse lookup of a constant - do you pay the cost of building up a reverse dictionary, or just do the search?). Obviously that's nothing like as expensive as 2**15!, but it makes more sense to be logging "WM_MOUSEMOVE" than "Msg 512".

ChrisA
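Chris's reverse-lookup trade-off has a middle ground: build the reverse dictionary lazily, the first time a constant actually needs naming, so runs that never log pay nothing. A sketch, with a hypothetical two-entry table standing in for the full set of window-message constants:

```python
# Forward table as it might exist in the codebase (hypothetical subset):
MSG_NAMES = {"WM_MOUSEMOVE": 512, "WM_LBUTTONDOWN": 513}

_reverse = None  # reverse table, built on first use only

def msg_name(code):
    """External representation of a message code, e.g. 'WM_MOUSEMOVE'."""
    global _reverse
    if _reverse is None:
        # one-time cost, paid only if a lookup ever happens
        _reverse = {v: k for k, v in MSG_NAMES.items()}
    return _reverse.get(code, "Msg %d" % code)
```

Unknown codes fall back to the raw "Msg 512" form Chris mentions, so the logging path never fails on an unmapped constant.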

My astoptimizer provides tools to really *remove* debug code at compilation, so the overhead of the debug code is simply zero. You can, for example, declare your variable project.config.DEBUG as a constant with the value 0, where project.config is a module. The if statement in:

    from project.config import DEBUG
    ...
    if DEBUG:
        ...

will then be removed. See: https://bitbucket.org/haypo/astoptimizer

Victor

On 25 Dec 2012 13:43, "Rene Nejsum" <rene@stranden.com> wrote:

Victor Stinner, 30.12.2012 11:42:
How would you know at compile time that it can be removed? How do you handle the example below?

Stefan

    ## constants.py
    DEBUG = False

    ## enable_debug.py
    import constants
    constants.DEBUG = True

    ## test.py
    import enable_debug
    from constants import DEBUG
    if DEBUG:
        print("DEBUGGING !")

If you mark constants.DEBUG as a constant and compile your project with astoptimizer, enable_debug has no effect (if it was compiled with DEBUG=False). So only use it if DEBUG will not be changed at runtime; it cannot be used if your users might run your application in debug mode. To compare it to the C language, DEBUG would be a #define, and astoptimizer can be seen as a preprocessor.

Victor

On 30 Dec 2012 11:59, "Stefan Behnel" <stefan_ml@behnel.de> wrote:

I think the lambda: solution really is the best one. The additional cost is the construction of one function object and one invocation per logging call, which I suspect is about the lower limit. It's also the most generally applicable: it has nothing specific to logging in it at all! So it seems to me that if we were to change anything, improving lambdas (shorter syntax and/or optimizing away the overhead) would be the way to go, rather than some string-interpolation-logging-specific special case in the interpreter.

On Wed, Dec 26, 2012 at 5:49 AM, Greg Ewing <greg.ewing@canterbury.ac.nz> wrote:

participants (15)
- anatoly techtonik
- Andrew Grigorev
- Chris Angelico
- David Stanek
- Greg Ewing
- Haoyi Li
- Jakob Bowyer
- Jasper St. Pierre
- Ned Batchelder
- Nick Coghlan
- Rene Nejsum
- Richard Oudkerk
- Stefan Behnel
- Victor Stinner
- Vinay Sajip