Design Philosophy: Performance vs Robustness/Maintainability
Raymond Hettinger: -----------------
One minor grumble: I think we need to give careful cost/benefit consideration to optimizations that complicate the implementation. Over the last several years, the source for Python has grown increasingly complicated. Fewer people understand it now. It is much harder for newcomers to on-ramp. The old-timers (myself included) find that their knowledge is out of date. And complexity leads to bugs (the C optimization of random number seeding caused a major bug in the 3.6.0 release; the C optimization of the lru_cache resulted in multiple releases having hard-to-find threading bugs, etc.). It is becoming increasingly difficult to look at code and tell whether it is correct (I still don't fully understand the implications of the recursive constant folding in the peephole optimizer, for example). In the case of this named tuple proposal, the complexity is manageable, but the overall trend isn't good, and I get the feeling that aggressive optimization is causing us to forget key parts of the zen-of-python.
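As a quick illustration of the constant folding Raymond mentions (a sketch, not part of the thread; the exact bytecode varies by CPython version):

    import dis

    def seconds_per_day():
        # The peephole optimizer folds constant arithmetic at compile
        # time, so the bytecode loads the single constant 86400 rather
        # than performing two multiplications at run time.
        return 24 * 60 * 60

    dis.dis(seconds_per_day)
    # Typical (version-dependent) output ends with something like:
    #     LOAD_CONST    86400
    #     RETURN_VALUE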
Nick Coughlan: -------------
As another example of this: while trading the global import lock for per-module locks eliminated most of the old import deadlocks, it turns out that it *also* left us with some fairly messy race conditions and more fragile code (I still count that particular case as a win overall, but it definitely raises the barrier to entry for maintaining that code).
Unfortunately, these are frequently cases where the benefits are immediately visible (e.g. faster benchmark results, removing longstanding limitations on user code), but the downsides can literally take years to make themselves felt (e.g. higher defect rates in the interpreter, subtle bugs in previously correct user code that are eventually traced back to interpreter changes).
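For context, a minimal sketch of the classic deadlock the old *global* import lock could cause (the module name is hypothetical; with the per-module locks introduced in Python 3.3, this completes normally):

    # spam.py -- this module body runs at import time
    import threading

    def worker():
        # Under a single global import lock (Python <= 3.2), this import
        # blocks forever: the main thread still holds the lock while it
        # executes spam.py's module body.
        import json  # any module not currently being imported

    t = threading.Thread(target=worker)
    t.start()
    t.join()  # deadlock under the global lock; fine with per-module locks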
Barry Warsaw: ------------
Regardless of whether [namedtuple] optimization is a good idea or not, start-up time *is* a serious challenge in many environments for CPython in particular, and for the perception of Python's applicability to many problems. I think we're better off trying to identify and address such problems than ignoring or minimizing them.
Ethan Furman: ------------
Speed is not the only factor, and certainly shouldn't be the first concern, but once we have correct code we need to follow our own advice: find the bottlenecks and optimize them. Optimized code will never be as pretty or maintainable as simple, unoptimized code, but real-world applications often require as much performance as can be obtained. [My apologies if I missed any points from the namedtuple thread.] -- ~Ethan~
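The start-up cost under discussion is the cost of *defining* a named tuple class, not of instantiating one; a quick way to measure the difference (a sketch; numbers vary by machine and CPython version):

    from timeit import timeit

    # Each namedtuple() call historically built a class source template
    # and exec()'d it, which adds up when an application defines many
    # such classes at start-up.
    define = timeit("namedtuple('Point', ['x', 'y'])",
                    setup="from collections import namedtuple",
                    number=1000)
    instantiate = timeit("Point(1, 2)",
                         setup="from collections import namedtuple; "
                               "Point = namedtuple('Point', ['x', 'y'])",
                         number=1000)
    print(f"define: {define:.4f}s  instantiate: {instantiate:.4f}s")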
2017-07-18 18:08 GMT+02:00, Ethan Furman wrote:
Nick Coughlan: -------------
As another example of this: while trading the global import lock for per-module locks eliminated most of the old import deadlocks, (...)
Minor remark: the email subject is inaccurate; this change is not related to performance. I would rather say that it's about correctness: Python 3 no longer hangs on deadlocks in "legit" imports ;-) Victor
On Tue, 18 Jul 2017 09:08:08 -0700, Ethan Furman wrote:
Nick Coughlan: -------------
It is "Nick Coghlan" not "Coughlan".
As another example of this: while trading the global import lock for per-module locks eliminated most of the old import deadlocks, it turns out that it *also* left us with some fairly messy race conditions and more fragile code (I still count that particular case as a win overall, but it definitely raises the barrier to entry for maintaining that code).
Unfortunately, these are frequently cases where the benefits are immediately visible (e.g. faster benchmark results, removing longstanding limitations on user code), but the downsides can literally take years to make themselves felt (e.g. higher defect rates in the interpreter, subtle bugs in previously correct user code that are eventually traced back to interpreter changes).
I'll reply here again: the original motivation for the per-module import lock was not performance but correctness. The import deadlocks were really in the category of "subtle bugs" that only occur in certain timing conditions (especially when combined with PyImport_ImportModuleNoBlock and/or stdlib modules which can try to import stuff silently, such as the codecs module). So we traded a category of "subtle bugs" due to a core design deficiency for another category of "subtle bugs" due to an imperfect implementation, the latter being actually fixable incrementally :-) Disclaimer: I wrote the initial per-module lock implementation. Regards Antoine.
On 07/18/2017 09:16 AM, Antoine Pitrou wrote:
On Tue, 18 Jul 2017 09:08:08 -0700, Ethan Furman wrote:
Nick Coughlan: -------------
It is "Nick Coghlan" not "Coughlan".
Argh. Sorry, Nick, and thank you, Antoine!
As another example of this: while trading the global import lock for per-module locks eliminated most of the old import deadlocks, it turns out that it *also* left us with some fairly messy race conditions and more fragile code (I still count that particular case as a win overall, but it definitely raises the barrier to entry for maintaining that code).
Unfortunately, these are frequently cases where the benefits are immediately visible (e.g. faster benchmark results, removing longstanding limitations on user code), but the downsides can literally take years to make themselves felt (e.g. higher defect rates in the interpreter, subtle bugs in previously correct user code that are eventually traced back to interpreter changes).
I'll reply here again: the original motivation for the per-module import lock was not performance but correctness.
I meant that as an example of the dangers of increased code complexity. -- ~Ethan~
2017-07-18 18:08 GMT+02:00, Ethan Furman wrote:
Raymond Hettinger: -----------------
And complexity leads to bugs (the C optimization of random number seeding caused a major bug in the 3.6.0 release
Hum, I guess that Raymond is referring to http://bugs.python.org/issue29085 This regression was not caused by an optimization at all, but by a change to harden Python: https://www.python.org/dev/peps/pep-0524/ Victor
On Tue, 18 Jul 2017 at 09:07, Ethan Furman wrote:
Raymond Hettinger: -----------------
One minor grumble: I think we need to give careful cost/benefit consideration to optimizations that complicate the implementation. Over the last several years, the source for Python has grown increasingly complicated. Fewer people understand it now. It is much harder for newcomers to on-ramp. The old-timers (myself included) find that their knowledge is out of date. And complexity leads to bugs (the C optimization of random number seeding caused a major bug in the 3.6.0 release; the C optimization of the lru_cache resulted in multiple releases having hard-to-find threading bugs, etc.). It is becoming increasingly difficult to look at code and tell whether it is correct (I still don't fully understand the implications of the recursive constant folding in the peephole optimizer, for example). In the case of this named tuple proposal, the complexity is manageable, but the overall trend isn't good, and I get the feeling that aggressive optimization is causing us to forget key parts of the zen-of-python.
Nick Coghlan: -------------
As another example of this: while trading the global import lock for per-module locks eliminated most of the old import deadlocks, it turns out that it *also* left us with some fairly messy race conditions and more fragile code (I still count that particular case as a win overall, but it definitely raises the barrier to entry for maintaining that code).
Unfortunately, these are frequently cases where the benefits are immediately visible (e.g. faster benchmark results, removing longstanding limitations on user code), but the downsides can literally take years to make themselves felt (e.g. higher defect rates in the interpreter, subtle bugs in previously correct user code that are eventually traced back to interpreter changes).
Barry Warsaw: ------------
Regardless of whether [namedtuple] optimization is a good idea or not, start-up time *is* a serious challenge in many environments for CPython in particular, and for the perception of Python's applicability to many problems. I think we're better off trying to identify and address such problems than ignoring or minimizing them.
Ethan Furman: ------------
Speed is not the only factor, and certainly shouldn't be the first concern, but once we have correct code we need to follow our own advice: find the bottlenecks and optimize them. Optimized code will never be as pretty or maintainable as simple, unoptimized code, but real-world applications often require as much performance as can be obtained.
For me it's a balance based on how critical the code is and how complicated the code will become long-term. I think between Victor and me we maybe have 1 person/week of paid work time on CPython and the rest is volunteer time, so there always has to be some consideration as to whether maintenance will become untenable long-term (this is why complex is better than complicated pretty much no matter what).

In namedtuple's case, Raymond designed something useful with an elegant solution. Unfortunately, namedtuple is a victim of its own success: it became a bottleneck for startup time in apps that used it extensively, as well as a sticking point for anyone who wanted to eschew exec(). So now we're keeping the usefulness/API-design aspect and being pragmatic about the fact that we want to rework the elegant design to be computationally cheaper, so it's no longer an obvious performance penalty at app startup for people who use it a lot. And so now the work is trying to balance the pragmatic performance aspect with the long-term maintenance aspect.
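To make the trade-off concrete, a minimal exec()-free sketch of a namedtuple-like factory (illustrative only, not the actual CPython rework; it omits keyword arguments, _make/_replace/_asdict, docstrings, and pickling support):

    from operator import itemgetter

    def simple_namedtuple(typename, field_names):
        # Build the class out of closures and itemgetter-backed
        # properties instead of exec()'ing a source-code template.
        field_names = tuple(field_names)

        def __new__(cls, *args):
            if len(args) != len(field_names):
                raise TypeError(f"{typename} expects "
                                f"{len(field_names)} arguments")
            return tuple.__new__(cls, args)

        def __repr__(self):
            body = ", ".join(f"{name}={value!r}"
                             for name, value in zip(field_names, self))
            return f"{typename}({body})"

        ns = {"__new__": __new__, "__repr__": __repr__,
              "__slots__": (), "_fields": field_names}
        for index, name in enumerate(field_names):
            ns[name] = property(itemgetter(index), doc=f"field {name!r}")
        return type(typename, (tuple,), ns)

    Point = simple_namedtuple("Point", ["x", "y"])
    p = Point(1, 2)
    print(p.x, p.y, p)   # -> 1 2 Point(x=1, y=2)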
participants (4):
- Antoine Pitrou
- Brett Cannon
- Ethan Furman
- Victor Stinner