PEP 651, Robust Stack Overflow Handling, Rejection notice
Hi Mark,

Thank you for submitting PEP 651. The Steering Council has spent the past two weeks reviewing PEP 651. After careful consideration, we have decided to reject the PEP. The following were the key points that led us to this decision:

* The benefits are not compelling enough. Deep recursion is not a common tool in Python, and even with PEP 651 it would not be efficient enough to make it a common tool.
* The benefit of PEP 651 is negated as soon as a non-Python function is involved in the recursion, making the likelihood of it being useful even smaller. It also creates easy pitfalls for users who do end up relying on recursion.
* We believe the PEP understates the disruption created by the technical solution of multiple Python stack frames per C call. Although this may be solvable, it will certainly cause substantial disruption to existing debuggers, tracers, and state inspection tools as they need to adapt to this change (which may not be trivial).
* As the way to approach this will be platform-specific (as some parts of the proposal are not portable), this can cause generic Python code to behave differently on different platforms, making this kind of code less portable and less predictable.

In the end, the benefit of the PEP does not outweigh the cost of the potential breakage, confusion, and unpredictability of the feature.

With our appreciation,
The Python Steering Council
Hi,

Thanks for taking the time to consider the PEP. Although the PEP was rejected, I still believe that the safety guarantees in PEP 651 are worth adding to Python in the future. To do that (maybe for 3.11), I need to understand your concerns better. Would you clarify a few points for me?

On 03/03/2021 7:19 pm, Python Steering Council wrote:
Hi Mark,
Thank you for submitting PEP 651. The Steering Council has spent the past two weeks reviewing PEP 651. After careful consideration, we have decided to reject the PEP. The following were the key points that led us to this decision:
* The benefits are not compelling enough. Deep recursion is not a common tool in Python, and even with PEP 651 it would not be efficient enough to make it a common tool.
* The benefit of PEP 651 is negated as soon as a non-Python function is involved in the recursion, making the likelihood of it being useful even smaller. It also creates easy pitfalls for users who do end up relying on recursion.
Could you give an example pitfall?
* We believe the PEP understates the disruption created by the technical solution of multiple Python stack frames per C call. Although this may be solvable, it will certainly cause substantial disruption to existing debuggers, tracers, and state inspection tools as they need to adapt to this change (which may not be trivial).
This is presumably the key objection. Is there a particular tool that you feel would be problematic? I have only looked at gdb and py-spy.
* As the way to approach this will be platform-specific (as some parts of the proposal are not portable), this can cause generic Python code to behave differently on different platforms, making this kind of code less portable and less predictable.
There are two issues here. Portability and changes to behaviour.

Regarding portability, I have to admit that the PEP is rather vague. That's my fault; I should have done more implementation first :( FWIW, I have an implementation that should be portable.
https://github.com/python/cpython/compare/master...markshannon:pep-overflow-...

Regarding changes to behaviour, I don't see how "generic" Python code would behave differently on different platforms, except for cases where it already does. In some cases, the PEP would have improved the situation. For example:

    sys.setrecursionlimit(5000)
    def f(): f()

Currently, it raises a RecursionError on Linux, but crashes the interpreter on Windows. With PEP 651 it would have raised a RecursionError on both platforms.

Am I missing something here?

Cheers,
Mark.
On Fri, 5 Mar 2021 15:03:59 +0000 Mark Shannon <mark@hotpy.org> wrote:
There are two issues here. Portability and changes to behaviour.
Regarding portability, I have to admit that PEP is rather vague. That's my fault; I should have done more implementation first :( FWIW, I have an implementation that should be portable. https://github.com/python/cpython/compare/master...markshannon:pep-overflow-...
I looked through this diff and I'm not sure how this works robustly. This seems to assume:

- each thread's stack size is known and is a compile-time constant
- Python owns the thread and can compute the stack limit reliably from the current stack frame
- the OS allocates stack pages in units of BLOCK_SIZE or more (currently 8kiB)
- the OS doesn't use a segmentation scheme that limits stack accesses with a finer granularity than page (or "block") size
- it's ok to write memory beyond the current stack top (I can imagine verification tools such as Valgrind complaining about that, though this can be worked around using suppressions)

I would be curious if you could elaborate on these points.

Regards,
Antoine.
Hi Antoine, On 05/03/2021 4:07 pm, Antoine Pitrou wrote:
On Fri, 5 Mar 2021 15:03:59 +0000 Mark Shannon <mark@hotpy.org> wrote:
There are two issues here. Portability and changes to behaviour.
Regarding portability, I have to admit that PEP is rather vague. That's my fault; I should have done more implementation first :( FWIW, I have an implementation that should be portable. https://github.com/python/cpython/compare/master...markshannon:pep-overflow-...
I looked through this diff and I'm not sure how this works robustly.
This seems to assume:
- each thread's stack size is known and is a compile-time constant

We just need to know that the stack size is at least the compile-time constant; we don't need to know exactly what it is. If the stack size is less than the compile-time constant, then a crash is a possibility. However, since PEP 651 would reduce stack consumption, a crash would have been less likely than it currently is. For all supported platforms, there is a default stack size, and it is considerably larger than the chosen value of 512k.

- Python owns the thread and can compute the stack limit reliably from the current stack frame

There are two points of entry into Python code: the high-level API and thread_run(). Since thread_run() is called when a thread is started, it is guaranteed to have the full stack available. The high-level API (PyRun_SimpleStringFlags and friends) is a bit more problematic. If there isn't sufficient stack space, then a crash is a possibility. But this is no worse than the status quo, and probably better due to reduced C stack use.

- the OS allocates stack pages in units of BLOCK_SIZE or more (currently 8kiB)

The BLOCK_SIZE is just a number. A larger number means fewer slow checks, but wastes more stack. It doesn't matter what the underlying platform's granularity is.
- the OS doesn't use a segmentation scheme that limits stack accesses with a finer granularity than page (or "block") size

I don't think that would matter.
- it's ok to write memory beyond the current stack top (I can imagine verification tools such as Valgrind complaining about that, though this can be worked around using suppressions)

It's not much of a stack if you can't grow it :) I wouldn't be surprised if Valgrind complains, though.
I would be curious if you could elaborate on these points.
The above scheme does require some care from users of the high-level C API and should be tested thoroughly for new platforms. But it is a *lot* more robust than what we have now.

Cheers,
Mark.
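To make the scheme above concrete, here is a minimal sketch of a block-granularity stack check. It is illustrative only, not taken from the linked patch: the names and numbers (ThreadStackInfo, STACK_BUDGET) are invented for this sketch, and a downward-growing C stack is assumed.

```
/* Illustrative only: not from the linked patch.  Assumes the C stack
 * grows downward and that at least STACK_BUDGET bytes were available
 * when the thread first entered the interpreter. */
#include <stdbool.h>
#include <stdint.h>

#define STACK_BUDGET (512 * 1024)   /* assumed-available stack per thread */
#define BLOCK_SIZE   (8 * 1024)     /* granularity of the cheap test */

typedef struct {
    uintptr_t stack_limit;   /* lowest stack address we allow ourselves */
} ThreadStackInfo;

/* Run once when a thread first enters the interpreter: derive a limit
 * from the address of a local variable. */
static void
init_stack_limit(ThreadStackInfo *info)
{
    char anchor;
    info->stack_limit = (uintptr_t)&anchor - STACK_BUDGET;
}

/* Run at call sites that may recurse.  A single pointer comparison in
 * the common case; returns false when less than one BLOCK_SIZE of
 * budget remains, meaning the caller should raise RecursionError (or
 * a stack-overflow error) rather than recurse further. */
static bool
stack_headroom_ok(const ThreadStackInfo *info)
{
    char anchor;
    return (uintptr_t)&anchor >= info->stack_limit + BLOCK_SIZE;
}
```

The point of the BLOCK_SIZE granularity, as described above, is purely to keep the common-path check cheap; a larger block means fewer slow-path checks at the cost of some wasted stack.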
Hi Mark, Le 05/03/2021 à 18:06, Mark Shannon a écrit :
Hi Antoine,
On 05/03/2021 4:07 pm, Antoine Pitrou wrote:
On Fri, 5 Mar 2021 15:03:59 +0000 Mark Shannon <mark@hotpy.org> wrote:
There are two issues here. Portability and changes to behaviour.
Regarding portability, I have to admit that PEP is rather vague. That's my fault; I should have done more implementation first :( FWIW, I have an implementation that should be portable. https://github.com/python/cpython/compare/master...markshannon:pep-overflow-...
I looked through this diff and I'm not sure how this works robustly.
This seems to assume:
- each thread's stack size is known and is a compile-time constant

We just need to know that the stack size is at least the compile-time constant; we don't need to know exactly what it is.
If the stack size is less than the compile-time constant, then a crash is a possibility. However, since PEP 651 would reduce stack consumption, a crash would have been less likely than it currently is.
For all supported platforms, there is a default stack size and it is considerably larger than the chosen value of 512k.
Right, but Python more or less works on a large range of non-formally supported platforms. With your proposal and a hardcoded 512 kB minimum stack size, Python may crash almost immediately on such platforms. I don't know if we care about that (we could tell third-party maintainers to change the #define to an appropriate value), but it's still something that will make porting a bit more cumbersome.
- Python owns the thread and can compute the stack limit reliably from the current stack frame

There are two points of entry into Python code: the high-level API and thread_run().
Hmm, actually, there are many entry points into Python code. C code can create its own thread and call arbitrary CPython APIs from it (after having initialized the tstate using the PyGILState API). That call inside a C-created thread can happen after an arbitrary amount of stack was already consumed by the calling C code.
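For concreteness, a minimal sketch of such an entry path, using the documented PyGILState embedding API. The 64 KiB stack size is an arbitrary illustration of a stack the interpreter never chose and cannot know about in advance.

```
#include <Python.h>
#include <pthread.h>

static void *
worker(void *arg)
{
    (void)arg;
    /* The interpreter first learns of this thread -- and its
     * deliberately small stack -- here, after an arbitrary amount of
     * that stack may already have been consumed by C code. */
    PyGILState_STATE g = PyGILState_Ensure();
    PyRun_SimpleString("print('Python running on a C-created thread')");
    PyGILState_Release(g);
    return NULL;
}

int
main(void)
{
    Py_Initialize();
    PyThreadState *save = PyEval_SaveThread();  /* release the GIL */

    pthread_attr_t attr;
    pthread_attr_init(&attr);
    pthread_attr_setstacksize(&attr, 64 * 1024);  /* tiny C stack */
    pthread_t tid;
    pthread_create(&tid, &attr, worker, NULL);
    pthread_join(tid, NULL);

    PyEval_RestoreThread(save);
    Py_FinalizeEx();
    return 0;
}
```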
But, this is no worse than the status quo, and probably better due to reduced C stack use.
I'm not sure what you mean by "no worse than the status quo". Supporting a smaller recursion limit than advertised is one thing; crashing almost immediately when executing Python code (because of trying to access memory beyond the reserved stack area) is another. Currently, you would only crash when recursing a lot.
- it's ok to write memory beyond the current stack top (I can imagine verification tools such as Valgrind complaining about that, though this can be worked around using suppressions)

It's not much of a stack if you can't grow it :)
The stack size is typically limited (see `ulimit -s`).

Regards,
Antoine.
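For reference, the limit that `ulimit -s` reports is RLIMIT_STACK, which C code can query directly; a minimal sketch, noting that this limit governs only the main thread:

```
#include <stdio.h>
#include <sys/resource.h>

int
main(void)
{
    struct rlimit rl;
    /* RLIMIT_STACK governs the main thread's stack; threads created
     * with pthread_attr_setstacksize are not bound by it. */
    if (getrlimit(RLIMIT_STACK, &rl) == 0) {
        if (rl.rlim_cur == RLIM_INFINITY)
            printf("stack limit: unlimited\n");
        else
            printf("stack limit: %llu KiB\n",
                   (unsigned long long)rl.rlim_cur / 1024);
    }
    return 0;
}
```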
Speaking for myself ...

On Fri, Mar 5, 2021 at 7:04 AM Mark Shannon <mark@hotpy.org> wrote:
Hi,
Thanks for taking the time to consider the PEP.
Although the PEP was rejected, I still believe that the safety guarantees in PEP 651 are worth adding to Python in the future.
To do that (maybe for 3.11), I need to understand your concerns better.
Would you clarify a few points for me?
On 03/03/2021 7:19 pm, Python Steering Council wrote:

Hi Mark,
Thank you for submitting PEP 651. The Steering Council has spent the past two weeks reviewing PEP 651. After careful consideration, we have decided to reject the PEP. The following were the key points that led us to this decision:
* The benefits are not compelling enough. Deep recursion is not a common tool in Python, and even with PEP 651 it would not be efficient enough to make it a common tool.
* The benefit of PEP 651 is negated as soon as a non-Python function is involved in the recursion, making the likelihood of it being useful even smaller. It also creates easy pitfalls for users who do end up relying on recursion.
Could you give an example pitfall?
* We believe the PEP understates the disruption created by the technical solution of multiple Python stack frames per C call. Although this may be solvable, it will certainly cause substantial disruption to existing debuggers, tracers, and state inspection tools as they need to adapt to this change (which may not be trivial).
This is presumably the key objection. Is there a particular tool that you feel would be problematic? I have only looked at gdb and py-spy.
There's also any debugger that has an extension module component, e.g. debugpy.
* As the way to approach this will be platform-specific (as some parts of the proposal are not portable), this can cause generic Python code to behave differently on different platforms, making this kind of code less portable and less predictable.
There are two issues here. Portability and changes to behaviour.
Regarding portability, I have to admit that PEP is rather vague. That's my fault; I should have done more implementation first :( FWIW, I have an implementation that should be portable.
https://github.com/python/cpython/compare/master...markshannon:pep-overflow-...
Regarding changes to behaviour, I don't see how "generic" Python code would behave differently on different platforms, except for cases where it already does.
In some cases, the PEP would have improved the situation.
For example:

    sys.setrecursionlimit(5000)
    def f(): f()
Currently, it raises a RecursionError on Linux, but crashes the interpreter on Windows. With PEP 651 it would have raised a RecursionError on both platforms.
Am I missing something here?
So your example shows a user already comfortable in raising their recursion limit to work around needing more stack space to reach completion. What is stopping the user from continuing to raise the limit until they still reach their memory limit even with PEP 651?

If you're worried about runaway recursion you will very likely hit that with the default stack depth already, so I personally don't see how a stack counter decoupled from the C stack specifically makes it any easier/better to detect runaway recursion. And if I need more recursion than the default, you're going to bump the recursion depth anyway, which weakens the protection in either the C or decoupled counter scenarios.

Sure, it's currently platform-specific, but plenty of people want to push that limit based on their machine anyway and don't need consistency on platforms they will never run on. I.e., I don't see a huge benefit to being able to say that an algorithm consistently won't go past 5000 calls on all platforms compared to what the C stack protection already gives us (not to say there's zero benefit, but it isn't massive or widespread either IMO). I personally just don't see many people saying, "I really want to limit my program to an exact call stack depth of 5000 on all platforms which is beyond the default, but anything under is fine and anything over -- regardless of what the system can actually handle -- is unacceptable".

Tack on the amount of changes required to give a cross-platform stack count and limit check compared to the benefit being proposed, and to me that pushes what the PEP is proposing into a net-negative payoff.
On Fri, Mar 5, 2021 at 11:11 AM Brett Cannon <brett@python.org> wrote:
Speaking for myself ...
Ditto ... On Fri, Mar 5, 2021 at 7:04 AM Mark Shannon <mark@hotpy.org> wrote:
[...]
In some cases, the PEP would have improved the situation.
For example:

    sys.setrecursionlimit(5000)
    def f(): f()
Currently, it raises a RecursionError on Linux, but crashes the interpreter on Windows. With PEP 651 it would have raised a RecursionError on both platforms.
Am I missing something here?
[...]
To me, the point of that example is as a reminder that currently fiddling with the recursion limit can cause segfaults.

Mark's PEP proposes two, somewhat independent, changes: (1) don't consume C stack on pure Python-to-Python (pp2p) function calls; (2) implement fool-proof C stack overflow checks.

Change (2) makes it safe for users to mess with the stack overflow limit however they see fit. Despite (1), the limit for pp2p calls remains at 1000 so that users who unintentionally write some naively recursive code don't have to wait until they fill up all of memory before they get a traceback. (Of course they could also write a while-True loop that keeps adding an item to a list and they'd end up in the same situation. But in my experience that situation is less painful to deal with than accidental stack overflow, and I'd shudder at the thought of a traceback of a million lines.)

Given that we have (1), why is (2) still needed? Because there are ways to recursively call Python code that aren't pp2p calls. By a pp2p (pure Python-to-Python) call, I mean any direct call, e.g. a method call or a function call. But what about other operations that can invoke Python code? E.g. if we have a dict d and a class C, we could create an instance of C and use it to index d, e.g. d[C()]. This operation is not a pp2p call -- the BINARY_SUBSCR opcode calls the dict's `__getitem__` method, and that calls the key's `__hash__` method. Here's a silly demonstration:

```
def f(c):
    d = {}
    return d[c]

class C:
    def __hash__(self):
        return f(self)

f(C())
```

Note that the "traceback compression" implemented for simple recursive calls fails here -- I just ran this code and got 2000 lines of output.

The way I imagine Mark wants to implement pp2p calls means that in this case each recursion step *does* add several other C stack frames, and this would be caught by the limit implemented in (2). I see no easy way around this -- after all, the C code involved in the recursion could be a piece of 3rd party C code that itself is not at fault.

So we could choose to implement only (2), robust C stack overflow checks. This would require a bunch of platform-specific code, and there might be platforms where we don't know how to implement this (I vaguely recall a UNIX version where main() received a growable stack but each thread only had a fixed 64kb stack), but those would be no worse off than today.

Or we could choose to implement only (1), eliminating C stack usage for pp2p calls. But in that case we'd still need a recursion limit for non-pp2p calls. We could eliminate the recursion limit entirely for pp2p calls, but then we'd probably end up with user complaints -- I could imagine a framework that executes arbitrary user code and attempts recovery after receiving an exception (for example a REPL). Such a framework can never be 100% foolproof (e.g. `while True: l.append(1)`, or `os._exit(0)`), but accidental recursion would be a reasonable thing to attempt to curtail, so we'd still need to implement a limit on pp2p calls. Since the new pp2p implementation allows for a much larger recursion limit, Mark's design with two separate limits and exceptions makes sense -- "recursion" to limit pp2p call recursion, and "stack overflow" to limit the C stack overflow.

To make things more reliable, it makes sense to throw in (2), better checks for C stack overflow, but that could be done independently -- we could just implement the two separate limits and exceptions and C-stack-frame-free pp2p calls without (2), and we'd be no worse off than today with regard to the safety of "wild" recursion like my example above.

--Guido van Rossum (python.org/~guido)
-cc: python-steering-council

On Fri, Mar 5, 2021 at 4:26 PM Guido van Rossum <guido@python.org> wrote:
[...]
To me, the point of that example is as a reminder that currently fiddling with the recursion limit can cause segfaults.
Mark's PEP proposes two, somewhat independent, changes: (1) don't consume C stack on pure Python-to-Python (pp2p) function calls; (2) implement fool-proof C stack overflow checks.
Change (2) makes it safe for users to mess with the stack overflow limit however they see fit. Despite (1), the limit for pp2p calls remains at 1000 so that users who unintentionally write some naively recursive code don't have to wait until they fill up all of memory before they get a traceback. (Of course they could also write a while-True loop that keeps adding an item to a list and they'd end up in the same situation. But in my experience that situation is less painful to deal with than accidental stack overflow, and I'd shudder at the thought of a traceback of a million lines.)
Given that we have (1), why is (2) still needed? Because there are ways to recursively call Python code that aren't pp2p calls. By a pp2p (pure Python-to-Python) call, I mean any direct call, e.g. a method call or a function call. But what about other operations that can invoke Python code? E.g. if we have a dict d and a class C, we could create an instance of C and use it to index d, e.g. d[C()]. This operation is not a pp2p call -- the BINARY_SUBSCR opcode calls the dict's `__getitem__` method, and that calls the key's `__hash__` method. Here's a silly demonstration:

```
def f(c):
    d = {}
    return d[c]

class C:
    def __hash__(self):
        return f(self)

f(C())
```

Note that the "traceback compression" implemented for simple recursive calls fails here -- I just ran this code and got 2000 lines of output.
The way I imagine Mark wants to implement pp2p calls means that in this case each recursion step *does* add several other C stack frames, and this would be caught by the limit implemented in (2). I see no easy way around this -- after all the C code involved in the recursion could be a piece of 3rd party C code that itself is not at fault.
So we could choose to implement only (2), robust C stack overflow checks. This would require a bunch of platform-specific code, and there might be platforms where we don't know how to implement this (I vaguely recall a UNIX version where main() received a growable stack but each thread only had a fixed 64kb stack), but those would be no worse off than today.
Or we could choose to implement only (1), eliminating C stack usage for pp2p calls. But in that case we'd still need a recursion limit for non-pp2p calls. We could eliminate the recursion limit entirely for pp2p calls, but then we'd probably end up with user complaints -- I could imagine a framework that executes arbitrary user code and attempts recovery after receiving an exception (for example a REPL). Such a framework can never be 100% foolproof (e.g. `while True: l.append(1)`, or `os._exit(0)`), but accidental recursion would be a reasonable thing to attempt to curtail, so we'd still need to implement a limit on pp2p calls. Since the new pp2p implementation allows for a much larger recursion limit, Mark's design with two separate limits and exceptions makes sense -- "recursion" to limit pp2p call recursion, and "stack overflow" to limit the C stack overflow.

To make things more reliable, it makes sense to throw in (2), better checks for C stack overflow, but that could be done independently -- we could just implement the two separate limits and exceptions and C-stack-frame-free pp2p calls without (2), and we'd be no worse off than today with regard to the safety of "wild" recursion like my example above.
(Resurrecting a 9 month old thread a bit as C stack overflows came up in the context of https://github.com/python/cpython/pull/30855)

Regarding (2)... robust C stack overflow checks are hard - if even possible. At most you can get an improvement but never be able to guarantee it. What I've seen proposed for that in this thread I believe won't deliver in some real scenarios. Any code can spawn threads from extension modules or embedding code with whatever C stack size it wants. Those threads can (read: do) call back into the CPython interpreter. Any assumption of a constant minimum C stack size is incorrect unless you go with the absolute platform minimum possible value (too small). 512k is considered huge.

One way other language runtimes deal with stack overflows is via a sigaltstack'd SIGSEGV handler that introspects what caused the signal (!) to see if it was maybe a stack overflow and, if so, fixes things up (hello Golang dynamic stack implementation) to recover before resuming execution. This is difficult, and potentially not even possible to guarantee correct, when you don't control the entire runtime as Golang does.

There has been some interesting talk about techniques recently in The Orange Site's https://news.ycombinator.com/item?id=28682945 thread, which has cropped up since this discussion. The idea around allocating our own C stack and swapping over to that is "neat" - but it'd be a challenge, as we're a re-entrant runtime, so we'd effectively need to grow a list of newly allocated C stacks for use upon every entry where we find one of our thread's stacks is not the one in use. I suspect such an implementation would probably be more complex than the feature is worth - but I'll leave judgment on that up to after anyone actually manages to implement a cross-platform version of such an idea.

-gps
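As a rough illustration of the sigaltstack technique described above, a sketch only: it detects a probable overflow and exits; the recover-and-resume part that Go's runtime performs is not attempted here.

```
#include <signal.h>
#include <string.h>
#include <unistd.h>

/* SIGSTKSZ stopped being a compile-time constant in glibc 2.34, so
 * use a fixed size for the alternate stack. */
static char altstack[64 * 1024];

static void
segv_handler(int sig, siginfo_t *info, void *ctx)
{
    (void)sig; (void)info; (void)ctx;
    /* info->si_addr holds the faulting address; a real implementation
     * would compare it to the thread's stack bounds to decide whether
     * this was an overflow.  Only async-signal-safe calls are legal
     * here, and resuming execution afterwards is the hard part this
     * sketch does not attempt. */
    static const char msg[] = "SIGSEGV: probable stack overflow\n";
    write(STDERR_FILENO, msg, sizeof(msg) - 1);
    _exit(1);
}

static int
recurse(int n)
{
    volatile char pad[1024];   /* keep each frame big and non-elidable */
    pad[0] = (char)n;
    return recurse(n + 1) + pad[0];
}

int
main(void)
{
    stack_t ss = { .ss_sp = altstack, .ss_size = sizeof(altstack) };
    sigaltstack(&ss, NULL);    /* give the handler its own stack */

    struct sigaction sa;
    memset(&sa, 0, sizeof(sa));
    sa.sa_sigaction = segv_handler;
    sa.sa_flags = SA_SIGINFO | SA_ONSTACK;   /* run on the alt stack */
    sigaction(SIGSEGV, &sa, NULL);

    return recurse(0);   /* overflow the stack; the handler fires */
}
```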
On 31/01/2022 5:23 am, Gregory P. Smith wrote:
[...]
Regarding (2)... robust C stack overflow checks are hard - if even possible. At most you can get an improvement but never be able to guarantee it. What I've seen proposed for that in this thread I believe won't deliver in some real scenarios. Any code can spawn threads from extension modules or embedding code with whatever C stack size it wants. Those threads can (read: do) call back into the CPython interpreter. Any assumption of a constant minimum C stack size is incorrect unless you go with the absolute platform minimum possible value (too small). 512k is considered huge.
[...]
Robust stack overflow checks are hard in the most general case, yes. But on Windows, or any POSIX system, we can ask the O/S what the stack bounds are. It's a lot simpler than many other features we try to support portably.

Cheers,
Mark.
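For reference, a sketch of what "asking the O/S" for the current thread's stack bounds can look like. Caveats: pthread_getattr_np is a glibc/musl extension rather than POSIX proper (macOS spells it pthread_get_stackaddr_np/pthread_get_stacksize_np), and the Windows call requires Windows 8 or later.

```
#define _GNU_SOURCE   /* for pthread_getattr_np on glibc/musl */
#include <stdio.h>

#ifdef _WIN32
#include <windows.h>

static void
get_stack_bounds(void **low, void **high)
{
    ULONG_PTR lo, hi;
    GetCurrentThreadStackLimits(&lo, &hi);   /* Windows 8+ */
    *low = (void *)lo;
    *high = (void *)hi;
}
#else
#include <pthread.h>

static void
get_stack_bounds(void **low, void **high)
{
    pthread_attr_t attr;
    void *addr;
    size_t size;
    pthread_getattr_np(pthread_self(), &attr);
    pthread_attr_getstack(&attr, &addr, &size);  /* addr = lowest byte */
    pthread_attr_destroy(&attr);
    *low = addr;
    *high = (char *)addr + size;
}
#endif

int
main(void)
{
    void *low, *high;
    get_stack_bounds(&low, &high);
    printf("stack: [%p, %p), ~%zu KiB\n", low, high,
           (size_t)((char *)high - (char *)low) / 1024);
    return 0;
}
```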
participants (6):
- Antoine Pitrou
- Brett Cannon
- Gregory P. Smith
- Guido van Rossum
- Mark Shannon
- Python Steering Council