Re: [Python-Dev] stack check on Unix: any suggestions?

Charles> I get the exact same value.  Of course the amount of other
Charles> stuff running makes no difference, you get the core dump
Charles> because you've hit the RLIMIT for stack usage, not because
Charles> you've exhausted memory.  Amount of RAM in the machine, or swap
Charles> space in use has nothing to do with it.  Do "ulimit -s
Charles> unlimited" and see what happens...

Makes no difference:

    % ./python
    Python 2.0b1 (#81, Aug 31 2000, 15:53:42)
    [GCC 2.95.3 19991030 (prerelease)] on linux2
    Copyright 1991-1995 Stichting Mathematisch Centrum, Amsterdam
    Copyright 1995-2000 Corporation for National Research Initiatives (CNRI)
    >>>
    % ulimit -a
    core file size (blocks)      0
    data seg size (kbytes)       unlimited
    file size (blocks)           unlimited
    max locked memory (kbytes)   unlimited
    max memory size (kbytes)     unlimited
    open files                   1024
    pipe size (512 bytes)        8
    stack size (kbytes)          unlimited
    cpu time (seconds)           unlimited
    max user processes           2048
    virtual memory (kbytes)      unlimited
    % ./python Misc/find_recursionlimit.py
    ...
    Limit of 2300 is fine
    recurse
    add
    repr
    init
    getattr
    getitem
    Limit of 2400 is fine
    recurse
    add
    repr
    Segmentation fault

Skip
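(For what it's worth, the limit that "ulimit -s" manipulates can also be
checked from inside the interpreter, which rules out the shell quietly
resetting it.  A minimal sketch using the standard resource module -- a
Unix-only module; this aside is an editorial addition, not part of the
exchange above:)

    import resource

    # Query the stack-size rlimit; RLIM_INFINITY means "unlimited".
    soft, hard = resource.getrlimit(resource.RLIMIT_STACK)
    print "stack rlimit: soft=%s, hard=%s" % (soft, hard)
    print "unlimited is", resource.RLIM_INFINITY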

>> % ulimit -a
>> stack size (kbytes)          unlimited
>> % ./python Misc/find_recursionlimit.py
>> ...
>> Limit of 2400 is fine
>> repr
>> Segmentation fault

Charles> This means that you're not hitting the rlimit at all but
Charles> getting a real segfault!  Time to do setrlimit -c unlimited and
Charles> break out GDB, I'd say.

Running the program under gdb does no good.  It segfaults and winds up with
a corrupt stack as far as the debugger is concerned.  For some reason bash
won't let me set a core file size != 0 either:

    % ulimit -c
    0
    % ulimit -c unlimited
    % ulimit -c
    0

though I doubt letting the program dump core would be any better
debugging-wise than just running the interpreter under gdb's control.
Kinda weird.

Skip
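(A possible workaround for the stubborn core file limit: have the process
raise the limit itself before running the test.  A minimal sketch, assuming
the hard limit permits it -- again an editorial aside, not from the original
exchange:)

    import resource

    # Raising the soft limit can only succeed up to the hard limit.
    resource.setrlimit(resource.RLIMIT_CORE,
                       (resource.RLIM_INFINITY, resource.RLIM_INFINITY))
    print "core rlimit now:", resource.getrlimit(resource.RLIMIT_CORE)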

On Fri, Sep 01, 2000 at 11:09:02AM -0500, Charles G Waldman wrote:
> Skip Montanaro writes:
> > Makes no difference:
Yes, which I did (well, my girlfriend was hogging the PC with 'net
connection, and there was nothing but silly soft-porn on TV, so I spent an
hour or two on my laptop ;) and I did figure out the problem isn't
stackspace (which was already obvious) but *damned* if I know what the
problem is.  Here's an easy way to step through the whole procedure,
though.  Take a recursive script, like the one Guido posted:

    i = 0

    class C:
        def __getattr__(self, name):
            global i
            print i
            i += 1
            return self.name    # common beginners' mistake

Run it once, so you get a ballpark figure on when it'll crash, and then
branch right before it would crash, calling some obscure function
(os.getpid() works nicely, very simple function.)  This was about 2926 or
so on my laptop (adding the branch changed this number, oddly enough.)

    import os
    i = 0

    class C:
        def __getattr__(self, name):
            global i
            print i
            i += 1
            if (i > 2625):
                os.getpid()
            return self.name    # common beginners' mistake

(I also moved the 'print i' to inside the branch, saved me a bit of
scrollin')

Then start GDB on the python binary, set a breakpoint on posix_getpid, and
"run 'test.py'".  You'll end up pretty close to where the interpreter
decides to go belly-up.  Setting a breakpoint on ceval.c line 612 (the
'opcode = NEXTOP();' line) or so at that point helps doing a per-bytecode
check, though this made me miss the actual point of failure, and I don't
fancy doing it again just yet :P

What I did see, however, was that the reason for the crash isn't the pure
recursion.  It looks like the recursiveness *does* get caught properly, and
the interpreter raises an error.  And then it prints that error over and
over again, probably once for every call to __getattr__(), and eventually
*that* crashes (but why, I don't know.)  In one test I did, it crashed in
int_print, the print function for int objects, which did 'fprintf(fp,
"%ld", v->ival);'.  The actual SEGV arrived inside fprintf's internals.
v->ival was a valid integer (though a high one) and the problem was not
dereferencing 'v'.  'fp' was stderr, according to its _fileno member.

'ltrace' (if you have it) is also a nice tool to let loose on this kind of
script, by the way, though it does make the test take a lot longer, and you
really need enough diskspace to store the output ;-P

Back-to-augassign-docs-ly y'rs,

--
Thomas Wouters <thomas@xs4all.net>

Hi! I'm a .signature virus! copy me into your .signature file to help me spread!
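(A variant of Thomas's second script that takes the ballpark figure from the
command line, which saves editing the file between runs.  The threshold
handling and the C().spam trigger line are illustrative additions, not from
the original mail:)

    import os, sys

    i = 0
    threshold = int(sys.argv[1])    # ballpark crash depth from the previous run

    class C:
        def __getattr__(self, name):
            global i
            i += 1
            if i > threshold:
                print i
                os.getpid()         # gdb breakpoint target: posix_getpid
            return self.name        # same deliberate mistake as above

    C().spam                        # any attribute access starts the recursion

(Run it once without gdb to confirm the crash depth, then under gdb with a
breakpoint on posix_getpid.)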

I said:
Thomas Wouters came back with:
I've got some more info: this crash only happens if you have built with --enable-threads. This brings in a different (thread-safe) version of fprintf, which uses mutex locks on file objects so output from different threads doesn't get scrambled together. And the SEGV that I saw was happening exactly where fprintf is trying to unlock the mutex on stderr, so it can print "Maximum recursion depth exceeded". This looks like more ammo for Guido's theory that there's something wrong with libpthread on linux, and right now I'm elbows-deep in the guts of libpthread trying to find out more. Fun little project for a Saturday night ;-)
Sure, I've got ltrace, and also more diskspace than you really want to
know about!

Working-at-a-place-with-lots-of-machines-can-be-fun-ly yr's,

-Charles

On Sat, Sep 02, 2000 at 07:52:33PM -0500, Charles G Waldman wrote:
I concur that it's probably not Python-related, even if it's probably
Python-triggered (and possibly Python-induced, because of some setting or
other) -- but I think it would be very nice to work around it!  And we have
roughly the same recursion limit for BSDI with a 2Mbyte stack limit, so
let's not adjust that guesstimate just yet.

--
Thomas Wouters <thomas@xs4all.net>

Hi! I'm a .signature virus! copy me into your .signature file to help me spread!

Thomas> In one test I did, it crashed in int_print, the print function
Thomas> for int objects, which did 'fprintf(fp, "%ld", v->ival);'.  The
Thomas> actual SEGV arrived inside fprintf's internals.  v->ival was a
Thomas> valid integer (though a high one) and the problem was not
Thomas> dereferencing 'v'.  'fp' was stderr, according to its _fileno
Thomas> member.

I get something similar.  The script conks out after 4491 calls (this with
a threaded interpreter).  It segfaults in _IO_vfprintf trying to print 4492
to stdout.  All arguments to _IO_vfprintf appear valid (though I'm not
quite sure how to print the third, va_list, argument).

When I configure --without-threads, the script runs much longer, making it
past 18068.  It conks out in the same spot, however, trying to print 18069.

The fact that it occurs in the same place with and without threads (the
addresses of the two different _IO_vfprintf functions are different, which
implies different stdio libraries are active in the threading and
non-threading versions, as Thomas said) suggests to me that the problem may
simply be that in the threading case each thread (even the main thread) is
limited to a much smaller stack.  Perhaps I'm seeing what I'm supposed to
see.  If the two versions were to crap out for different reasons, I doubt
I'd see them failing in the same place.

Skip
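(Quick arithmetic in support of that hypothesis.  The 2 Mbyte figure is an
assumed fixed per-thread stack size -- the same number Thomas quotes for
BSDI above -- not something measured here:)

    # If a thread stack is 2 Mbytes and the threaded build dies at 4491
    # frames, each recursion costs roughly half a kilobyte of C stack.
    frames = 4491
    stack = 2 * 1024 * 1024
    print "approx. bytes of C stack per recursion:", stack / frames    # ~466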

Skip Montanaro writes:
Yes, libpthread defines its own version of _IO_vfprintf.  Try this
experiment: do a "ulimit -a" to see what the stack size limit is; start
your Python process; find its PID, and before you start your test, go into
another window and run the command

    watch -n 0 "grep Stk /proc/<pythonpid>/status"

This will show exactly how much stack Python is using.  Then start the
runaway-recursion test.  If it craps out when the stack usage hits the
rlimit, you are seeing what you are supposed to see.  If it craps out any
sooner, there is a real bug of some sort, as I'm 99% sure there is.
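(The same measurement can be made from inside the process instead of a
second window.  A minimal sketch, assuming a Linux /proc with the VmStk
field that the grep above matches -- an editorial aside, not from the
original mail:)

    import os

    def stack_usage():
        # Read our own status file; VmStk is the kernel's view of stack usage.
        f = open("/proc/%d/status" % os.getpid())
        for line in f.readlines():
            if line[:5] == "VmStk":
                return line.strip()
        return "VmStk not found"

    def recurse(n):
        if n % 500 == 0:
            print n, stack_usage()
        recurse(n + 1)

    try:
        recurse(0)
    except RuntimeError:
        print "Python's recursion limit fired before the C stack ran out"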
