
I had a little argument with the marshal module on Windows last night, I eventually won. :-) A patch was checked in which would prevent blowing out the stack and segfaulting with this code: marshal.loads( 'c' + ('X' * 4*4) + '{' * 2**20) Originally, I didn't change the recursion limit which was 5000. (See MAX_MARSHAL_STACK_DEPTH in Python/marshal.c) This is a constant in C code that cannot be changed at runtime. The fix worked on most platforms. However it didn't on some (Windows and MIPS?), presumably due a smaller stack limit. I don't know what the stack limits are on each architecture. I dropped the limit to 4000, that still crashed. I eventually settled on 2000. Which passed in 2.6. I don't think there is a test case for the recursion limit when dumping a deeply nested object. I suppose I should add one, because that could also blow the limit too. The point of this message is to see if anyone thinks 2000 is unreasonable. It could probably be raised, but I'm not going to try it since I don't have access to a Windows box. Testing this remotely sucks. n

If this turns out ever to be a limitation, I would challenge the reporter to rewrite marshal in a non-recursive manner. It shouldn't be that difficult: w_object should operate a queue of objects yet to be written, and Tuple, Dict, Set (why does marshal support writing sets, anyway?) would all just add things to the queue rather than recursively invoking w_object. The only challenge would be code objects, which have a w_long interspersed with the w_object calls. I would fix this by changing the marshal format of code objects, to move the co_firstlineno to the beginning. For reading, a heap-managed stack would be necessary, consisting of a (type, value, position, next) linked list. Again, code objects would need special casing, to allow construction with NULL pointers at first, followed by indexed setting of the code fields. For lists and tuples, position would define the next index to be filled; for dictionaries, it would not be needed, and for code objects, it would index the various elements of the code object. Regards, Martin

If this turns out ever to be a limitation, I would challenge the reporter to rewrite marshal in a non-recursive manner. It shouldn't be that difficult: w_object should operate a queue of objects yet to be written, and Tuple, Dict, Set (why does marshal support writing sets, anyway?) would all just add things to the queue rather than recursively invoking w_object. The only challenge would be code objects, which have a w_long interspersed with the w_object calls. I would fix this by changing the marshal format of code objects, to move the co_firstlineno to the beginning. For reading, a heap-managed stack would be necessary, consisting of a (type, value, position, next) linked list. Again, code objects would need special casing, to allow construction with NULL pointers at first, followed by indexed setting of the code fields. For lists and tuples, position would define the next index to be filled; for dictionaries, it would not be needed, and for code objects, it would index the various elements of the code object. Regards, Martin
participants (2)
-
"Martin v. Löwis"
-
Neal Norwitz