[Python-Dev] Is core dump always a bug? Advice requested

Michel Pelletier michel at dialnetwork.com
Thu May 13 13:07:55 EDT 2004


On Thursday 13 May 2004 04:43, python-dev-request at python.org wrote:
> From: "Phillip J. Eby" <pje at telecommunity.com>

> As I think Tim has already mentioned, all you do is track the stack depth
> for each instruction, and ensure it's consistent.  If there is an
> inconsistency in the predicted stack level for an instruction, as compared
> with any of the instructions that jump to it (no matter where they jump
> from), you have a problem.
>
> Or, to look at it another way:

<snip code example>

Thanks Phillip, this code is a great starting point.  I was trying to 
reproduce the algorithm given in the JVM Spec section 4.9, but it is 
seriously complicated by strict type issue which have no meaning here.

Also, as Armin pointed out, there are a couple instances of instructions that 
would be difficult to analize without some changes to their design 
(END_FINALLY and MAKE_CLOSURE).  But he suggests relatively simple solutions 
to both.

I am also working on a short list of static and structural constraints for 
bytecode, this is what I have so far (the ones i'm unsure about are 
questions):

Static Constraints on Bytecode Instructions

  1. The bytecode string must not be empty (len(co_code) > 0).

  2. Is there maximum co_code length?

  3. The first instruction in the bytecode string begins at index 0.

  4. Only valid bytecodes with correct number of operands can be in
     the bytecode string.

  5. The index of the next instruction equals the index of the current
     instruction plus the length of the current instruction and all
     its operands.

  6. The last byte of the last instruction must be len(co_code) - 1.

  7. The last instruction must be RETURN_VALUE?

  8. Exception handlers verification?  Maybe the compiler could generate a             
table of handlers, their handled types, and their instruction ranges for 
verification.  The run-time benefit would be that an exception handler could 
be quickly dispatched instead of falling through each except block in 
sequence looking for the matching handler as it looks to work now (not that 
it's a big performance bottleneck...)

Static Constraints on Bytecode Instruction Operands

  1. The target of a jump instruction must be within the code
     boundaries and must not fall between an instruction and its
     operands.

  2. The operand of a LOAD_CONST instruction must be an valid index
     into the co_consts.

  3. Can a similar bounds check be made at verfication time for
     LOAD_FAST?

  4. Can index into co_names, co_varnames, co_freevars, co_cellvars be
     verfied?

Structural Constraints between Bytecode Instructions

  1. Each instruction must only be executed with the appropriate
     number of arguments in the operand stack, regardless of the
     execution path that leads to its invocation.

  2. If an instruction can be executed along several different
     execution paths, the value stack must have the same depth prior
     to the execution of the instruction, regardless of the path
     taken.

  3. At no point during execution can the value stack grow to a
     depth greater than that implied by co_stacksize.
 
> This algorithm should readily handle all forward and backward jumps, while
> ensuring that no stack growth or shrinkage can take place, because every
> instruction must have a single known stack depth, no matter how many
> forward or backward jumps it takes to get there.
>
> Notice that the algorithm will "follow" every branch, but will not retrace
> the code at the target location once the stack depth is seen to be
> consistent.  It also tackles underflow and "falling off the end" of the
> bytecode, while simultaneously computing the maximum stack depth
> used.  Each instruction is fetched and processed for its depth effects and
> other characteristics exactly once, so the timing is O(n).  And, it has a
> nice place to put opcode-specific operand validation for opcodes that use
> locals, freevars, names, or constants.

Yea!  As Tim points out below there are some subtleties, but this is great.

> Anyway, I think we can safely halt discussion of the halting problem
> now.  :)  And, since I've now reduced this from an interesting theoretical
> idea to talk about, into something resembling work (God forbid), anybody
> who wants to actually *implement* this crazy idea is now free to do so,
> without even having to read the paper Tim linked to.  :)

Well I did read it, and I suggest anyone else also read chapter 4 of the Java 
VM Spec.  

-Michel





More information about the Python-Dev mailing list