[Python-Dev] Is core dump always a bug? Advice requested

Tim Peters tim.one at comcast.net
Wed May 12 23:30:24 EDT 2004


[Michel Pelletier]
> ...
> Would there be an interest in at least a PEP to consider a bytecode
> verifier?

I think so, yes.

> I'd be willing to take it on, but my experience is limited to JVM bytecode
> and verification.

Python should be easier, in large part because you have to give up sooner.
For example, Python has a single catch-all BINARY_ADD opcode for infix "+".
There aren't distinct opcodes for "adding ints" and "adding floats" and
"adding strings".  So, in Python, there's no need to even try to check the
types of BINARY_ADD's stack operands.  They're always pointer-to-PyObject,
and since pointer-to-PyObject is the only kind of thing that *can* be pushed
on the PVM stack, there's nothing for a modestly ambitious static verifier
to look for there.
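For instance, disassembly shows a single type-agnostic add instruction no matter what the operand types are. (A caveat: this post predates Python 3, and since CPython 3.11 the binary operators were folded into one generic BINARY_OP opcode, so the sketch below accepts either name.)

```python
import dis

def add(a, b):
    # Works for ints, floats, strings, lists, ... the compiler
    # emits the same catch-all opcode in every case.
    return a + b

ops = [ins.opname for ins in dis.get_instructions(add)]
# BINARY_ADD through CPython 3.10; BINARY_OP from 3.11 on.
print([op for op in ops if op in ("BINARY_ADD", "BINARY_OP")])
```

Because the compiler never knows (or cares about) the operand types, there is nothing type-specific for a verifier to check at this opcode.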

Checks for type correctness in the PVM are done at runtime instead.  I think
it's fair to say that a bytecode verifier is overwhelmingly "just an
optimization":  if bytecode properties can be verified by static code
analysis, runtime code isn't needed to check them dynamically; and where
such runtime checks are absent, static analysis closes off ways to provoke
segfaults.
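A small illustration of "checked at runtime instead": code that adds incompatible types compiles and loads without complaint; the type error surfaces only when the opcode actually executes.

```python
def bad():
    # Compiles fine -- no static type check rejects this.
    return 1 + "x"

# The function object exists; nothing has gone wrong yet.
try:
    bad()
except TypeError as exc:
    print("runtime TypeError:", exc)
```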

Examples were given earlier of things the PVM doesn't check at all now:
that enough stack space was allocated, or that C-level indexing into the
co_consts vector is in bounds.  Well, the latter is checked (at runtime,
on every co_consts access) in a debug build, but not in a release build.
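Those two checks are exactly the kind of thing a static verifier could do once, at load time, instead of never (release build) or on every access (debug build).  A minimal sketch of the co_consts half, written against today's `dis` module (the verifier itself is hypothetical):

```python
import dis

def f():
    return (1, 2.5, "three")

code = f.__code__

# One check a bytecode verifier might perform statically:
# every LOAD_CONST argument must be a valid index into co_consts.
for ins in dis.get_instructions(f):
    if ins.opname == "LOAD_CONST":
        assert 0 <= ins.arg < len(code.co_consts), "out-of-bounds const"

# The stack-space half would similarly trust co_stacksize only after
# simulating stack depth along all paths and confirming it's an upper bound.
print("co_consts indices all in bounds; co_stacksize =", code.co_stacksize)
```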
