[Python-Dev] Is core dump always a bug? Advice requested

Christian Tismer tismer at stackless.com
Thu May 13 08:58:30 EDT 2004


Michael Hudson wrote:

...

> I don't know what we can do about this.  Armin suggested another hack:
> stick the address of a stack variable in Py_Main() in a global and
> compare the address of another stack variable when you want to see how
> much stack you have left.  Obvious problems include knowing what's
> safe and which direction the stack is growing in... Even more scarily,
> what SBCL (a Common Lisp implementation) does is mprotect() a VM page
> at the end of the stack and deal with overflow in a SIGSEGV handler.
> It's hard to see what else could be really safe (have I mentioned that
> I hate C recently?).
> 
> Option 3, I guess, is integrate stackless :-)

Just as a note: even Stackless can get into deep recursion,
and it then uses a hack similar to Armin's suggestion.
As a side effect, this made it quite simple to convince
cPickle to pickle very deeply nested structures.
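(For illustration, not part of the original mail: the point above can be
seen directly in stock CPython, whose pickler recurses once per nesting
level and so trips the interpreter's recursion limit on deeply nested
structures. A minimal sketch:)

```python
import pickle
import sys

# Build a deeply nested list [[[[...]]]] whose nesting depth is well
# beyond the default recursion limit (usually 1000).
obj = []
for _ in range(sys.getrecursionlimit() * 5):
    obj = [obj]

# The stock pickler descends one level per nested container, so this
# overflows the recursion limit instead of producing a pickle.
try:
    pickle.dumps(obj)
    result = "pickled"
except RecursionError:
    result = "RecursionError"

print(result)
```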

On verification:

I think I'm all against writing a bytecode verifier,
because everything needed is already there. Hartmut Goebel
has written a nice Python decompiler, based upon John
Aycock's spark and prior work.
It produces output that often looks better than the original
source, and it verifies the decompiled bytecode by compiling
it again.
The drawback is that this appears to no longer be an
open-source project; see http://www.crazy-compilers.com/decompyle/
Maybe we should talk to Hartmut...
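(A sketch of the round-trip idea, not from the original mail. Decompyle
itself is not available here, so this only demonstrates the property the
"decompile, recompile, compare" check relies on: recompiling equivalent
source reproduces the same bytecode.)

```python
# Decompyle would turn a code object back into source; here the second
# compile() stands in for recompiling that decompiled source, and we
# check that it reproduces identical bytecode.
source = "def f(x):\n    return x * 2 + 1\n"

code1 = compile(source, "<pyc>", "exec")
code2 = compile(source, "<pyc>", "exec")  # stand-in for the recompiled source

# Compare the raw bytecode of the module and of the nested function.
same_module = code1.co_code == code2.co_code
same_func = code1.co_consts[0].co_code == code2.co_consts[0].co_code
print(same_module and same_func)
```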

On sending code over the network:

Bytecode verification is fine, but you don't want to execute
foreign bytecode even if it is valid. This is like executing
any binary program, which might do anything, Python or not.
Since people are going to send programs over the network,
we will need some way to exchange compiled code in a trusted
manner, and I believe this takes more consideration and is
a matter for the crypto people.

Anyway, here is

My Proposal (TM)
================

Let's assume that you don't trust any bytecode that has not yet
been verified for your machine.
Further, you generate a private key for your machine, or yourself.

For .pyc archives which you create yourself, your private key
is used to initialize a SHA digest, then the .pyc is run
through the digest, and the digest is appended to the .pyc.
When loading any .pyc, your private key is used again to
compute the digest, and the result is verified against the
digest stored in the .pyc (at its end, I guess).
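(An illustrative sketch, not from the original mail: the stdlib's hmac
module gives the modern form of "initialize a SHA digest with your key
and run the file through it". All names and the fake payload below are
made up for the example.)

```python
import hmac
import hashlib
import os

# Per-machine secret key, as in the proposal; generated fresh here.
KEY = os.urandom(32)
DIGEST_SIZE = hashlib.sha256().digest_size

def sign_pyc(data: bytes, key: bytes = KEY) -> bytes:
    """Return the .pyc bytes with a keyed digest appended."""
    return data + hmac.new(key, data, hashlib.sha256).digest()

def verify_pyc(signed: bytes, key: bytes = KEY) -> bytes:
    """Check the trailing digest; return the payload or raise."""
    data, tag = signed[:-DIGEST_SIZE], signed[-DIGEST_SIZE:]
    expected = hmac.new(key, data, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("bytecode digest mismatch: untrusted .pyc")
    return data

payload = b"\x00fake pyc bytes\x01"   # stand-in for real .pyc contents
signed = sign_pyc(payload)
assert verify_pyc(signed) == payload   # an untampered file passes

# Flipping one bit anywhere makes verification fail.
tampered = signed[:-1] + bytes([signed[-1] ^ 1])
try:
    verify_pyc(tampered)
    tamper_detected = False
except ValueError:
    tamper_detected = True
print(tamper_detected)
```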

For code objects which come over a network, I suggest a similar
check, given that you have obtained the other side's key
and can use it to verify the foreign code. This works only in
trusted networks; in public networks you would need
public/private key pairs... much harder.

Now, any bytecode that is not yet verified can be run through
decompyle just once, and the result is compiled again, signed
with your key, and stored.
This is stricter than requested, since it doesn't allow
arbitrary bytecode sequences, only bytecode for which a
possible source exists. Actually, I think we want this,
because bytecode is an optimization artifact.

Summarizing: adding SHA digests to bytecode and .pyc
files, at least as an option, makes some sense and is fast
to check on every external access to bytecode.
Verification of healthy code at the VM level could be made
easy with decompyle.

cheers - chris
-- 
Christian Tismer             :^)   <mailto:tismer at stackless.com>
Mission Impossible 5oftware  :     Have a break! Take a ride on Python's
Johannes-Niemeyer-Weg 9a     :    *Starship* http://starship.python.net/
14109 Berlin                 :     PGP key -> http://wwwkeys.pgp.net/
work +49 30 89 09 53 34  home +49 30 802 86 56  mobile +49 173 24 18 776
PGP 0x57F3BF04       9064 F4E1 D754 C2FF 1619  305B C09C 5A3B 57F3 BF04
      whom do you want to sponsor today?   http://www.stackless.com/
