[pypy-dev] Builtin types

Bengt Richter bokr at oz.net
Tue Jan 28 09:15:20 CET 2003


At 00:21 2003-01-28 +0100, Christian Tismer wrote:
>Armin Rigo wrote:
>>Hello Holger,
>>On Mon, Jan 27, 2003 at 08:03:40PM +0100, Samuele Pedroni wrote:
>>
>>>>how do you intend to use any of the existing C-libraries, then?
>>>>Rely on CPython to provide the binding?
>>>
>>>The point was whether you want your builtin types "abstractions" to be directly
>>>ctypes based.
>>
>>Yes, sorry.  I was thinking about that, i.e. how we internally represent the
>>built-in types.  Being able to call external C code is another matter.
>>
>>>>I think that progressing in the ctypes direction can happen in
>>>>parallel with Python-Core pythonifications.  Being able to make
>>>>C-library calls (like File-IO) without specialized intermediate C-code
>>>>does seem like an important feature.
>>
>>Yes, although I have another potential objection here.  It might not be a
>>problem to have specialized intermediate C code if this code is generated,
>>just like -- after all it's the goal -- most of the rest of the final C code
>>of the interpreter.  Then what we need is a way to *describe* calls to a C
>>function rather than a way to actually *do* the calls.  So ctypes is a good

Warning: preliminary thoughts, not implemented ;-)
The focus here is on calling into C, but it is really a generic
abstract approach, if you look at it that way ;-)

    meta_eval(i386_call_thunk_representation_as_bitstruct)

where meta_eval expects a single structured Bits instance as an
argument. That instance must carry enough info to extract a type
header first of all, plus various other structure depending on that.
I.e., it needs structure info as well as data bits, meaning the whole
arg is really an arg list in packed form, composed of two args:
a type name, and the data itself.
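As a rough illustration (Bits itself is not implemented; the helper
names below are mine, invented just for this sketch), the two-part
packed argument could be modeled with simple length prefixes:

```python
# Hypothetical sketch: model the packed meta_eval argument as an
# arg list of (type name, data bytes), serialized with an arg count
# followed by one-byte length prefixes -- short items only.

def pack_args(type_name, data):
    """Pack a two-element arg list: count first, then length-prefixed items."""
    items = [type_name.encode("ascii"), data]
    out = bytearray([len(items)])           # arg count comes first
    for item in items:
        out.append(len(item))               # one-byte length prefix
        out.extend(item)
    return bytes(out)

def unpack_args(packed):
    """Recover the list of length-prefixed items from the packed form."""
    count = packed[0]
    pos, args = 1, []
    for _ in range(count):
        n = packed[pos]
        args.append(packed[pos + 1:pos + 1 + n])
        pos += 1 + n
    return args

packed = pack_args("i386", b"\x55\x89\xe5")   # type tag plus raw code bytes
assert unpack_args(packed) == [b"i386", b"\x55\x89\xe5"]
```

The packed bytes come out as '\x02\x04i386...' -- an arg count of 2,
then the 4-char type name, then the machine-code bytes.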

For meta_eval use, I think a single name signifying the type of data
would be enough. Thus the encoding viewed as a Python string could be
'\x02\x04i386\x??dddddddd' to signify a list of two bitstrings in
byte-chunked format, the first having 4 chars and the second however
many required for the machine code. Another way to spell that data as
bits might be (sketching in the air here, don't take too literally,
despite 386 code ;-)

    bits = Bits(bytes=(0,8)) # container for the packed argument list
    argbits = Bits() # make space to pack C args representation
    retbits = Bits() # make space for C return value representation
    # ... prepare args by stuffing representations into argbits slices
    # ... prepare return space as zeroes or whatever
    thunk=I386Code()
    thunk.append(I386Inst('push ebp'))
    thunk.append(I386Inst('mov  ebp, esp'))
    thunk.append(I386Inst('mov  eax, DWORD PTR %s' % argbits.addr()))
    thunk.append(I386Inst('push eax'))
    thunk.append(I386Inst('call %s' % some_env.getaddr('_foo')))
    thunk.append(I386Inst('add  esp, 4'))
    thunk.append(I386Inst('mov  DWORD PTR %s, eax' % retbits.addr()))
    thunk.append(I386Inst('xor  eax, eax'))
    thunk.append(I386Inst('pop  ebp'))
    thunk.append(I386Inst('ret  0'))

    map(bits.bytes.append, [2, 4, 'i386', len(thunk), thunk.to_string()])
    # the thunk is a complete subroutine that meta_eval's i386 interpreter
    # can call without arguments.
    meta_eval(bits)
    # ... (pick up results in retbits)

I.e., meta_eval is a generalization of Python eval, which only
knows how to eval Python strings and code objects. meta_eval here
must get a single bits (instance of Bits) argument encoded with a
type name followed by binary data (e.g., probably located at
id(bits)+8 for CPython).

Once it sees the type name, meta_eval can dispatch to i386 or make
a system call or run an interpreter, but that's a matter of having
appropriate callables registered for the type name, which gives the
type of the bits image being passed. Note that it could be a
relocatable object file, or info for loading a dll and finding
something therein, etc. The bits as above were actually a packed
argument-by-value list with an arg count at the beginning, with a
length-prefixed string expected as the first arg and length-prefixed
bytes as the second arg (assuming lengths >= 128 would get two- or
four-byte length codes, but this is a representation detail).
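A minimal version of that dispatch idea (all names hypothetical --
nothing like this exists yet) might be a registry mapping type names
to evaluator callables, using the two-argument spelling from below:

```python
# Hypothetical sketch of meta_eval dispatch: a registry maps type
# names to evaluator callables; meta_eval looks up the type name
# and hands the data to whatever evaluator is registered for it.

_evaluators = {}

def register_evaluator(type_name, func):
    """Register a callable to handle bits images of the given type."""
    _evaluators[type_name] = func

def meta_eval(type_name, data):
    """Dispatch the data to the evaluator registered for type_name."""
    try:
        evaluator = _evaluators[type_name]
    except KeyError:
        raise ValueError("no evaluator registered for %r" % type_name)
    return evaluator(data)

# A toy evaluator: treat the data as Python source text, so ordinary
# eval becomes just one registered case among many.
register_evaluator("python", lambda data: eval(data))

assert meta_eval("python", "2 + 3") == 5
```

An 'i386' or 'LINUX_SYS_CALL' evaluator would slot into the same
registry; only the registered callables are platform-specific.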

Note that this all seems very hardware-oriented, but it can be viewed
as abstract bit sequences and structures, and it just happens that
there is (some) hardware that matches some things nicely.

There are details about injecting bytes into the current process's
virtual memory as instructions and data, etc., but this work can be
encapsulated by the 'i386'-code "evaluator" used by meta_eval for
'i386'-type bits data. The solutions must already be part of psyco,
IWT. They must also be part of low level debuggers. These things
would be very platform-specific, but hopefully meta_eval could stay
mostly generic. At the lowest level, IWT meta_eval would often be
optimized away.

I.e., if there were a META_EVAL instruction at the byte code level,
you could see the abstraction expressed at that level, etc.

I think something like this could be used to express other foreign
function and data interfaces too. Presumably the pattern would be
to define an ordinary Python function or method, and inside it use Bits
and meta_eval etc. to build special interfaces.
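For instance (again all names hypothetical, and with a trivial
stand-in for meta_eval), the foreign-call machinery could be hidden
behind an ordinary function that callers use without seeing any of
the packing:

```python
# Hypothetical sketch: wrap a foreign-interface call in an ordinary
# Python function, so the Bits/meta_eval plumbing stays internal.

def make_foreign_call(type_name, code_bytes, meta_eval):
    """Return a plain callable that dispatches through meta_eval."""
    def call():
        return meta_eval(type_name, code_bytes)
    return call

# Stand-in evaluator that just reports what it was handed, instead of
# actually executing machine code:
fake_meta_eval = lambda t, d: (t, len(d))

foo = make_foreign_call("i386", b"\x55\x89\xe5\xc3", fake_meta_eval)
assert foo() == ("i386", 4)
```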

meta_eval('DOS_BIOS', bits_for_int16_representation) or
meta_eval('LINUX_SYS_CALL', ...) might also call on special
"evaluators" but meta_eval itself would stay generic.
IOW, you could also generate Linux system call ints or PC BIOS ints,
since we're talking about arranging bits with Bits methods and arranging
to have those bits seen by the CPU by way of meta_apply.

Of course you could also get executable bytes by reading the output of
special tools. I'm just trying to get concepts worked out re expressing
creation of low level stuff and then getting it used by low level
mechanisms, but all expressed in the higher level.

meta_eval really ought to be viewed in abstract terms, though we are
concentrating on CPU and machine language instructions and raw memory
here. IOW, "meta_apply('byte_code_interp', bytecode_representation)"
might be very close to the functionality of
"eval(byte_code_representation)".

Whereas "meta_eval('gcc', c_source)" also has the abstract idea of
applying some function to interpret some input. What's the difference
from just having a bunch of functions to do these things? Well, it seeks
to unify the principle and identify an abstraction that can reasonably
be implemented at each level in some form, whether it is a META_EVAL
CPython byte code or in-lined machine code calling other machine code.
I confess to a bit of handwaving, but if I hold back, there's no chance
of making any contribution to early discussions. You can always ignore
it if it doesn't fit into your views ;-)

>>way to express such a description, but it is not necessary to rely on
>>libffi-style machine-code hackery to actually perform the calls; all we need
>>to do is statically emit the specialized C code into the produced interpreter
>>and obtain something close to the original CPython source.
>
>Hmm!
>It seems that I could have saved my last longer post.
>We agree that we need to describe primitive types.
>It is much less urgent to actually implement them.

I was going to write a long answer to that, but this got
kind of long, so maybe this will do. Just pick out anything
useful and ignore the rest ;-)

Cheers,
Bengt


