
At 00:21 2003-01-28 +0100, Christian Tismer wrote:
Armin Rigo wrote:
Hello Holger, On Mon, Jan 27, 2003 at 08:03:40PM +0100, Samuele Pedroni wrote:
how do you intend to use any of the existing C-libraries, then? Rely on CPython to provide the binding?
The point was whether you want your builtin types "abstractions" to be directly ctypes based.
Yes, sorry. I was thinking about that, i.e. how we internally represent the built-in types. Being able to call external C code is another matter.
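As a concrete illustration of the "ctypes-based abstractions" idea for representing built-in types internally, here is a small hypothetical sketch. The class and field names are invented for illustration and are not CPython's actual object layout:

```python
import ctypes

# Illustrative sketch only: describing an interpreter-level object type
# with ctypes structures. Field names and layout are assumptions, not
# CPython's real ABI.
class ObHead(ctypes.Structure):
    _fields_ = [
        ("ob_refcnt", ctypes.c_long),
        ("ob_type", ctypes.c_void_p),
    ]

class IntObject(ctypes.Structure):
    _fields_ = [
        ("head", ObHead),
        ("ob_ival", ctypes.c_long),
    ]

obj = IntObject()
obj.head.ob_refcnt = 1
obj.ob_ival = 42
print(obj.ob_ival)         # field access works like a normal attribute
print(ctypes.sizeof(obj))  # but the memory layout is concrete and C-compatible
```

The point of such a representation is that the same description serves both as ordinary Python data and as a C-compatible memory image.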
I think that progressing in the ctypes direction can happen in parallel with Python-Core pythonifications. Being able to make C-library calls (like File-IO) without specialized intermediate C code does seem like an important feature.
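The "C-library calls without specialized intermediate C code" point can be illustrated with a minimal ctypes sketch (POSIX is assumed here; the library lookup differs on Windows):

```python
import ctypes
import ctypes.util

# Sketch, POSIX assumed: ctypes describes the C signature and performs the
# call directly, with no hand-written C glue in between.
libc = ctypes.CDLL(ctypes.util.find_library("c") or None)
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t
print(libc.strlen(b"hello"))  # → 5
```

The signature declaration (`argtypes`/`restype`) is exactly the kind of call *description* discussed in this thread; ctypes happens to also perform the call dynamically.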
Yes, although I have another potential objection here. It might not be a problem to have specialized intermediate C code if this code is generated, just like -- after all it's the goal -- most of the rest of the final C code of the interpreter. Then what we need is a way to *describe* calls to a C function rather than a way to actually *do* the calls. So ctypes is a good
Warning: preliminary thoughts, not implemented ;-) This focuses here on calling into C, but it is really a generic abstract approach, if you look at it that way ;-)

    meta_eval(i386_call_thunk_representation_as_bitstruct)

where meta_eval expects a single structured Bits instance as an argument, and expects that to have info for extracting a type header first of all, and various other structure depending on that. I.e., it needs structure info as well as data bits, meaning the whole arg is really an arg list in packed form, composed of two args: a type name, and the data itself. For meta_eval use, I think a single name signifying the type of data would be enough. Thus the encoding viewed as a Python string could be '\x02\x04i386\x??dddddddd' to signify a list of two bitstrings in byte-chunked format, the first having 4 chars and the second however many required for the machine code.

Another way to spell that data as bits might be (sketching in the air here, don't take too literally, despite 386 code ;-):

    codebits = Bits(bytes=(0, 8))
    argbits = Bits()  # make space to pack C args representation
    retbits = Bits()  # make space for C return value representation
    # ... prepare args by stuffing representations into argbits slices
    # ... prepare return space as zeroes or whatever
    thunk = I386Code()
    thunk.append(I386Inst('push ebp'))
    thunk.append(I386Inst('mov ebp, esp'))
    thunk.append(I386Inst('mov eax, DWORD PTR %s' % argbits.addr()))
    thunk.append(I386Inst('push eax'))
    thunk.append(I386Inst('call %s' % some_env.getaddr('_foo')))
    thunk.append(I386Inst('add esp, 4'))
    thunk.append(I386Inst('mov DWORD PTR %s, eax' % retbits.addr()))
    thunk.append(I386Inst('xor eax, eax'))
    thunk.append(I386Inst('pop ebp'))
    thunk.append(I386Inst('ret 0'))
    bits = Bits()
    map(bits.bytes.append, [2, 4, 'i386', len(thunk), thunk.to_string()])
    # the thunk is a complete subroutine that meta_eval's i386 interpreter
    # can call without arguments.
    meta_eval(bits)
    # ...
(pick up results in retbits)

I.e., meta_eval is a generalization of Python eval, which only knows how to eval Python strings and code objects. meta_eval here must get a single bits (instance of Bits) argument encoded with a type name and following binary data (e.g., probably located at id(bits)+8 for CPython). Once it sees the type name, meta_eval can dispatch to i386 or make a system call or run an interpreter, but that's a matter of having appropriate callables registered for the type name giving the type of the bits image being passed. Note that it could be a relocatable object file, or info for loading a dll and finding something therein, etc.

The bits as above were actually a packed argument-by-value list with an arg count at the beginning, with a length-prefixed string expected as the first arg and length-prefixed bytes as the second arg (assuming lengths >= 128 would get two- or four-byte length codes, but this is a representation detail).

Note that this all seems very hardware-oriented, but it can be viewed as abstract bit sequences and structures; it just happens that there is (some) hardware that matches some things nicely. There are details about injecting bytes into the current process virtual memory as instructions and data etc., but this work can be encapsulated by the 'i386'-code "evaluator" used by meta_eval for 'i386'-type bits data. The solutions must already be part of psyco, IWT. They must also be part of low level debuggers. These things would be very platform-specific, but hopefully meta_eval could stay mostly generic.

At the lowest level, IWT meta_eval would often be optimized away. I.e., if at the byte code level there were a META_EVAL instruction, you can see the abstraction expressed at that level, etc. I think something like this could be used to express other foreign function and data interfaces too. Presumably the pattern would be to define an ordinary Python function or method, and inside it use Bits and meta_eval etc.
to build special interfaces. meta_eval('DOS_BIOS', bits_for_int16_representation) or meta_eval('LINUX_SYS_CALL', ...) might also call on special "evaluators", but meta_eval itself would stay generic. IOW, you could also generate linux system call ints or PC BIOS ints, since we're talking about arranging bits with Bits methods and arranging to have those bits seen by the CPU by way of meta_apply. Of course you could also get executable bytes by reading the output of special tools.

I'm just trying to get concepts worked out regarding expressing the creation of low level stuff and then getting it used by low level mechanisms, but all expressed in the higher level. meta_eval really ought to be viewed in abstract terms, though we are concentrating on CPU and machine language instructions and raw memory here. IOW, "meta_apply('byte_code_interp', bytecode_representation)" might be very close to the functionality of "eval(byte_code_representation)". Whereas "meta_eval('gcc', c_source)" also has the abstract idea of applying some function to interpret some input.

What's the difference from just having a bunch of functions to do these things? Well, it seeks to unify the principle and identify an abstraction that can reasonably be implemented at each level in some form, whether it is a META_EVAL CPython byte code or in-lined machine code calling other machine code. I confess to a bit of handwaving, but if I hold back, there's no chance of making any contribution to early discussions. You can always ignore it if it doesn't fit into your views ;-)
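The dispatch idea behind meta_eval can be sketched minimally, with the type tag split out as an explicit argument for clarity; the registry and function names here are assumptions, not an existing API. Registering CPython's own eval as the 'byte_code_interp' evaluator demonstrates the claimed closeness to plain eval:

```python
# Minimal sketch of meta_eval dispatch; all names are assumptions.
_evaluators = {}

def register_evaluator(type_name, fn):
    _evaluators[type_name] = fn

def meta_eval(type_name, data):
    # dispatch purely on the registered type name; an 'i386' evaluator
    # would inject bytes as executable memory (platform-specific, omitted)
    try:
        evaluator = _evaluators[type_name]
    except KeyError:
        raise ValueError("no evaluator registered for %r" % (type_name,))
    return evaluator(data)

# CPython's eval already interprets strings and code objects, so it can
# serve directly as the 'byte_code_interp' evaluator:
register_evaluator("byte_code_interp", eval)
code = compile("6 * 7", "<bits>", "eval")
print(meta_eval("byte_code_interp", code))  # → 42
```

Other tags ('gcc', 'LINUX_SYS_CALL', ...) would simply be further registry entries, leaving meta_eval itself generic.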
way to express such a description, but it is not necessary to rely on libffi-style machine-code hackery to actually perform the calls; all we need to do is statically emit the specialized C code into the produced interpreter and obtain something close to the original CPython source.
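The "describe the call, then statically emit the glue" direction can be sketched as a tiny, hypothetical generator. The signature format and function names here are invented for illustration; a real generator would also handle conversions to and from interpreter-level objects:

```python
# Hypothetical sketch: turn a declarative description of a C function into
# the specialized C wrapper source that would be emitted into the produced
# interpreter, instead of performing the call dynamically libffi-style.
def emit_c_wrapper(name, restype, argtypes):
    # build "double a0, double a1, ..." parameter and argument lists
    params = ", ".join("%s a%d" % (t, i) for i, t in enumerate(argtypes))
    call_args = ", ".join("a%d" % i for i in range(len(argtypes)))
    return ("%s wrap_%s(%s) {\n"
            "    return %s(%s);\n"
            "}\n") % (restype, name, params, name, call_args)

print(emit_c_wrapper("fabs", "double", ["double"]))
```

The declarative description is the same information ctypes would need; the difference is only whether it drives a code generator or a dynamic call.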
Hmm! It seems that I could have saved my last longer post. We agree that we need to describe primitive types. It is much less urgent to actually implement them.
I was going to write a long answer to that, but this got kind of long, so maybe this will do. Just pick out anything useful and ignore the rest ;-) Cheers, Bengt

Hello Bengt, On Tue, Jan 28, 2003 at 12:15:20AM -0800, Bengt Richter wrote:
meta_eval(i386_call_thunk_representation_as_bitstruct)
The abstract idea is great, but why is it so low-level? Generalizing the various forms of procedure invocations is something that I would certainly like to do (although it is maybe too soon right now). But why does the representation have to be bits? Even Python's code objects are more than a string of bytes. What would be useful in my opinion is the definition of a Python interface that can be implemented by various notions of "code objects", including CPython's code objects, CPython's built-in function objects, and anything else from Psyco- or GCC-produced machine code to Prolog rules compiled by PyLog. A bientôt, Armin.
Participants (2):
- Armin Rigo
- Bengt Richter