
Hello all. Due to the fact that there's no way in hell for me to get out of the US by March, and due to the fact that I love this concept, I've hacked up some basic replacements for various functions in __builtin__. The code resides at http://fenton.baltimore.md.us/pypy.py Take a look at it and tell me what you think.

Currently, this code implements everything BUT:

* the exceptions (separate concept)
* builtin types (not sure how to handle them)
* callable (not sure how to test)
* classmethod (ditto)
* coerce (tritto)
* compile (needs an actual compiler, and the AST module scares me)
* dir (not sure how to get current scope)
* eval (not my job(TM))
* execfile (ditto(TM))
* hex, oct (laziness, it'll be in version 2)
* id (lower level than I can handle)
* intern (didn't understand the docstring)
* isinstance, issubclass (see classmethod)
* globals, locals (not sure how to get ahold of them)
* raw_input (really complex)
* staticmethod (ugly hackery)
* super (see classmethod)
* type (need more info, dammit!)
* unichr (given chr, unichr frightens me)
* xrange (sorta builtin type)

-Scott Fenton

--
char m[9999],*n[99],*r=m,*p=m+5000,**s=n,d,c;main(){for(read(0,r,4000);c=*r; r++)c-']'||(d>1||(r=*p?*s:(--s,r)),!d||d--),c-'['||d++||(*++s=r),d||(*p+=c== '+',*p-=c=='-',p+=c=='>',p-=c=='<',c-'.'||write(1,p,1),c-','||read(2,p,1));}

[Scott Fenton Tue, Jan 21, 2003 at 09:05:14PM -0500]
Hello all. Due to the fact that there's no way in hell for me to get out of the US by March, and due to the fact that I love this concept, I've hacked up some basic replacements for various functions in __builtin__. The code resides at http://fenton.baltimore.md.us/pypy.py Take a look at it and tell me what you think.
Cool! A pity you can't take part. But at the latest by the next EuroPython we should do another sprint. I really want to get a repository going soon. Your code should be checked in there. I'd love to have people porting various bits of C-coded stuff into pure Python. Though i prefer coding myself i would take some time to organize such an effort. Help with organizing this C-to-Python effort would be much appreciated. Think of testing (sic!), nightly builds, SCons etc.pp. I'd prefer to do this on a subversion repository which we (at codespeak) hopefully set up pretty soon. I guess i am not the only one eager to try this out. If there are big obstacles we can resort to good old cvs. Sorry for (ab)using your contribution for this interwoven half-announcement. regards, holger

I really want to get a repository going soon.
Let's get this started! I'm ready to write some code as well if someone could point out a good place to start. -- Nathan Heagy phone:306.653.4747 fax:306.653.4774 http://www.zu.com

Scott Fenton <scott@fenton.baltimore.md.us> writes:
Hello all. Due to the fact that there's no way in hell for me to get out of the US by March, and due to the fact that I love this concept, I've hacked up some basic replacements for various functions in __builtin__. The code resides at http://fenton.baltimore.md.us/pypy.py Take a look at it and tell me what you think.
Your implementations of ord() and chr() are somewhat inefficient, because they rebuild the list/dictionary each time. Pass them to dis.dis() and you'll see what I mean. Otherwise - cool.
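To make Thomas's point concrete, here is a sketch of the pattern he is criticizing next to the cheap fix (hypothetical code written in today's Python, not Scott's exact version):

```python
import dis

# The pattern being criticized: the lookup table is rebuilt on
# every single call (a sketch, not Scott's exact code).
def chr_rebuilds(i):
    table = ['%c' % n for n in range(256)]   # built fresh each call
    return table[i]

# The obvious fix: build the table once, at module level.
_CHR_TABLE = ['%c' % n for n in range(256)]
def chr_cached(i):
    return _CHR_TABLE[i]

# dis.dis(chr_rebuilds) shows the list-building bytecode inside the
# function body; dis.dis(chr_cached) shows just a load and a subscript.
dis.dis(chr_cached)
```

Both behave the same; only the per-call cost differs, which is exactly what the disassembly makes visible.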
Currently, this code implements everything BUT:
* the exceptions (separate concept) * builtin types (not sure how to handle them) * callable (not sure how to test) * classmethod (ditto) * coerce (tritto) * compile (needs an actual compiler, and the AST module scares me) * dir (not sure how to get current scope) * eval (not my job(TM)) * execfile (ditto(TM)) * hex, oct (laziness, it'll be in version 2) * id (lower level than I can handle) * intern (didn't understand the docstring) * isinstance, issubclass (see classmethod) * globals, locals (not sure how to get ahold of them) * raw_input (really complex) * staticmethod (ugly hackery) * super (see classmethod) * type (need more info, dammit!) * unichr (given chr, unichr frightens me) * xrange (sorta builtin type)
Hm, it should be possible to find suitable tests in the lib/test subdir, IMO. Thomas

Thomas Heller wrote:
Scott Fenton <scott@fenton.baltimore.md.us> writes:
Hello all. Due to the fact that there's no way in hell for me to get out of the US by March, and due to the fact that I love this concept, I've hacked up some basic replacements for various functions in __builtin__. The code resides at http://fenton.baltimore.md.us/pypy.py Take a look at it and tell me what you think.
Your implementations of ord() and chr() are somewhat inefficient, because they rebuild the list/dictionary each time. Pass them to dis.dis() and you'll see what I mean.
This is correct for pure Python. One would compute the tables once and either use them as a global, or as a default value for a dummy function parameter. On the other hand, under the assumption that Psyco or its successor will be able to deduce constant local expressions and turn them into constants, this approach is absolutely fine, despite the fact that these tables will most probably not be used, being replaced by their trivial implementation. I do appreciate the effort very much anyway: trying to build everything up from a minimum! cheers - chris
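The two once-only strategies Christian mentions can be sketched like this (names invented for illustration; the point of the second is that default-argument values are evaluated a single time, when the def statement runs):

```python
# Strategy 1: a module-level global, computed once at import time.
_ORD_TABLE = dict(('%c' % i, i) for i in range(256))

def ord_global(c):
    return _ORD_TABLE[c]

# Strategy 2: a dummy default parameter; the table expression runs
# once when the def executes, not on every call.
def ord_default(c, _table=dict(('%c' % i, i) for i in range(256))):
    return _table[c]
```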

Hello Scott, On Tue, Jan 21, 2003 at 09:05:14PM -0500, Scott Fenton wrote:
I've hacked up some basic replacements for various functions in __builtin__.
Fine! The greatest benefit of your code is that it clearly shows what can be directly implemented in Python, what could be, and what cannot. A function like chr() is in the second category: it could be written in Python as you did, but it is not "how we feel a chr() implementation should be". It should just build a one-character string, using 'i' as an ASCII code. But how? This problem, and the deeper problems with some other functions, come from the fact that you placed your code at the same level as the Python interpreter. The functions you wrote could be used in the current CPython interpreter, in place of the existing built-ins. In PyPython, these are the functions that we will populate the emulated built-ins with. We still need two levels: the functions that operate at the same level as the interpreted programs (like yours), and the functions that operate at the level of the interpreter (like CPython's built-in functions). In other words, we still need the notion of built-in functions that will be as "magic" as CPython's, in the sense that they do something that couldn't be done by user code. You'll see what I mean when you try to rewrite the type() builtin :-) Armin PS: just a comment about abs(), cmp() and len(). These should use the underlying __abs__(), __cmp__() and __len__() methods, as you have done for apply(), bool(), hash(), pow() and repr().

Armin Rigo <arigo@tunes.org> writes:
Fine! The greatest benefits of your code is that it clearly shows what can be directly implemented in Python, what could be, and what cannot.
A function like chr() is in the second category: it could be written in Python as you did, but it is not "how we feel a chr() implementation should be". It should just build a one-string character, putting 'i' somewhere as an ASCII code. But how?
I don't think so. For me, this is a fine implementation of chr():

def chr(i): return "\x00\x01\x02\x03...\xFF"[i]

Maybe a check should be added to make sure 'i' is between 0 and 255 :-) But the responsibility to construct string objects is not chr()'s burden, IMO. In a CPython extension, and probably also in the core, there are helper functions to build these strings; the implementor of chr() would use them. Thomas
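A runnable version of that one-liner, with the range check Thomas suggests; here the 256-character table is generated by a comprehension rather than typed out longhand (the names are mine, for illustration):

```python
# The full \x00..\xFF table, built once instead of written out.
_ASCII_TABLE = ''.join('%c' % i for i in range(256))

def chr_table(i):
    # The bounds check Thomas suggests adding.
    if not 0 <= i <= 255:
        raise ValueError('chr() arg not in range(256)')
    return _ASCII_TABLE[i]
```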

At 18:15 2003-01-22 +0100, Thomas Heller wrote:
Boyd Roberts <boyd@strakt.com> writes:
Thomas Heller wrote:
I don't think so. For me, this is a fine implementation of chr():
def chr(i): return "\x00\x01\x02\x03...\xFF"[i]
That's dreadful.
def chr(i): return '%c' % i
Maybe, but you probably got my point...
I see four issues in the above that might be worth some guiding words from the leaders[a]:

1) premature optimization,
2) appropriate level of Python code for coding PyPython,
3) appropriate definition semantics,
4) dependencies in definitions.

1) Standard advice, presumably.

2) What kinds of constructs will be best for Psyco to deal with? When coding the definition of something, should one ask:
2a) Is the thing being coded part of core minimal PyPython, so coding should be limited to a Python subset, or
2b) Is the thing being coded at the next level, so that the full language can be presumed to be available?
2c) How does one know whether a thing belongs to 2a or 2b, or is this distinction really necessary in the current thinking?

3) When is it good to define primitively, like Thomas's definition, and when is it good to define in terms of a composition implicitly delegating to existing function(s), like Boyd's? And should one be careful as to whether the functions one is depending on belong to 2a or 2b?

4) Should a dependency tree be documented as we go, e.g., to avoid hidden circular dependencies in delegated functionality, but also to make clear levels of primitiveness? (BTW, ISTM this would help in any future attempt to factor out the implementation of a primitive core functionality.)

The two definitions of chr() above are examples of primitive vs. composite/delegating definitions, which is what brought this to mind (which is not to say that the "primitive" definition doesn't depend on anything, but it's composing with lower level primitives).

Regards, Bengt

[a] Should I have cc'd Armin, Chris, and Holger? I.e., would that have been courtesy or redundant annoyance?

Hello Bengt, On

def chr(i): return "\x00\x01\x02\x03...\xFF"[i]

versus

def chr(i): return '%c' % i

I feel the second solution is more to the point. The first one seems redundant somehow: "pick the ith character from this string, whose ith character just happens to have ASCII code i". But that's probably a minor point. It shows that chr() may well be implemented in pure Python using lower "primitives" (let's call them built-in functions or methods). It may indeed be a good idea to draw a fluctuating but documented dependency graph between pure Python and built-in functions. It would let two teams work on these two halves of the work: (1) writing pure Python functions (as Scott did), and (2) writing built-in functions. I think that Scott's work drew the line for the built-in functions. Most of what he couldn't code must be done as built-in functions. Armin.

Armin Rigo <arigo@tunes.org> writes:
I think that Scott's work drew the line for the built-in functions. Most of what he couldn't code must be done as built-in functions.
Except that classmethod, staticmethod, and super could be done in pure Python. Aren't those even in Guido's descrintro? Thomas
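Indeed, descrintro sketches both as small descriptor classes. Something along these lines (renamed with a my_ prefix here so as not to shadow the real builtins; a sketch covering only the ordinary cases):

```python
class my_staticmethod(object):
    """Descriptor that hands back the wrapped function unchanged."""
    def __init__(self, func):
        self._func = func
    def __get__(self, obj, objtype=None):
        return self._func

class my_classmethod(object):
    """Descriptor that binds the wrapped function to the class."""
    def __init__(self, func):
        self._func = func
    def __get__(self, obj, objtype=None):
        if objtype is None:
            objtype = type(obj)
        def bound(*args, **kwds):
            return self._func(objtype, *args, **kwds)
        return bound

class Demo(object):
    def plain():
        return 'no self at all'
    plain = my_staticmethod(plain)
    def name_of(cls):
        return cls.__name__
    name_of = my_classmethod(name_of)
```

Demo.plain() and Demo().name_of() then behave like the builtin versions; super is trickier but is likewise spelled out in pure Python in descrintro.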

On Thu, Jan 23, 2003 at 11:18:00AM +0100, Thomas Heller wrote:
Armin Rigo <arigo@tunes.org> writes:
I think that Scott's work drew the line for the built-in functions. Most of what he couldn't code must be done as built-in functions.
Except that classmethod, staticmethod, and super could be done in pure Python. Aren't those even in Guido's descrintro?
They may well be. My point in writing some of __builtin__ was to get some initial code out there, not to draw a line in the sand. The next version will probably have some more stuff included in it. Remember, this was a one-afternoon hack that got the initial version out. -Scott

Hello Scott, On Thu, Jan 23, 2003 at 03:32:06PM -0500, Scott Fenton wrote:
Except that classmethod, staticmethod, and super could be done in pure Python. Aren't those even in Guido's descrintro??
They may well be. My point in writing some of __builtin__ was to get some initial code out there, not to draw a line in the sand. The next version will probably have some more stuff included in it.
Sure. I was rather talking about the comments you added in the e-mail, where you say that some functions don't seem to be implementable. About chr(i) vs. '%c'%i vs. '\x00\x01...\xFF'[i]: what I feel this shows is that one of these solutions must be thought of as the primitive way to build a character, and the others should use it; and I definitely feel that chr() is the primitive way to build a character. Contrary to what I said in a previous e-mail, I don't think that chr() should be implemented with '%c'%i. On the other hand, I guess that the strings' % operator could nicely be implemented in pure Python. It would then have to use chr() to implement the %c format code. It looks more reasonable than the other way around. A bientôt, Armin.
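As an illustration of that direction, a toy pure-Python '%' formatter that treats chr() as the primitive for %c (written in today's Python; it handles only %c, %s, %d and %%, nothing like the real formatting rules, and the name percent_format is made up for this sketch):

```python
# A toy '%' formatter built on chr() as the primitive.
def percent_format(fmt, args):
    if not isinstance(args, tuple):
        args = (args,)
    out, values = [], iter(args)
    i = 0
    while i < len(fmt):
        ch = fmt[i]
        if ch == '%':
            i += 1
            code = fmt[i]
            if code == '%':
                out.append('%')
            elif code == 'c':
                out.append(chr(next(values)))   # %c delegates to chr()
            elif code == 's':
                out.append(str(next(values)))
            elif code == 'd':
                out.append(str(int(next(values))))
        else:
            out.append(ch)
        i += 1
    return ''.join(out)
```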

Armin Rigo <arigo@tunes.org> writes:
About chr(i) vs. '%c'%i vs. '\x00\x01...\xFF'[i]: what I feel this shows is that one of these solutions must be thought of as the primitive way to build a character, and the others should use it; and I definitely feel that chr() is the primitive way to build a character. Contrary to what I said in a previous e-mail, I don't think that chr() should be implemented with '%c'%i. On the other hand, I guess that the strings' % operator could nicely be implemented in pure Python. It would then have to use chr() to implement the %c format code. It looks more reasonable than the other way around.
I don't want to beat this to death, but I've a different opinion. There is no 'character' data type in Python, only a 'string', which consists of 0 to n characters. This is also reflected in Python/bltinmodule.c, where the code is: s[0] = (char)x; return PyString_FromStringAndSize(s, 1); So even the C code creates a 'character array', and passes it to PyString_FromStringAndSize, which I would call the canonical way to build a string of whatever size. Of course there's a lot going on behind the scenes in what this function does... Thomas

On Fri, Jan 24, 2003 at 10:39:35AM +0100, Armin Rigo wrote:
About chr(i) vs. '%c'%i vs. '\x00\x01...\xFF'[i]: what I feel this shows is that one of these solutions must be thought of as the primitive way to build a character, and the others should use it; and I definitely feel that chr() is the primitive way to build a character. Contrary to what I said in a previous e-mail, I don't think that chr() should be implemented with '%c'%i. On the other hand, I guess that the strings' % operator could nicely be implemented in pure Python. It would then have to use chr() to implement the %c format code. It looks more reasonable than the other way around.
I disagree. My feeling about this is that everything that can be expressed as a function should be in pure Python, and that which can't should probably be C (or Java, or a Python compiler, or....). I guess since builtin types fit below that level, we should probably make '%c'%i the builtin way of conversion, and, in fact, the new version of pypy.py I'm putting together has it that way. Another good, compelling argument is that it would be idiotic to try to implement unichr the other way, so for consistency we should probably delegate the task of number->character conversion to C, where it can be done more gracefully anyway. -Scott

[Scott Fenton Fri, Jan 24, 2003 at 07:43:13AM -0500]
On Fri, Jan 24, 2003 at 10:39:35AM +0100, Armin Rigo wrote:
About chr(i) vs. '%c'%i vs. '\x00\x01...\xFF'[i]: what I feel this shows is that one of these solutions must be thought of as the primitive way to build a character, and the others should use it; and I definitely feel that chr() is the primitive way to build a character. Contrary to what I said in a previous e-mail, I don't think that chr() should be implemented with '%c'%i. On the other hand, I guess that the strings' % operator could nicely be implemented in pure Python. It would then have to use chr() to implement the %c format code. It looks more reasonable than the other way around.
I disagree. My feeling about this is probably that everything that can be expressed as a function should be in pure python,
what do you mean by this? "everything" can always be expressed in a python function. It's a matter of time and space so could you be more specific? Anyway, I would try very hard to express all the builtins in python. And i see Thomas Heller's ctypes approach as a way to make this possible. Before coding anything in C there must be a real *need* to do so. greetings, holger

On Fri, Jan 24, 2003 at 02:48:42PM +0100, holger krekel wrote:
what do you mean by this? "everything" can always be expressed in a python function. It's a matter of time and space so could you be more specific?
Mostly, stuff that exists as "builtin" syntax, i.e. printf-style formats that can be implemented "under the hood" using sprintf, should probably be C. "everything" can be expressed in terms of my signature, I just wouldn't try it if my life depended on it.
Anyway, I would try very hard to express all the builtins in python. And i see Thomas Heller's ctypes approach as a way to make this possible. Before coding anything in C there must be a real *need* to do so.
I can make chr python either way. It's a question of how "deep" we want python to go. -Scott

At 08:32 2003-01-24 -0500, Scott Fenton wrote:
On Fri, Jan 24, 2003 at 02:48:42PM +0100, holger krekel wrote:
what do you mean by this? "everything" can always be expressed in a python function. It's a matter of time and space so could you be more specific?
Mostly, stuff that exists as "builtin" syntax, i.e. printf-style formats that can be implemented "under the hood" using sprintf, should probably be C. "everything" can be expressed in terms of my signature, I just wouldn't try it if my life depended on it.
Anyway, I would try very hard to express all the builtins in python. And i see Thomas Heller's ctypes approach as a way to make this possible. Before coding anything in C there must be a real *need* to do so.
I can make chr python either way. It's a question of how "deep" we want python to go.
I'm thinking "depth" is a monotonic relationship among nodes along a path in an acyclic graph, and so far we are talking about two kinds of nodes: "interpreter level" and "Python level". I am getting an idea that maybe we should be thinking meta-levels instead of two kinds, and that in general there can concurrently exist many levels of nodes in the tree, all busily acting as "interpreter level" for their higher level parents, except for root and leaves. Not sure how useful this is for immediate goals, but I'm struggling to form a satisfying abstract view of the whole problem ;-)

It's interesting that manipulation of representational elements in one level implements operations at another level, and manipulation is _defined_ by a pattern of elements that live somewhere too. It's a subtle soup ;-)

BTW, discussion of chr(i) and ord(c) brings up the question of representing reinterpret_cast at an abstract level. (Whether a C reinterpret_cast correctly implements chr and ord semantics is a separate question. I just want to talk about casting a moment, for the light it may shed on type info.)

ISTM we need a (python level) type to (meta-) represent untyped bits. I.e., the 8 bits of a char don't carry the type info that makes them char vs uint8. The hypothetical meta_repr function I mentioned in a previous post does make use of a Python level object (Bits instance) to represent untyped bits. I.e., if c is a character, conceptually:

    meta_repr(c) => ('str', id(c), Bits(c))

where Bits(c) is a special bit vector at the Python level, but represents the untyped bits of c at the next meta-level ("interpreter level" here).

An in-place reinterpret_cast from char to uint8 would amount to

    ('str', id(c), Bits(c)) => ('int', id(c), Bits(c))

(assuming that in the abstract, 'int' doesn't care about the current number of bits in its representation). Though I'm skipping a detail about sign, which is an interpretation of the msb of Bits(c), so it should probably be written

    ('str', id(c), Bits(c)) => ('int', id(c), Bits(0)+Bits(c))

where the '+' means bit vectors concatenate. Or you could have a distinct 'uint' type. That might be necessary for unsigned fixed-width representations.

A _conversion_ would virtually allocate existence space by also providing a new distinct id and copying the representational Bits (or not copying, if we can tag the instance as immutable for some uses). Note that you can imagine corresponding operations in a different level where there's an "actual" space allocation somewhere in some array (Python level or malloc level ;-) and id indicates a location in that array, and 'int' is perhaps encoded in some Bits associated with the Bits of the newly allocated int representation, or maybe implicit in the indication of the space array, if that is dedicated to a single type. All this, too, could be represented abstractly at a level before machine language, so I think a two-meta-level model may be constraining.

If you look at ('str', id(c), Bits(c)) as Python-level code, it is a Python level tuple with a Python level string, int, and class instance. The whole thing only has meta-meaning because of how it is interpreted in a context relating two meta-levels. I.e., the interpretation itself is expressed at the Python level, but being plain Python, there is a meta_repr(('str', id(c), Bits(c))) involved in the next level of what's happening, and so forth, until leaf representations of machine code are evolved and used, IWT.

Hoping I'm helping factor and clarify concepts rather than tangle and muddy, Bengt
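To make those conceptual tuples concrete, here is a toy rendering of Bengt's hypothetical Bits and meta_repr (all the names are inventions from this thread, nothing like this exists in CPython):

```python
class Bits(object):
    """Untyped bit vector, modeled here as a plain list of 0/1 ints."""
    def __init__(self, byte_values):
        self.bits = []
        for b in byte_values:
            self.bits += [(b >> k) & 1 for k in range(7, -1, -1)]

    def __add__(self, other):
        # '+' concatenates bit vectors, as in Bengt's notation.
        joined = Bits([])
        joined.bits = self.bits + other.bits
        return joined

def meta_repr(c):
    # c is a one-character str: expose (type-name, identity, raw bits).
    return ('str', id(c), Bits([ord(c)]))

def reinterpret_cast_to_int(meta):
    # Same identity, same bits, new type tag; a zero byte is prepended
    # to stand in for the sign detail Bengt mentions.
    typename, ident, bits = meta
    return ('int', ident, Bits([0]) + bits)
```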

On Fri, Jan 24, 2003 at 03:28:16PM -0800, Bengt Richter wrote:
[snip] I'm thinking "depth" is a monotonic relationship among nodes along a path in an acyclic graph, and so far we are talking about two kinds of nodes: "interpreter level" and "Python level". I am getting an idea that maybe we should be thinking meta-levels instead of two kinds, and that in general there can concurrently exist many levels of nodes in the tree, all busily acting as "interpreter level" for their higher level parents, except for root and leaves. Not sure how useful this is for immediate goals, but I'm struggling to form a satisfying abstract view of the whole problem ;-) It's interesting that manipulation of representational elements in one level implements operations at another level, and manipulation is _defined_ by a pattern of elements that live somewhere too. It's a subtle soup ;-)
Hmm.... meta-circular interpreters. Sounds suspiciously like problems Lisp has dealt with for years. Perhaps we should take a look at how systems like Maclisp and Interlisp handled these problems, since they were themselves written in Lisp, as I recall. BTW, for reference I've put a copy of Steele and Sussman's "Art of the Interpreter" on my site at http://fenton.baltimore.md.us/AIM-453.pdf parenthetically yours, -Scott

Bengt Richter <bokr@oz.net> wrote:
[full message on meta-levels and untyped bit vectors quoted above - snipped]
I think that the untyped bit vector thing is too low level.

On Sat, Jan 25, 2003 at 01:01:39AM +0100, Samuele Pedroni wrote:
[snip] I think that the untyped bit vector thing is too low level.
Not if we're talking about generating a compiler, it isn't. I think part of our problem is that, besides extending python FAR beyond its original problem domain, we look like we're heading towards the first python compiler, as opposed to the CPython and Jython interpreters (ok, there is freeze, but it uses the internals of an interpreter to run, so not really). What we need to do is figure out if we want a direct compiler in which to write an interpreter, or just an interpreter, which wouldn't ever be free standing. meta-circular-ly yours, -Scott

Not if we're talking about generating a compiler, it isn't. I think part of our problem is that, besides extending python FAR beyond its original problem domain, we look like we're heading towards the first python compiler, as opposed to the CPython and Jython interpreters (ok, there is freeze, but it uses the internals of an interpreter to run, so not really). What we need to do is figure out if we want a direct compiler in which to write an interpreter, or just an interpreter, which wouldn't ever be free standing.
I guess that what you mean is that (limited) untyped bit vectors can be used in an intermediate representation near to machine code generation in a to-machine-code compiler. My point is that wrt capturing python semantics in python in a way that is statically analyzable in order to produce widely different "backends", the notion of untyped bit vectors is likely too low level. (Even if the backend set should encompass "compilers".)

On Sat, Jan 25, 2003 at 01:55:04AM +0100, Samuele Pedroni wrote:
[snip] I guess that what you mean is that (limited) untyped bit vectors can be used in an intermediate representation near to machine code generation in a to-machine-code compiler.
My point is that wrt capturing python semantics in python in a way that is statically analyzable in order to produce widely different "backends", the notion of untyped bit vectors is likely too low level. (Even if the backend set should encompass "compilers".)
OK then, in that case, sure. Bit vectors do end up being too low-level. I think it would really help this project to have a clear statement of what we're aiming to do: i.e., do we want a parser with multiple backends, two of which are interpret and compile, or do we want a simple python->machine code or python->execute-as-interpreted translator? -Scott

Scott Fenton wrote:
On Sat, Jan 25, 2003 at 01:55:04AM +0100, Samuele Pedroni wrote:
[snip] I guess that what you mean is that (limited) untyped bit vectors can be used in an intermediate representation near to machine code generation in a to-machine-code compiler.
My point is that wrt capturing python semantics in python in a way that is statically analyzable in order to produce widely different "backends", the notion of untyped bit vectors is likely too low level. (Even if the backend set should encompass "compilers".)
OK then, in that case, sure. Bit vectors do end up being too low-level. I think it would really help this project to have a clear statement of what we're aiming to do: i.e., do we want a parser with multiple backends, two of which are interpret and compile, or do we want a simple python->machine code or python->execute-as-interpreted translator?
Yes. At the moment, we want any and all of that. Even the statement about bit vectors being too low-level is a bit ;-) early to be stated. Together with a good set of description objects for the bits in the vector, it does make sense. That is the really interesting structure; the vectors themselves are much less relevant. -chris

From: "Christian Tismer" <tismer@tismer.com>
Yes. At the moment, we want any and all of that. Even the statement about bit vectors being too low-level is a bit ;-) early to be stated. Together with a good set of description objects for the bits in the vector, it does make sense. That is the really interesting structure; the vectors themselves are much less relevant.
It is worth remembering that some potential targets have no notion of pointers, nor of casting between integers and pointers: only opaque references.

holger krekel <hpk@trillke.net> writes:
Anyway, I would try very hard to express all the builtins in python. And i see Thomas Heller's ctypes approach as a way to make this possible.
So do I ;-) ;-), although I have no idea where to begin. Construct a new Python type by assembling the type structure completely as a ctypes type?
-----
Here is some news on ctypes: development has moved to SF: http://sourceforge.net/projects/ctypes. There are new web pages (but not much new content). I'm actively working on it right now: writing tests to shake out all the little bugs (there are lots of them, although no one has complained about them), and fixing them on the fly. I think I'm halfway through the fixing. In case someone cares, I've already rewritten the argument conversion stuff, which was a rather large change. libffi is well integrated. I'm routinely running the tests under Linux and Windows 2000. Just van Rossum has submitted a patch to make it work under MacOS X also.
If someone wants to read the code, the best way would be to retrieve it with anonymous CVS. There's a mailing list now, although I'm currently the only subscriber. I'm not sure it will be useful; I created it just in case. Thomas

[Thomas Heller Fri, Jan 24, 2003 at 03:41:42PM +0100]
holger krekel <hpk@trillke.net> writes:
Anyway, I would try very hard to express all the builtins in python. And i see Thomas Heller's ctypes approach as a way to make this possible.
So do I ;-) ;-), although I have no idea where to begin. Construct a new Python type by assembling the type structure completely as a ctypes type?
Hmm, that's one of my black areas in current CPython: the type system. I admit it's a rather large one :-) But wouldn't using a ctypes representation of Python types only be necessary when interfacing with CPython C stuff? IMV the 'chr' function could be implemented at 'interpreter level' and have representation-optimized access methods. So if a PyString had a ctype as its 'array' (is that possible today?), then effectively an assembler instruction fetching one byte and using that as a representation for a PyInt object might do it. I am trying to use the terms from previous discussions. But unless i have coded some of that myself, i am sure that i am not completely making sense ....
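[Editorial sketch, not part of the original thread: holger's speculation — a string's bytes living in a ctypes array, so that a chr-like operation is a single indexed byte fetch — can be tried out with today's ctypes. The names `CharArray` and `chr_like` are invented; this is a toy, not a PyString implementation.]

```python
import ctypes

# A ctypes array of 256 single bytes, standing in for the kind of
# raw 'array' a PyString's data might be held in.
CharArray = ctypes.c_char * 256
table = CharArray(*[bytes([i]) for i in range(256)])

def chr_like(i):
    # One byte fetched straight out of the ctypes array; indexing a
    # c_char array yields a length-1 bytes object.
    return table[i]

print(chr_like(65))   # -> b'A'
```

Whether the translator could then turn that indexed fetch into a single machine instruction is exactly the representation-optimization question holger raises.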
Here is some news on ctypes:
Development has moved to SF: http://sourceforge.net/projects/ctypes.
Thomas, can you imagine the forthcoming pypy repository and your ctypes repo merging at some point? CTypes would of course still have its own directory and could be separately tagged, etc. Here, I am already thinking about how we can actually manage our upcoming coding at the Sprint. I think that having "everything necessary" in one repo would be quite convenient. regards, holger

holger krekel <hpk@trillke.net> writes:
Here is some news on ctypes:
Development has moved to SF: http://sourceforge.net/projects/ctypes.
Thomas, can you imagine the forthcoming pypy repository and your ctypes repo merging at some point? CTypes would of course still have its own directory and could be separately tagged, etc.
Here, I am already thinking about how we can actually manage our upcoming coding at the Sprint. I think that having "everything necessary" in one repo would be quite convenient.
Why not (if ctypes is really that important for pypy)? We can always merge later again, keep a forked ctypes, or whatever. OTOH, I have only a very vague impression of how ctypes could help pypy, except maybe for bootstrapping, or for accessing a small core implemented in C. And for this, ctypes should be fine already; if it's not, I would like to merge your patches in. Thomas

Hello Scott, On Fri, Jan 24, 2003 at 07:43:13AM -0500, Scott Fenton wrote:
I disagree. My feeling about this is that everything that can be expressed as a function should be in pure python, and that which can't should probably be in C.
No! I was not saying it should be in C. *Nothing* should be in C. I'm using the word "built-in" to mean a function at the interpreter level, as opposed to a function at the application level. I still think that the complexity of "string modulo" is best written as a (non-built-in) Python function. A thing like chr(), on the other hand, is trivial to implement as a built-in, with code like this:

    def builtin_chr(v):
        # v is a PyInt class instance
        i = v.parse_long()
        if i not in range(256):
            raise EPython(PyExc(ValueError),
                          PyStr('chr() arg not in range(256)'))
        return PyStr(chr(i))

Compare this with bltinmodule.c:chr(). Yes, I know there is still a call to 'chr()' in my implementation. But I'm at the interpreter level. The above function (including its chr() call) is easy to translate to C to make the equivalent of bltinmodule.c:chr(). BTW, the syntax "i not in range(256)" is more conceptual than "i<0 or i>=256" and expresses better what we are testing for (as the error message suggests too).

The above code is a typical example of how I think "built-in" functions could be written. The function is "built-in" not because it is written in another language but because it works at the interpreter level.

A bientôt, Armin.

BTW the syntax "i not in range(256)" is more conceptual than "i<0 or i>=256" and expresses better what we are testing for (as the error message suggests too).
Why not "if 0 < i < 256:" ? -- Nathan Heagy phone:306.653.4747 fax:306.653.4774 http://www.zu.com

At 10:23 2003-01-24 -0600, Nathan Heagy wrote:
BTW the syntax "i not in range(256)" is more conceptual than "i<0 or i>=256" and expresses better what we are testing for (as the error message suggests too).
Why not "if 0 < i < 256:" ?
I think it's because your inequality refers to an interval in the set of all integers (or reals, for that matter), whereas the concept of chr derives from the ordinal place of a character in a specific finite _ordered_ set of characters. IOW, if the char set were represented as a sequence named charset, then the index test really should be "i not in range(len(charset))".

If you think of charset as having a 1:1 corresponding indexset, then c = chr(i) can be expressed as

    c = charset[indexset.index(i)]

and conventionally the index set for ord is range(len(charset)), because that ordered set naturally represents position in the set. Note that you could have a non-integer i and index set in this formulation, so Armin's test doesn't just test a numerical value within an interval; it tests whether the index is an allowable index object (i.e., a member of the indexset). Of course, if indexset is range(x) and i is in indexset, then indexset.index(i) is a no-op, so after Armin's test you can safely write c = charset[i].

For better conceptual purity (and less-magical magic numbers ;-), I might write your inequality as

    if isinstance(i, int) and ord(charset[0]) <= i <= ord(charset[-1]):

and let ord(c) be charset.index(c). Maybe Psyco would ultimately generate the same code ;-)

Regards, Bengt
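[Editorial sketch, not part of the original thread: Bengt's charset/indexset formulation is directly runnable. The names `chr_` and `ord_` are invented to avoid shadowing the builtins; `charset` and `indexset` are his.]

```python
# chr/ord as lookups against an explicit ordered character set.
charset = [chr(i) for i in range(256)]   # the ordered set of chars
indexset = range(len(charset))           # its 1:1 corresponding index set

def chr_(i):
    # Membership in the index set, not a numeric comparison.
    if i not in indexset:
        raise ValueError('chr() arg not in range(256)')
    # When indexset is range(n), index(i) is effectively a no-op.
    return charset[indexset.index(i)]

def ord_(c):
    return charset.index(c)

print(chr_(65), ord_('A'))   # -> A 65
```

Nothing here requires i to be an integer; only that it be a member of the index set, which is exactly Bengt's point about Armin's test.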

Nathan Heagy wrote:
BTW the syntax "i not in range(256)" is more conceptual than "i<0 or i>=256" and expresses better what we are testing for (as the error message suggests too).
Why not "if 0 < i < 256:" ?
Well, the generated code should finally be the same. But "i not in range(256)" expresses that i should not be in the set of those 256 values. That's more expressive, since it does not imply that i needs to be an integer at all. It does not impose that i has a data type that is ordered, and it does not require that i has to know how to compare itself to be less or greater than anything else. It just says "do not be any of these". ciao - chris

Hello Nathan, On Fri, Jan 24, 2003 at 10:23:46AM -0600, Nathan Heagy wrote:
"i not in range(256)"
Why not "if 0 < i < 256:" ?
Ultimately, because "i in range()" is a single test. This is what we want to say: "i is in the acceptable range". A translator working on the "0 <= i < 256" version would have to use pattern matching to figure out that we are actually asking whether "i" is within a range, and not simply making two tests "0 <= i" and "i < 256", in case that allows it to generate more natural code. Conversely, it is trivial to implement "in range()" as a double comparison if needed. Armin
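[Editorial sketch, not part of the original thread: Armin's asymmetry — lowering a single range test to two comparisons is trivial, while recovering a range test from two comparisons needs pattern matching — can be shown with a toy lowering step. The tuple representation and the function name `emit` are invented.]

```python
# A translator that receives the test as one conceptual operation
# can trivially lower it for a backend with no range-check primitive.

def emit(test):
    op, var, n = test
    assert op == 'in_range'
    # Lower the single range test to two comparisons.
    return '%s >= 0 && %s < %d' % (var, var, n)

print(emit(('in_range', 'i', 256)))   # -> i >= 0 && i < 256
```

Going the other way, from two independent comparison nodes back to one range test, would require the translator to recognize the pattern, which is exactly the work Armin wants to avoid.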

chr(i) vs. '%c'%i vs. '\x00\x01...\xFF'[i]:
Isn't part of this decision whether the string type will itself be written in Python? If so, then the chr(i) functionality will be a method of the String class, and even if the String class is written in C, chr() could probably still be a class method. Perhaps minimalPython does need a char type, so that strings can be written in python and not C? -- Nathan Heagy phone:306.653.4747 fax:306.653.4774 http://www.zu.com
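[Editorial sketch, not part of the original thread: Nathan's suggestion of chr() as a class method of a pure-Python String class might look like the following. The class name `String` and the method name `from_ord` are invented; the sketch leans on the host chr() the way a bootstrap would.]

```python
# A pure-Python String class where the chr() functionality lives
# with the type as a classmethod, rather than as a separate builtin.

class String:
    def __init__(self, chars):
        self.chars = chars

    @classmethod
    def from_ord(cls, i):
        # The chr()-equivalent, attached to the string type itself.
        if i not in range(256):
            raise ValueError('arg not in range(256)')
        return cls(chr(i))   # bootstrap via the host interpreter's chr()

print(String.from_ord(65).chars)   # -> A
```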

Nathan Heagy wrote:
chr(i) vs. '%c'%i vs. '\x00\x01...\xFF'[i]:
Isn't part of this decision whether the string type will itself be written in Python? If so, then the chr(i) functionality will be a method of the String class, and even if the String class is written in C, chr() could probably still be a class method. Perhaps minimalPython does need a char type, so that strings can be written in python and not C?
The implementation engine of MinimalPython, not necessarily MinimalPython itself, needs to have a way to express "this is a single char". This will most probably be expressed by using an instance of a corresponding type, as in ctypes. This is not necessary in the first iteration of the bootstrap, but later, when we have a running engine and begin to make it efficient. cheers - chris
participants (9)
-
Armin Rigo
-
Bengt Richter
-
Boyd Roberts
-
Christian Tismer
-
holger krekel
-
Nathan Heagy
-
Samuele Pedroni
-
Scott Fenton
-
Thomas Heller