continuationmodule.c preview

Howdy,

please find attached my latest running version of continuationmodule.c, which is really able to do continuations. You need Stackless Python 0.3 for it, which I just submitted. This module is by no means ready.

The central functions are getpcc() and putcc(). Call/cc is, at the moment, to be done like this:

    def callcc(fun, *args, **kw):
        cont = getpcc()
        return apply(fun, (cont,)+args, kw)

getpcc(level=1) gets a parent's current continuation. putcc(cont, val) throws a continuation.

At the moment, these are still frames (albeit special ones), which I will change. They should be turned into objects which have a link to the actual frame, which can be unlinked after a shot or by hand. This makes it easier to clean up circular references.

I have a rough implementation of this in Python, also a couple of generators and coroutines, but none of them pleases me yet. Because my son is ill, my energy has dropped a little for the moment, so I thought I'd better release something now. I will make the module public when things have settled a little more.

ciao - chris

-- 
Christian Tismer :^) <mailto:tismer@appliedbiometrics.com>
Applied Biometrics GmbH : Have a break! Take a ride on Python's
Kaiserin-Augusta-Allee 101 : *Starship* http://starship.python.net
10553 Berlin : PGP key -> http://wwwkeys.pgp.net
PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF
we're tired of banana software - shipped green, ripens at home

The latest versions of the Icon language (9.3.1 & beyond) sprouted an interesting change in semantics: if you open a file for reading in "translated" (text) mode now, it normalizes Unix, Mac and Windows line endings to plain \n. Writing in text mode still produces what's natural for the platform. Anyone think that's *not* a good idea? c-will-never-get-fixed-ly y'rs - tim
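The following is a minimal C sketch of the read-side rule, applied to bytes that are already in memory (as a read() implementation might hold them). The function name normalize_newlines is hypothetical and does not come from Icon or fileobject.c. The sketch assumes the whole text sits in one buffer. A '\r' that ends one chunk and a '\n' that starts the next would need extra state carried between calls.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: rewrite the n bytes at buf in place so that
 * "\r\n" and a lone "\r" both become "\n".  Returns the new length,
 * which is never larger than n.  Assumes the data is one complete
 * chunk; a "\r\n" pair split across two chunks is NOT joined here. */
size_t normalize_newlines(char *buf, size_t n)
{
    size_t in = 0, out = 0;

    assert(buf != NULL || n == 0);
    while (in < n) {
        char c = buf[in++];
        if (c == '\r') {
            buf[out++] = '\n';
            /* Swallow the '\n' of a "\r\n" pair. */
            if (in < n && buf[in] == '\n')
                in++;
        } else {
            buf[out++] = c;
        }
    }
    return out;
}
```

Because the write index never passes the read index, the translation can be done in place without a second buffer.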

I've been thinking about this myself -- exactly what I would do.

Not clear how easy it is to implement (given that I'm not so enthused about the idea of rewriting the entire I/O system without using stdio -- see archives). The implementation must be as fast as the current one -- people used to complain bitterly when readlines() or read() were just a tad slower than they *could* be. There's a lookahead of 1 character needed -- ungetc() might be sufficient, except that I think it's not guaranteed to work on unbuffered files.

Should also do this for the Python parser -- there it would be a lot easier.

--Guido van Rossum (home page: http://www.python.org/~guido/)

[Tim]
[Guido]
> I've been thinking about this myself -- exactly what I would do.
Me too <wink>.
The Icon implementation is very simple: they *still* open the file in stdio text mode. "What's natural for the platform" on writing then comes for free. On reading, libc usually takes care of what's needed, and what remains is to check for stray '\r' characters that stdio glossed over. That is, in fileobject.c, replacing

    if ((*buf++ = c) == '\n') {
        if (n < 0)
            buf--;
        break;
    }

with a block like (untested!)

    *buf++ = c;
    if (c == '\n' || c == '\r') {
        if (c == '\r') {
            *(buf-1) = '\n';
            /* consume following newline, if any */
            c = getc(fp);
            if (c != '\n')
                ungetc(c, fp);
        }
        if (n < 0)
            buf--;
        break;
    }

Related trickery needed in readlines. Of course the '\r' business should be done only if the file was opened in text mode.
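The patch above only makes sense inside fileobject.c's line reader. The following minimal sketch lets you experiment with the same getc()/ungetc() lookahead outside the interpreter. The name read_line_normalized and its signature are hypothetical. Unlike the real readline, it simply stops when the caller's buffer is full.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper, not the real fileobject.c code: read one line
 * from fp into buf (at most size-1 chars plus a terminating '\0'),
 * mapping "\r\n" and a lone "\r" to "\n".
 * Returns the number of chars stored; 0 means EOF. */
size_t read_line_normalized(FILE *fp, char *buf, size_t size)
{
    size_t len = 0;
    int c;

    assert(fp != NULL && buf != NULL && size > 0);
    while (len + 1 < size && (c = getc(fp)) != EOF) {
        if (c == '\r') {
            /* One character of lookahead: swallow the '\n' of "\r\n",
             * otherwise push the peeked character back. */
            int next = getc(fp);
            if (next != '\n' && next != EOF)
                ungetc(next, fp);
            c = '\n';
        }
        buf[len++] = (char)c;
        if (c == '\n')
            break;
    }
    buf[len] = '\0';
    return len;
}
```

Only one character is ever pushed back between reads, which is the single level of pushback the C standard guarantees for ungetc().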
The above does add one compare per character. Haven't timed it. readlines may be worse.

BTW, people complain bitterly anyway, but it's in comparison to Perl text mode line-at-a-time reads!

    D:\Python>wc a.c
    1146880 3023873 25281537 a.c

    D:\Python>

Reading that via

    def g():
        f = open("a.c")
        while 1:
            line = f.readline()
            if not line:
                break

and using python -O took 51 seconds. Running the similar Perl (although it's not idiomatic Perl to assign each line to an explicit var, or to test that var in the loop, or to use "if !" instead of "unless" -- did all those to make it more like the Python):

    open(DATA, "<a.c");
    while ($line = <DATA>) {last if ! $line;}

took 17 seconds. So when people are complaining about a factor of 3, I'm not inclined to get excited about a few percent <wink>.
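> There's a lookahead of 1 character needed -- ungetc() might be sufficient except that I think it's not guaranteed to work on unbuffered files.

Don't believe I've bumped into that. *Have* bumped into problems with ungetc not playing nice with fseek/ftell, and that's probably enough to kill it right there (alas).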
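One way to sidestep the ungetc()/fseek() interaction, which was not proposed in this thread, is to carry the lookahead as a flag instead of a pushed-back character. A minimal sketch follows, with the hypothetical names nl_reader, nl_getc and nl_seek.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical wrapper: instead of pushing a character back with
 * ungetc(), remember that the last character returned was a '\r'
 * (already translated to '\n'), and drop a '\n' that immediately
 * follows it on the next read. */
typedef struct {
    FILE *fp;
    int skip_lf;   /* nonzero: a translated '\r' was just returned */
} nl_reader;

int nl_getc(nl_reader *r)
{
    int c;

    assert(r != NULL && r->fp != NULL);
    c = getc(r->fp);
    if (r->skip_lf) {
        r->skip_lf = 0;
        if (c == '\n')          /* second half of "\r\n": drop it */
            c = getc(r->fp);
    }
    if (c == '\r') {
        r->skip_lf = 1;
        c = '\n';
    }
    return c;
}

/* Any repositioning must forget the pending "\r" state. */
int nl_seek(nl_reader *r, long offset, int whence)
{
    r->skip_lf = 0;
    return fseek(r->fp, offset, whence);
}
```

This approach has a cost. After a translated '\r' is returned, ftell() on the underlying FILE points between the '\r' and a possible '\n'. Seeks therefore have to go through a wrapper that clears the flag, as nl_seek does.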
> Should also do this for the Python parser -- there it would be a lot easier.
And probably the biggest bang for the buck.

the-problem-with-exposing-libc-is-that-libc-isn't-worth-exposing<wink>-ly y'rs - tim

[Tim, notes that Perl line-at-a-time text mode input runs 3x faster than Python's on his platform]

And much to my surprise, it turns out Perl reads lines a character at a time too! And they do not reimplement stdio. But they do cheat.

Perl's internals are written on top of an abstract IO API, with "PerlIO *" instead of "FILE *", "PerlIO_tell(PerlIO *)" instead of "ftell(FILE*)", and so on. Nothing surprising in the details, except maybe that stdin is modeled as a function "PerlIO *PerlIO_stdin(void)" instead of as global data (& ditto for stdout/stderr).

The usual *implementation* of these guys is as straight macro substitution to the corresponding C stdio call. It's possible to implement them some other way, but I don't see anything in the source that suggests anyone has done so, except possibly to build it all on AT&T's SFIO lib.

So where's the cheating? In these API functions:

    int   PerlIO_has_base(PerlIO *);
    int   PerlIO_has_cntptr(PerlIO *);
    int   PerlIO_canset_cnt(PerlIO *);
    char *PerlIO_get_ptr(PerlIO *);
    int   PerlIO_get_cnt(PerlIO *);
    void  PerlIO_set_cnt(PerlIO *,int);
    void  PerlIO_set_ptrcnt(PerlIO *,char *,int);
    char *PerlIO_get_base(PerlIO *);
    int   PerlIO_get_bufsiz(PerlIO *);

In almost all platform stdio implementations, the C FILE struct has members that may vary in name but serve the same purpose: an internal buffer, and some way (pointer or offset) to get at "the next" buffer character. The guys above are usually just (after layers & layers of config stuff set it up) macros that expand into the platform's internal way of spelling these things. For example, under Windows the count member is spelled fp->_cnt with VC, or fp->level with Borland.

The payoff is in Perl's sv_gets function, in file sv.c.
This is long and very complicated, but at its core it has a fast inner loop that copies characters (provided the PerlIO_has/canXXX functions say it's possible) directly from the stdio buffer into a Perl string variable -- in the way a platform fgets function *would* do it if it bothered to optimize fgets. In my experience, platforms usually settle for the same kind of fgetc/EOF?/newline? loop Python uses, as if fgets were a stdio client rather than a stdio primitive.

Perl's loop keeps everything in registers, updates the FILE struct members only at the boundaries, and doesn't check for EOF except at the boundaries (so long as the buffer has unread stuff in it, you can't be at EOF). If the stdio buffer is exhausted before the input terminator is seen (Perl has "input record separator" and "paragraph mode" gimmicks, so it's hairier than just looking for \n), it calls PerlIO_getc once to force the platform to refill the buffer, and goes back to the screaming loop. Major hackery, but major payoff (on most platforms) too.

The abstract I/O layer is a fine idea regardless. The sad thing is that the real reason Perl is so fast here is that platform fgets is so needlessly slow.

perl-input-is-faster-than-c-input-ly y'rs - tim
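FILE member names differ by platform (_cnt with VC versus level with Borland). A portable way to try the same idea is for the reader to own its buffer. The sketch below is not Perl's sv_gets and does not touch stdio internals. It only shows the overall structure: EOF is checked only when the buffer is refilled, and each run of buffered bytes is scanned and copied in bulk with memchr()/memcpy(). The names fast_reader and fast_readline are hypothetical. The reader looks only for '\n', with no record separator or paragraph mode.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical reader that owns its buffer, standing in for the FILE
 * internals Perl reaches through PerlIO_get_ptr/PerlIO_get_cnt. */
typedef struct {
    FILE *fp;
    char buf[4096];
    size_t pos;   /* index of the next unread byte in buf */
    size_t cnt;   /* number of valid bytes in buf */
} fast_reader;

/* Copy one line (including its '\n') into out, at most outsize-1
 * bytes, and NUL-terminate it.  Returns bytes copied; 0 means EOF. */
size_t fast_readline(fast_reader *r, char *out, size_t outsize)
{
    size_t len = 0;

    assert(r != NULL && out != NULL && outsize > 0);
    while (len + 1 < outsize) {
        if (r->pos == r->cnt) {
            /* Boundary: the only place EOF is ever checked. */
            r->cnt = fread(r->buf, 1, sizeof r->buf, r->fp);
            r->pos = 0;
            if (r->cnt == 0)
                break;
        }
        /* Fast path: find the newline in the buffered run, then copy
         * the whole run (up to and including it) in one go. */
        size_t avail = r->cnt - r->pos;
        size_t room = outsize - 1 - len;
        size_t n = avail < room ? avail : room;
        const char *start = r->buf + r->pos;
        const char *nl = memchr(start, '\n', n);
        if (nl != NULL)
            n = (size_t)(nl - start) + 1;
        memcpy(out + len, start, n);
        len += n;
        r->pos += n;
        if (nl != NULL)
            break;
    }
    out[len] = '\0';
    return len;
}
```

Because this reader keeps its own buffer rather than sharing the FILE's, mixing fast_readline() with other stdio calls on the same stream would lose data. That is the limitation Perl avoids by reaching into the platform's FILE struct.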

It's a trend <wink>: the latest version of the REBOL language also does this. The Java compiler does it for Java source files, but I don't know how runtime file read/write work in Java.

Anyone know offhand if there's a reliable way to determine whether an open file descriptor (a C FILE*) is seekable?

if-i'm-doomed-to-get-obsessed-by-this-may-as-well-make-it-faster-too-ly y'rs - tim

Tim Peters wrote:
> Anyone know offhand if there's a reliable way to determine whether an open file descriptor (a C FILE*) is seekable?
I'd simply use trial & error:

    if (fseek(stream, 0, SEEK_CUR) < 0) {
        if (errno != EBADF) {
            /* Not seekable */
            errno = 0;
        }
        else
            /* Error */
            ;
    }
    else
        /* Seekable */
        ;

How to make this thread safe is left as an exercise to the interested reader ;)

Cheers,
-- 
Marc-Andre Lemburg
______________________________________________________________________
Y2000: 166 days left
Business: http://www.lemburg.com/
Python Pages: http://www.lemburg.com/python/
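The following sketch wraps that trial-and-error probe in a function and restores errno for the caller. The name is_seekable and its 1/0/-1 return convention are assumptions, not an existing API. On POSIX systems a pipe typically makes the probe fail with ESPIPE. The probe also has a side effect that is relevant to the ungetc() discussion earlier in this thread: per the C standard, a successful fseek() discards any character pushed back with ungetc().

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

/* Hypothetical helper.  Returns 1 if fp accepts a no-op seek, 0 if the
 * seek fails for any reason other than EBADF (e.g. ESPIPE on a pipe),
 * and -1 if it fails with EBADF, mirroring the errno split above.
 * The caller's errno is left unchanged. */
int is_seekable(FILE *fp)
{
    int saved_errno = errno;
    int result;

    assert(fp != NULL);
    errno = 0;
    if (fseek(fp, 0L, SEEK_CUR) == 0)
        result = 1;
    else
        result = (errno == EBADF) ? -1 : 0;
    errno = saved_errno;
    return result;
}
```

Saving and restoring errno keeps the probe from clobbering an error the caller was about to inspect. It does nothing for thread safety, which still depends on how the platform implements errno.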

[Tim]
[/F]
> if we were to change this, how would you tell Python to open a file in text mode?
Meaning whatever it is the platform libc does? In Icon or REBOL, you don't. Icon is more interesting because they changed the semantics of their "t" (for "translated") mode without providing any way to go back to the old behavior (REBOL did this too, but didn't have Icon's 15 years of history to wrestle with). Curiously (I doubt Griswold *cared* about this!), the resulting behavior still conforms to ANSI C, because that std promises little about text mode semantics in the presence of non-printable characters.

Nothing of mine would miss C's raw text mode (lack of) semantics, so I don't care. I *would* like Python to define portable semantics for the mode strings it accepts in the builtin open regardless, and push platform-specific silliness (including raw C text mode, if someone really wants that; or MS's "c" mode, etc) into a new os.fopen function. Push random C crap into expert modules, where it won't baffle my sister <0.7 wink>.

I expect Python should still open non-binary files in the platform's text mode, though, to minimize surprises for C extensions mucking with the underlying stream object (Icon/REBOL don't have this problem, although Icon opens the file in native libc text mode anyway).

next-step:-define-tabs-to-mean-8-characters-and-drop-unicode-in-favor-of-7-bit-ascii<wink>-ly y'rs - tim

participants (5)
- Christian Tismer
- Fredrik Lundh
- Guido van Rossum
- M.-A. Lemburg
- Tim Peters