
[Tim]
No, there's nothing wrong with the tokenizer code: it's coded in C, and the C text convention is that lines end with \n, period. Reliance on that convention is ubiquitous -- and properly so.
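For concreteness, here's the shape of that reliance; this is an illustrative sketch I just typed in, not a quote from the tokenizer source, but C code of this shape is everywhere:

    /* Illustrative sketch -- not actual tokenizer code.  After
     * fgets(), C code simply assumes a complete line ends in '\n'. */
    #include <stdio.h>
    #include <string.h>

    static int
    count_lines(FILE *fp)
    {
        char buf[BUFSIZ];
        int lines = 0;

        while (fgets(buf, sizeof(buf), fp) != NULL) {
            /* A line is complete exactly when '\n' is present. */
            if (strchr(buf, '\n') != NULL)
                lines++;
        }
        return lines;
    }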
[Just van Rossum]
I don't get it: why would a thin layer on top of stdio be bad? Seems much less work than reimplementing stdio.
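To make it concrete, something along these lines is all I'm imagining -- a rough, untested sketch with a made-up name, not a patch:

    /* Hypothetical sketch of a "thin layer on top of stdio": a
     * drop-in for fgets() that delivers '\n' exactly where a line
     * should be believed to end, whether the file used \n, \r\n,
     * or a bare \r. */
    #include <stdio.h>

    static char *
    univ_fgets(char *buf, int size, FILE *fp)
    {
        char *p = buf;
        int c;

        if (size <= 0)
            return NULL;
        while (--size > 0 && (c = getc(fp)) != EOF) {
            if (c == '\r') {
                /* \r\n or lone \r: report a single '\n' either way. */
                c = getc(fp);
                if (c != '\n' && c != EOF)
                    ungetc(c, fp);
                c = '\n';
            }
            *p++ = (char)c;
            if (c == '\n')
                break;
        }
        if (p == buf)       /* hit EOF before reading anything */
            return NULL;
        *p = '\0';
        return buf;
    }

The tokenizer would call this wherever it now calls fgets(), and that would be the whole change, no?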
What does that question have to do with the snippet you quoted? In context, that snippet was saying that if you did write a small layer on top of stdio -- one that made \n show up when and only when you think Python should believe a line ends -- then nothing in the tokenizer would need to change (except to call that layer instead of fgets()), and even the tokenizer's current \r\n mini-hack could be thrown away. If that's all you want, that's all it takes. If you want more than just that, you need more than just that. But I see Guido already explained that, and I also explained why the Windows Python cannot recognize \r endings with reasonable speed for *general* use short of building our own stdio. I don't really much care how fast the compiler runs, though, if all you want is the same limited level of hack afforded by the existing one-shot \r\n tokenizer trick -- and the compiler isn't using the *general*-case fileobject.c get_line() anyway.

you-pay-for-what-you-want-and-the-more-you-want-the-more-you'll-pay-ly y'rs - tim