Tim Peters wrote:
I don't know why this thread led to tweaking stdio -- after all we only need a solution for the Python tokenizer ...
Aaaaaaaaaaaargh! ;-) Here we go again: fixing the tokenizer is great and all, but then what about all tools that read source files line by line? ...
Note that this is why the topic needs a PEP: nothing here is new; the same debates reoccur every time it comes up.
... QIO claims that it can be configured to recognize different kinds of line endings.
It can be, yes, but in the same sense as Awk/Perl paragraph mode: you can tell it to consider any string (not just a single character) as meaning "end of the line", but it's a *fixed* string per invocation. What people want *here* is more the ability to recognize a regular expression such as \r\n|\r|\n as ending a line, and QIO can't do that directly (as currently written). And MAL probably wants Unicode line-end detection on top of that.
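To make the distinction concrete, here is a minimal sketch (in modern Python, purely for illustration -- QIO itself is C) of what "recognize a regular expression as ending a line" means, versus a single fixed terminator string:

```python
import re

# The three common line-end conventions. Order matters: "\r\n" must
# be tried first so it is consumed as one terminator, not as "\r"
# followed by an empty line's "\n".
LINE_END = re.compile(r"\r\n|\r|\n")

text = "unix\nmac\rdos\r\nlast"
print(LINE_END.split(text))  # ['unix', 'mac', 'dos', 'last']

# Unicode line-end detection is broader still: str.splitlines also
# treats terminators like U+0085 NEL and U+2028 LINE SEPARATOR as
# line boundaries.
print("a\u2028b\x85c".splitlines())  # ['a', 'b', 'c']
```

A fixed-string reader, by contrast, can only be told "the terminator is exactly this sequence of bytes" per invocation.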
QIO is claimed to be 2-3 times faster than Python 1.5.2; don't know how that compares to 2.x.
The bulk of that was due to QIO avoiding per-character thread locks. 2.1 avoids them too, so most of QIO's speed advantage should be gone now. But QIO's internals could certainly be faster than they are. This is obscured because QIO.readline() has so many optional behaviors that the maze of if-tests makes it hard to see the speed-crucial bits. Studying Perl's line-reading code is a better model: Perl's speed-crucial inner loop has no non-essential operations, because Perl makes the *surrounding* code sort out the optional bits instead of bogging down the loop with them.
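The design point about keeping option tests out of the inner loop can be sketched in Python (the real issue is per-character C code, so this is only an analogy; the names are hypothetical):

```python
import re

def plain_lines(text):
    # Fast path: '\n' is the only terminator. The loop does nothing
    # but find-and-slice -- no option flags are consulted here.
    start = 0
    while start < len(text):
        end = text.find("\n", start)
        if end == -1:
            yield text[start:]
            return
        yield text[start:end + 1]
        start = end + 1

def universal_lines(text):
    # Slower path: accept \r\n, \r, or \n as a terminator;
    # the final alternative catches an unterminated last line.
    for m in re.finditer(r"[^\r\n]*(?:\r\n|\r|\n)|[^\r\n]+", text):
        yield m.group()

def make_splitter(universal):
    # The Perl-style move: decide the optional behavior ONCE, in the
    # surrounding code, instead of re-testing a flag on every line
    # (or every character).
    return universal_lines if universal else plain_lines
```

The maze-of-if-tests alternative would be a single loop that checks the `universal` flag on each iteration, which is exactly what makes the speed-crucial bits hard to see (and slower).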
Just curious: for the applications Just has in mind, reading source code line-by-line is not really needed. Wouldn't it suffice to read the whole file, split it into lines, and then let the tools process the resulting list of lines?
Maybe a naive approach, but one which will most certainly work on all platforms without having to replace stdio...
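In modern Python the whole-file approach is a few lines, since splitlines() already understands \n, \r, and \r\n (and, on str, the Unicode line separators). A minimal sketch, assuming the file is UTF-8 (the encoding choice here is an assumption, not part of the suggestion above):

```python
def read_source_lines(path):
    # Read everything at once -- no stdio tweaking, works on any
    # platform regardless of which line-end convention wrote the file.
    with open(path, "rb") as f:
        data = f.read()
    # keepends=True preserves the terminators, mirroring what a
    # line-by-line readline() would have returned.
    return data.decode("utf-8").splitlines(keepends=True)
```

Tools then iterate over the returned list exactly as they would over a file object.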