
[Greg Ewing]
Not necessarily! Plex manages to do it without any of that.
The trick is to leave all the characters in the input buffer and just *count* how many characters make up the next token. Once you've decided where the token ends, one slice gives it to you.
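Here is a minimal sketch of the counting trick, assuming a toy token set (identifiers, integers, and single punctuation characters). It is a hand-written scanner, not Plex's DFA or API, and `next_token`/`tokenize` are hypothetical names used only for illustration. The scanner never builds a token a character at a time. It only advances an integer index, then takes a single slice.

```python
from typing import List, Optional, Tuple


def next_token(buf: str, pos: int) -> Tuple[Optional[str], int]:
    """Return (token, new_pos); token is None at end of input.

    Hypothetical helper: the input stays in `buf` untouched. We only move
    the integer `pos` forward, then extract the token with one slice.
    """
    n = len(buf)
    # Skip whitespace by advancing the index, not by copying text.
    while pos < n and buf[pos].isspace():
        pos += 1
    if pos >= n:
        return None, pos
    start = pos
    ch = buf[pos]
    if ch.isalpha() or ch == "_":
        # Identifier: count letters, digits and underscores.
        pos += 1
        while pos < n and (buf[pos].isalnum() or buf[pos] == "_"):
            pos += 1
    elif ch.isdigit():
        # Integer literal: count digits.
        pos += 1
        while pos < n and buf[pos].isdigit():
            pos += 1
    else:
        # Any other character is a one-character token.
        pos += 1
    # Once we know where the token ends, one slice gives it to us.
    return buf[start:pos], pos


def tokenize(buf: str) -> List[str]:
    """Collect every token in buf using next_token."""
    tokens = []
    pos = 0
    while True:
        tok, pos = next_token(buf, pos)
        if tok is None:
            return tokens
        tokens.append(tok)
```

For example, `tokenize("x1 = foo(42)")` returns `["x1", "=", "foo", "(", "42", ")"]`; each token is produced by exactly one slice of the original buffer.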
Plex is very nice! It doesn't pass my "convenient and fast" test only because the DFA at the end still runs at Python speed, and stepping through one character at a time is still mounds slower than it could be in C. Hmm. But you can also generate pretty reasonable C code from Python source now too! You're going to solve this yet, Greg.

Note that mxTextTools also computes slice indices for "tagging", rather than building up new string objects. Heck, that's also why Guido (from the start) gave the regexp and string match+search gimmicks optional start-index and end-index arguments, and why one of the "where did this group match?" flavors returns slice indices.

I think Eric has spent too much time debugging C lately <wink>.
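The start-index arguments and slice-index results mentioned above can be sketched with the modern `re` module: a compiled pattern's `match()` takes an optional `pos` argument, and `Match.span()` reports where a group matched as a (start, end) pair. (`str.find` likewise accepts optional start and end arguments.) This is an illustrative sketch, not how Plex or mxTextTools work internally; the `TOKEN` pattern and `token_spans` function are hypothetical. It records spans only and leaves slicing to the caller.

```python
import re

# Hypothetical token pattern: optional leading whitespace, then either a
# run of word characters or a single non-space character (group 1).
TOKEN = re.compile(r"\s*(\w+|\S)")


def token_spans(buf):
    """Return (start, end) index pairs for each token, without slicing."""
    spans = []
    pos = 0
    while True:
        # Match in place at `pos`; no copy of buf[pos:] is made.
        m = TOKEN.match(buf, pos)
        if m is None:
            return spans
        spans.append(m.span(1))  # where group 1 matched, as slice indices
        pos = m.end()
```

A caller that actually needs the text can slice once per span, e.g. `[buf[s:e] for s, e in token_spans(buf)]`.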