
[Fredrik Lundh, whose very nice eMatter book is on sale until the end of the 20th century (as real people think of it), although the eMatter distribution scheme has lots of problems [just an editorial note from a bot who has to -- for unknown reasons Fatbrain "is working on" -- delete the Fatbrain registry tree and reregister the book almost every time he tries to open it <wink>]]
we have something called SIO which uses memory mapping where possible, and just a more aggressive read-ahead for other cases. on a windows box, a traditional while/readline loop runs 3-5 times faster than before. with SRE instead of re, a while/readline/match loop runs up to 10 times faster than before.
note that this is without *any* changes to the Python source code...
If so, there's potential for significantly more speed. Python does its line-at-a-time input with a character-at-a-time macro-in-a-loop, the same way naive vendors (read "almost all vendors") implement fgets. It's replacing that inner loop with direct peeking into the FILE buffer that gets Perl its dramatic speed -- despite that Perl has fancier input functionality (the oft-requested automagical "input record separator"). So it sounds like the Perl trick is orthogonal to SIO's tricks; Perl isn't doing mmaps or read-aheads or anything else fancy under the covers -- it only optimizes the inner loop!
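To make the inner-loop point concrete, here is a minimal Python sketch of the two strategies. It is only an illustration of the idea, not Python's or Perl's actual C code: the names readline_char_at_a_time and BufferPeekingReader are made up for this example, and the 8192-byte buffer size is an arbitrary assumption.

```python
import io


def readline_char_at_a_time(stream):
    """One read call per byte, in the spirit of a getc()-in-a-loop fgets."""
    chunks = []
    while True:
        c = stream.read(1)
        if not c:            # end of file
            break
        chunks.append(c)
        if c == b"\n":       # end of line
            break
    return b"".join(chunks)


class BufferPeekingReader:
    """Hypothetical reader that scans its own buffer for the newline
    instead of pulling bytes through one call each."""

    def __init__(self, stream, bufsize=8192):
        self.stream = stream
        self.bufsize = bufsize
        self.buf = b""

    def readline(self):
        while True:
            i = self.buf.find(b"\n")   # one scan over the whole buffer
            if i >= 0:
                line, self.buf = self.buf[:i + 1], self.buf[i + 1:]
                return line
            chunk = self.stream.read(self.bufsize)
            if not chunk:              # EOF: hand back whatever is left
                line, self.buf = self.buf, b""
                return line
            self.buf += chunk


data = b"one\ntwo\nthree"
slow = io.BytesIO(data)
fast = BufferPeekingReader(io.BytesIO(data))
slow_lines = [readline_char_at_a_time(slow) for _ in range(4)]
fast_lines = [fast.readline() for _ in range(4)]
```

Both readers return the same lines; the difference is only in how many calls it takes to find each newline, which is the part the Perl trick attacks.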
... with a little luck, the new module will replace both pcre and regex...
If something more tangible than luck would help to make this come true, feel free to mention it <wink>.
not to mention that it's fairly easy to write your own front-end to the matching engine -- the expression parser and the compiler are both written in good old python.
Ah, good news / bad news. Perl refugees aren't accustomed to "precompiling" regexp objects, so write code that will cause regexps to get recompiled over & over. Even if you cache the results under the covers, the overhead of the Python call to the regexp compiler will likely take as long as the engine takes to search. Personally, in such cases, I think they should learn how to use the language <0.5 wink>.
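For anyone unsure what "precompiling" looks like, a small sketch using the standard re module follows. The pattern and the sample lines are invented for illustration. The first function passes the pattern string on every iteration, so each trip through the loop pays for a call into the module's compile-or-look-up-the-cache machinery; the second compiles once and reuses the object.

```python
import re

lines = ["spam 1", "eggs", "spam 22"]   # made-up sample input


def count_inline(lines):
    """Perl-refugee style: hand the pattern string to re on every line."""
    n = 0
    for line in lines:
        if re.search(r"spam \d+", line):   # compile (or cache lookup) per call
            n += 1
    return n


SPAM = re.compile(r"spam \d+")             # compiled once, up front


def count_precompiled(lines):
    """Reuse the compiled pattern object inside the loop."""
    search = SPAM.search                   # bind the method once
    n = 0
    for line in lines:
        if search(line):
            n += 1
    return n


inline_count = count_inline(lines)
precompiled_count = count_precompiled(lines)
```

The two functions give the same answer; only the per-iteration overhead differs.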