
Eric> This is one of bogofilter's strengths. It already does this stuff
Eric> at the lexical level using a speed-tuned flex scanner (I spent a
Eric> lot of the development time watching token strings go by and
Eric> tweaking the scanner rules to throw out cruft).

This reminds me of something which tickled my interesting bone the other
day. The SpamAssassin folks are starting to look at Flex for much faster
regular expression matching in situations where large numbers of static
re's must be matched.

I wonder if using something like SciPy's weave tool would make it
(relatively) painless to incorporate fairly high-speed scanners into
Python programs. It seems like it would just be an extra layer of
compilation for something like weave. Instead of inserting C code into a
string, wrapping it with module sticky stuff and compiling it, you'd
insert Flex rules into the string, call a slightly higher level function
which calls flex to generate the scanner code and use a slightly
different bit of module sticky stuff to make it callable from Python.

Skip
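
P.S. A rough sketch of the first half of that idea, under some loud assumptions: the names `FLEX_TEMPLATE`, `make_flex_source` and `generate_scanner` are made up for illustration, nothing here is part of weave or bogofilter, and the "module sticky stuff" that would compile the generated C and make it callable from Python is left out entirely. It only shows Flex rules living in a Python string, being wrapped into a scanner specification, and handed to the `flex` command if one is installed.

```python
import shutil
import subprocess
from pathlib import Path
from typing import Optional

# Hypothetical template: wraps user-supplied rules in a minimal Flex spec
# (definitions section, rules section, empty user-code section).
FLEX_TEMPLATE = """%option noyywrap
%%
{rules}
%%
"""


def make_flex_source(rules: str) -> str:
    """Wrap a string of Flex rules in a complete scanner specification."""
    return FLEX_TEMPLATE.format(rules=rules.strip())


def generate_scanner(rules: str, workdir: Path) -> Optional[Path]:
    """Write the spec into workdir and run flex on it.

    Returns the path of the generated C file, or None when no flex
    executable is found on PATH (the spec file is still written).
    """
    spec = workdir / "scanner.l"
    spec.write_text(make_flex_source(rules))
    flex = shutil.which("flex")
    if flex is None:
        return None
    out = workdir / "scanner.c"
    # -oFILE names the output file instead of the default lex.yy.c
    subprocess.run([flex, f"-o{out}", str(spec)], check=True)
    return out


# Example rules, kept in a raw string so "\n" reaches flex unchanged.
RULES = r"""
[A-Za-z]+    { printf("WORD %s\n", yytext); }
[0-9]+       { printf("NUM %s\n", yytext); }
.|\n         { /* throw out cruft */ }
"""
```

Calling `generate_scanner(RULES, Path("."))` would leave `scanner.l` (and, with flex available, `scanner.c`) in the current directory; wrapping that C file as an extension module is the part weave-style glue would have to supply.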