
Aug. 21, 2002
10:47 p.m.
Guido> I haven't given up on the re module for fast scanners (see Tim's
Guido> note on the speed of tokenizing 20,000 messages in mere minutes).
Guido> Note that the Bayes approach doesn't *need* a trick to apply many
Guido> regexes in parallel to the text.

Right. I'm thinking of it in situations where you do need such tricks.
SpamAssassin is one such place. I think Eric has an application (quickly
tokenizing the data produced by an external program, where the data can
run into several hundred thousand lines) where this might be beneficial
as well.

Skip
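One common way to "apply many regexes in parallel" with the re module is to join the individual patterns into a single alternation of named groups. Each `finditer` match then reports which group fired via `match.lastgroup`. The sketch below is illustrative only. It is not SpamAssassin's or Eric's code, and the token names and patterns (`NUMBER`, `WORD`, `SPACE`, `OTHER`) are hypothetical.

```python
import re

# Hypothetical token patterns. Each name becomes a named group in the
# combined regex. Order matters: at a given position, the first
# alternative that matches wins.
TOKEN_PATTERNS = [
    ("NUMBER", r"\d+(?:\.\d+)?"),
    ("WORD", r"[A-Za-z_]\w*"),
    ("SPACE", r"\s+"),
    ("OTHER", r"."),
]

# Combine all patterns into one alternation so a single compiled regex
# tries every pattern at each position, instead of looping over many
# separately compiled regexes.
COMBINED = re.compile(
    "|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_PATTERNS)
)


def tokenize(text):
    """Yield (kind, value) pairs for each token, skipping whitespace."""
    for match in COMBINED.finditer(text):
        kind = match.lastgroup  # name of the named group that matched
        if kind != "SPACE":
            yield kind, match.group()
```

For example, `list(tokenize("price 42.5 USD!"))` gives `[("WORD", "price"), ("NUMBER", "42.5"), ("WORD", "USD"), ("OTHER", "!")]`. The catch-all `OTHER` alternative ensures every character is consumed by some token, so no input is silently skipped.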