Most efficient method to search text?

Michael Hudson
Thu Oct 17 13:14:09 CEST 2002

Tim Peters writes:

> Especially for purposes of building lexers, it might be useful if the re
> package could recognize when a DFA approach was sufficient and practical,
> and switch to a different scheme entirely then.  Or it might not.  Build
> code to try it both ways, and let us know how it turns out ...

Indeed, my code canes re when the wordlist gets long.  Here are some
numbers (do_comp(n) builds a list of n words, each ten random ASCII
characters, and searches the Python standard library for them using
first my code, then a re-based approach):

>>> import robin
>>> robin.do_comp(10)
>>> robin.do_comp(100)
>>> robin.do_comp(1000)
>>> robin.do_comp(2000)

So roughly linear behaviour from re, n-independent behaviour from my
code.  Isn't it nice when things do what you expect?
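The robin module itself isn't shown here, but the shape of both curves can be reproduced with a minimal sketch: a trie walk whose cost per text position is bounded by the longest word (and so independent of n), against an re alternation whose matching cost grows with the wordlist.  Everything below (make_words, trie_build, and friends) is illustrative stand-in code, not the posted implementation:

```python
import random
import re
import string

def make_words(n):
    """n random ten-character ASCII words, as in do_comp(n)."""
    return ["".join(random.choices(string.ascii_lowercase, k=10))
            for _ in range(n)]

def trie_build(words):
    """Character trie of nested dicts; the "" key marks a word end
    (a real text character always has length 1, so it can't collide)."""
    root = {}
    for w in words:
        node = root
        for ch in w:
            node = node.setdefault(ch, {})
        node[""] = True
    return root

def trie_search(trie, text):
    """Walk the trie from each position.  Work per position is bounded
    by the longest word, so total cost is independent of how many
    words are in the trie -- the n-independent behaviour above."""
    hits = []
    for i in range(len(text)):
        node, j = trie, i
        while j < len(text) and text[j] in node:
            node = node[text[j]]
            j += 1
            if "" in node:
                hits.append(text[i:j])
    return hits

def re_search(words, text):
    """Alternation-based search; sre tries the branches one by one,
    so cost grows roughly linearly with len(words)."""
    pat = re.compile("|".join(map(re.escape, words)))
    return pat.findall(text)
```

Timing trie_search against re_search over a large text, with wordlists of 10 to 2000 words, reproduces the flat-versus-linear split.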

On the other hand, the "compiler" code I posted yesterday consumes
vast gobs of memory -- trying to compile a list of 10000 random
words got the OOM killer into action.  A version using lists seems
slightly better in this respect, though slower in execution.


 Have you considered downgrading your arrogance to a reasonable level?
        -- Erik Naggum, comp.lang.lisp, to yet another C++-using troll
