Most efficient method to search text?

Michael Hudson mwh at python.net
Thu Oct 17 07:14:09 EDT 2002


Tim Peters <tim.one at comcast.net> writes:

> Especially for purposes of building lexers, it might be useful if the re
> package could recognize when a DFA approach was sufficient and practical,
> and switch to a different scheme entirely then.  Or it might not.  Build
> code to try it both ways, and let us know how it turns out ...

Indeed, my code canes re when the wordlist gets long.  Here are some
numbers (do_comp(n) builds a list of n words of ten random ASCII
characters each and searches the Python standard library source for
them, first with my code and then with a re-based approach; a sketch
of such a harness and then the timings, printed in that order, follow):
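The robin module itself isn't reproduced in this message, so what
follows is only a guess at the shape of the harness; the names
(make_words, search_fixed, search_re, do_comp) are mine, and
search_fixed merely stands in for the real matcher by exploiting the
fact that every word has the same length:

import random
import re
import string
import time

def make_words(n, length=10):
    """Build n words of `length` random ASCII letters each."""
    return [''.join(random.choice(string.ascii_lowercase)
                    for _ in range(length))
            for _ in range(n)]

def search_fixed(words, text):
    """n-independent search: since every word has the same length,
    slide a window over the text and test set membership."""
    wordset = set(words)
    k = len(words[0])
    return [i for i in range(len(text) - k + 1)
            if text[i:i+k] in wordset]

def search_re(words, text):
    """re-based search: one big alternation."""
    pat = re.compile('|'.join(re.escape(w) for w in words))
    return [m.start() for m in pat.finditer(text)]

def do_comp(n, text):
    words = make_words(n)
    for search in (search_fixed, search_re):
        t0 = time.time()
        search(words, text)
        print(time.time() - t0)

Fed a big chunk of text (say, the concatenated source of the standard
library), do_comp prints the two times in the same order as the
session below.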

>>> import robin
>>> robin.do_comp(10)
2.275832057
0.317215919495
>>> robin.do_comp(100)
2.2694439888
0.980538964272
>>> robin.do_comp(1000)
2.36677598953
8.53031408787
>>> robin.do_comp(2000)
2.34253299236
16.9222760201

So roughly linear behaviour in the size of the wordlist from re,
n-independent behaviour from my code.  Isn't it nice when things do
what you expect?
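The linear growth from re is what you'd expect if sre tries the
branches of the alternation one after another at each starting
position.  A trie collapses that common work: each input character
costs a single dict lookup whatever the size of the wordlist.  My
code isn't in this message either, so purely as an illustration of
the idea, a machine of that general shape might look like:

def compile_trie(words):
    """Build a trie: each node is a dict mapping a character to the
    child node; the key None marks a complete word."""
    root = {}
    for w in words:
        node = root
        for ch in w:
            node = node.setdefault(ch, {})
        node[None] = w          # accepting node remembers its word
    return root

def scan(root, text):
    """Yield (position, word) for every occurrence.  From each start
    position we walk the trie at most max-word-length steps, and each
    step is a single dict lookup; the work per character never
    depends on how many words were compiled in."""
    for i in range(len(text)):
        node = root
        j = i
        while j < len(text):
            node = node.get(text[j])
            if node is None:
                break
            j += 1
            if None in node:
                yield (i, node[None])

list(scan(compile_trie(words), text)) then finds every occurrence.
An Aho-Corasick automaton would go further and avoid restarting at
each position, but the dict-lookup-per-character property is already
enough to make the scan time independent of n.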

On the other hand, the "compiler" code I posted yesterday consumes
vast gobs of memory -- trying to compile a list of 10000 random
words got the OOM killer into action.  A version using lists seems
slightly better in this respect, though slower in execution.
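The compiler code isn't repeated here, so this is only my reading of
the dict-versus-list trade-off: if each trie node is a list of
(char, child) pairs instead of a dict, sparse nodes typically take
less memory than hash tables, but descending the trie becomes a
linear scan, hence the slower execution.  Something like:

def compile_trie_lists(words):
    """Trie with list-of-(char, child)-pairs nodes instead of dicts.
    Sparse nodes are denser in memory than hash tables, but each
    descent is a linear scan rather than a hashed lookup."""
    root = []
    for w in words:
        node = root
        for ch in w:
            for c, child in node:
                if c == ch:
                    node = child
                    break
            else:
                child = []
                node.append((ch, child))
                node = child
        node.append((None, w))  # end-of-word marker
    return root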

Cheers,
M.

-- 
 Have you considered downgrading your arrogance to a reasonable level?
        -- Erik Naggum, comp.lang.lisp, to yet another C++-using troll


