[Baypiggies] More dramatic material?

K. Richard Pixley rich at noir.com
Wed Feb 24 04:13:24 CET 2010


Regular expressions aren't particularly fast.  That's not the reason for 
using them.  In fact, there was a sort of competition going back in the 
80's with various types of grep's.  Look up boyer-moore, (bgrep), and 
the difference between egrep and fgrep.  The bottom line is that for 
certain kinds of expressions, different algorithms are more efficient.  
But the more efficient algorithms trade robustness and problem domain 
for speed.  That is, they work faster, but only on a subset of cases.  
On the others, they either don't work at all or silently produce 
incorrect results.

Similarly, LALR(0) languages are easier and faster to parse than 
LALR(1).  You don't really need a parser generator like yacc, bison, or 
byacc if your language is LALR(0).

There's a related, but somewhat more subdued competition going on right 
now about binary diff programs, which have many related properties.  So 
do compression programs and I think everyone's probably aware of that 
long standing competition.  And I'll only mention message digesting, 
(aka, encryption), which also has similarities.

Anything done in C is faster than anything done in python, (assuming 
competent implementations).  The exceptions being things where "pure" 
python is implemented in C anyway.

The win to regular expressions is not speed, but rather that they can 
search for and find things that other types of searching can't, or can't 
do easily.  For "dramatic" examples, try searching for palindromes, or 
repeated phrases like 'moinmoin' or 'wikiwiki', or sequences of double 
repeating characters like vaccuum.

--rich


More information about the Baypiggies mailing list