[Tutor] Amazing power of Regular Expressions...

Mon Nov 6 19:40:37 CET 2006

"Michael Sparks" <ms at cerenity.org> wrote
> A regex compiles to a jump table, and executes as a statemachine.

Sometimes, but it depends on the implementation.

If someone uses a C++ compiler that relies on the standard library
implementation of regex (like, say, the Microsoft Visual C++
version up till V6 did) then they will not get that, and they will have
slow regex. The GNU compiler, OTOH, does provide fast regex.
So it depends entirely on the implementation of the regex library.
And it can even depend on the compiler chosen, as in the case above,
since many dynamic languages use the underlying library for
their regex implementation.

> or the implementation in the library is crap, and I don't believe for a
> second python would be that dumb)

Maybe not, but C/C++ can be!

> Given you can compile the regex and use it over and over again (as was
> the context the example code (isplain) was executed in) using a regex
> is absolutely the best way.

Fine if the user compiles it, but for one-off use it's quite common
not to compile the regex. For a one-off, the compilation can be slower
than the one-off execution.
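
For anyone who wants to see the compile cost for themselves, here is a rough
sketch, not a proper benchmark. The function names and the sample string are
made up for illustration. One thing to bear in mind: Python's re module keeps
an internal cache of compiled patterns, so a plain re.match() call only pays
the compile cost the first time. The "cold" variant calls re.purge() to
imitate a genuine one-off.

```python
import re
import timeit

PATTERN = r"^[0-9A-Za-z_.-]*$"
COMPILED = re.compile(PATTERN)  # compiled once, reused on every call


def match_precompiled(s):
    """Use the pattern object that was compiled ahead of time."""
    return COMPILED.match(s) is not None


def match_cached(s):
    """Module-level call: compiled on first use, then found in re's cache."""
    return re.match(PATTERN, s) is not None


def match_cold(s):
    """Empty re's cache first, so the pattern is compiled again every call."""
    re.purge()
    return re.match(PATTERN, s) is not None


def compare(sample="some_file-name.txt", number=20000):
    """Return {function name: total seconds for `number` calls}."""
    funcs = (match_precompiled, match_cached, match_cold)
    return {f.__name__: timeit.timeit(lambda: f(sample), number=number)
            for f in funcs}


if __name__ == "__main__":
    for name, seconds in compare().items():
        print(f"{name:18s} {seconds:.4f}s")
```

The numbers will vary with the Python version and the machine, so treat them
as a rough guide only.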

> I'd be extremely surprised if you could get your suggested approach faster.

Like I said, it depends on an awful lot of factors.
One reason Perl is popular is that it has the world's best regex
implementation; it regularly beat compiled C programs when it
first came out. Nowadays the regex libraries are improving,
but the VB one, for example, is still pretty slow in my experience,
even in VB.Net.

> important. (heck, that's the reason regexes are useful - compact clear
> representations of a lexical structure that you want to check a string
> matches rather than obfuscated by the language doing the checking)

I've never heard anyone claim that regexes were clear before :-)

>   * ^[0-9A-Za-z_.-]*$
>
> Is a very simple pattern, and as a result an extremely simple specification.

That is quite simple for a regex, but I'd still argue that for beginners
(which is what this list is about) that's still quite an eyeful.
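
For the beginners, here is the same pattern spelled out piece by piece using
re.VERBOSE, which lets you put comments inside the regex. The body of
isplain() below is my guess at what such a function would look like, not
the code from earlier in the thread.

```python
import re

PLAIN = re.compile(r"""
    ^                  # start of the string
    [0-9A-Za-z_.-]*    # zero or more of: digit, letter, underscore, dot, hyphen
                       # (a hyphen placed last inside [...] is a literal hyphen)
    $                  # end of the string, or just before a final newline
""", re.VERBOSE)


def isplain(s):
    """Return True if s contains only the 'plain' characters listed above."""
    return PLAIN.match(s) is not None


if __name__ == "__main__":
    for candidate in ("file_name-1.txt", "", "has space", "semi;colon"):
        print(repr(candidate), isplain(candidate))
```

Note that because of the *, the empty string counts as plain, and because of
how $ works in Python a single trailing newline is let through as well.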

> If any developer has a problem with that sort of pattern, they really need to
> go away and learn regexes, since they're missing important tools. (which
> shouldn't be over used).

I agree totally.

> I'm serious, if you think ^[0-9A-Za-z_.-]*$ is unclear and complex, go away
> and relearn regexes.

I think it's complex for a novice amateur programmer.

And to Kent's point re the C library, he is quite correct: toupper() etc.
use a lookup table, not a dictionary; my mistake. But the difference between
hashing and indexing is still less than the cost of a regex pattern match
for a single character.
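
To make the three approaches concrete, here is a rough Python sketch of a
single-character check done by table index, by hashing and by regex. All the
names are invented for the example, and the 256-entry table is only a loose
imitation of what a C library does. I quote no timings; they depend on the
interpreter and the machine.

```python
import re
import string
import timeit

ALLOWED = string.ascii_letters + string.digits + "_.-"

# Indexing: one boolean per character code, like a C-style lookup table.
TABLE = [chr(i) in ALLOWED for i in range(256)]

# Hashing: membership test against a set.
ALLOWED_SET = frozenset(ALLOWED)

# Regex: a precompiled pattern for exactly one allowed character.
ONE_CHAR = re.compile(r"[0-9A-Za-z_.-]")


def by_index(ch):
    code = ord(ch)
    return code < 256 and TABLE[code]


def by_hash(ch):
    return ch in ALLOWED_SET


def by_regex(ch):
    return ONE_CHAR.fullmatch(ch) is not None


def time_checks(ch="q", number=100000):
    """Return {function name: total seconds for `number` calls}."""
    funcs = (by_index, by_hash, by_regex)
    return {f.__name__: timeit.timeit(lambda: f(ch), number=number)
            for f in funcs}


if __name__ == "__main__":
    for name, seconds in time_checks().items():
        print(f"{name:10s} {seconds:.4f}s")
```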

And the point of timing is also valid, but I too can't be bothered to try...

Alan G.