re vs. sgmllib (was: Moving from Perl to Python)
Tim Peters
tim_one at email.msn.com
Sun Sep 26 22:01:01 EDT 1999
[Jon Fernquest]
> Since regular expressions are just a short-hand way of specifying
> (practically only *some*) regular languages whereas finite state
> machines can specify any regular language the next logical step
> would be a set of finite state tools like those that Xerox sells
> (for several thousands of dollars I might add).
Then I hope we can skip the next logical step and leapfrog illogically to a
real parser <wink>. Part of the problem here is that what people want to
parse these days (from programming language fragments to SGML) isn't
regular. That doesn't stop them from trying to do it with regexps, and
input-sensitive bug-ridden code is the result. Heck, most people find it a
challenge to write a correct regexp to match a Python string -- or even a C
/**/ comment. Not that regular languages aren't useful, but I expect their
appropriate non-trivial applications will always be a wizard art.
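To make the C-comment point concrete, here is a minimal sketch in modern Python 3 (an assumption; the post predates it). `C_COMMENT` is the widely circulated "unrolled loop" regexp for `/* ... */` comments, not something from this thread. `NAIVE` shows the usual wrong first attempt. `comments_by_fsm` is a hand-written finite state machine for the same language, which works because C comments, unlike nested structures, are regular. All three names are hypothetical, and all three ignore string literals and `//` comments, so a `"/*"` inside a string would fool them.

```python
import re

# Naive attempt: greedy ".*" runs from the FIRST "/*" to the LAST "*/",
# swallowing any code that sits between two comments on one line.
NAIVE = re.compile(r"/\*.*\*/")

# Widely circulated "unrolled loop" pattern for C /* ... */ comments:
# body text without "*", then one or more "*", repeated until a "/"
# finally follows a run of "*".
C_COMMENT = re.compile(r"/\*[^*]*\*+(?:[^/*][^*]*\*+)*/")


def comments_by_fsm(text):
    """Yield each /* ... */ comment in text using an explicit DFA.

    States: 0 = ordinary text, 1 = just saw "/",
            2 = inside a comment, 3 = inside a comment, just saw "*".
    Like the regexp, this ignores string literals and // comments.
    """
    state = 0
    start = 0
    for i, ch in enumerate(text):
        if state == 0:
            if ch == "/":
                state = 1
        elif state == 1:
            if ch == "*":
                state, start = 2, i - 1  # comment began at the "/"
            elif ch != "/":
                state = 0  # another "/" leaves us in state 1
        elif state == 2:
            if ch == "*":
                state = 3
        else:  # state == 3
            if ch == "/":
                yield text[start:i + 1]
                state = 0
            elif ch != "*":
                state = 2  # any non-"*" resets; another "*" stays in 3


if __name__ == "__main__":
    src = "int x; /* one */ int y; /* two ** */"
    print(NAIVE.findall(src))
    print(C_COMMENT.findall(src))
    print(list(comments_by_fsm(src)))
```

On that sample input, `NAIVE` returns the single bogus match `'/* one */ int y; /* two ** */'`, while the regexp and the state machine both return `['/* one */', '/* two ** */']`.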
> Perl's adoption of regular expressions was sort of a revolution
> I guess,
Na, Perl grew up in the Unix zoo, where at least a dozen popular tools used
their own flavor of regexps before it. Awk in particular pioneered tight
integration of regexps with a programming language, and Perl didn't add much
essential to what Awk did with them (indeed, Awk is still more convenient
for some kinds of text-crunching tasks!). What Perl did do is combine the
best features of all the preceding regexp notations, toss the worst, add a
few nice twists of its own, and make it all dance. Perl had some real
innovations, but wrt regexps it was mostly a nice synthesis of prior art.
> but there's another revolution looming on the horizon for the
> language that incorporates generalized finite state technology.
Python will be happy to accept a module <wink>.
> Finite state technology is really great for dealing with
> non-roman character sets,
Free Unicode regexp packages already exist; for example,
http://ourworld.compuserve.com/homepages/John_Maddock/regexpp.htm
is a nice one for C++, and more are on the way. Since a million programmers
have already been deluded <0.6 wink> into thinking regexps are "the answer",
they're going to want more of the same.
> [more advocacy, and cool references, elided]
> ...
> The little language Gema also has language "acceptor" objects and
> also some recursive pattern matching capability which can be used
> to parse.
> http://www.telerama.com/~mundie/index.html
Here's example 1 from http://www.telerama.com/~mundie/Gema/GemaGems.html:
Example 1
Take a tab-delimited text file and make an HTML table out of it
    \n\n*\n\n=<table border>\n$1</table>
    \L<U>=\t<tr>\n@makerow{$0}\t</tr>
    makerow:<P>=\t\t<td>$1</td>\n;?=
It certainly appeals to the Perl eye <wink>.
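For comparison, here is roughly the same job in plain Python 3. This is a minimal sketch, not a line-by-line translation of the Gema rules. It assumes one table row per non-blank line and tab-separated fields, and it mirrors the tab indentation the Gema rules emit. `tsv_to_html_table` is a hypothetical name. The `html.escape` call is an extra step not present in the quoted Gema example, so a field like `a<b` can't break the markup.

```python
import html


def tsv_to_html_table(text):
    """Build an HTML table from tab-delimited text.

    Hypothetical helper: one <tr> per non-blank line, one <td> per
    tab-separated field, indented with tabs like the Gema rules.
    """
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        cells = "".join(
            "\t\t<td>%s</td>\n" % html.escape(field)
            for field in line.split("\t")
        )
        rows.append("\t<tr>\n%s\t</tr>\n" % cells)
    return "<table border>\n%s</table>\n" % "".join(rows)


if __name__ == "__main__":
    print(tsv_to_html_table("name\tqty\nspam\t3\neggs\t12\n"), end="")
```

It takes more lines than the Gema version, but a Python reader can see what each step does without decoding the notation.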
Note that Perl is in the process of adding recursive "regexps".
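Python's standard `re` module has no recursive patterns. Nested structure is therefore usually handled with a few lines of recursive descent instead. Below is a minimal sketch for balanced parentheses, using the hypothetical names `parse` and `group`. The recursion supplies the stack that a finite state machine lacks, which is exactly why this language isn't regular. Characters other than parentheses are simply skipped in this sketch.

```python
def parse(text):
    """Parse balanced parentheses in text into nested lists.

    "(()())" -> [[[], []]]; raises ValueError if unbalanced.
    Characters other than "(" and ")" are ignored.
    """
    def group(i):
        # text[i] is "(": collect child groups until the matching ")".
        children = []
        i += 1
        while i < len(text):
            ch = text[i]
            if ch == "(":
                child, i = group(i)  # recurse for a nested group
                children.append(child)
            elif ch == ")":
                return children, i + 1
            else:
                i += 1
        raise ValueError("unclosed '('")

    result = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == "(":
            child, i = group(i)
            result.append(child)
        elif ch == ")":
            raise ValueError("unexpected ')' at index %d" % i)
        else:
            i += 1
    return result


if __name__ == "__main__":
    print(parse("(()())"))
```

Here `parse("(()())")` returns `[[[], []]]`, while `parse("(()")` and `parse("())")` raise `ValueError`.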
the-mind-boggles-ly y'rs - tim