ask for a RE pattern to match TABLE in html

Jonathan Gardner jgardner at jonathangardner.net
Fri Jun 27 01:09:58 CEST 2008


On Jun 26, 11:07 am, Grant Edwards <gra... at visi.com> wrote:
> On 2008-06-26, Stefan Behnel <stefan... at behnel.de> wrote:
> >
> > Why not use an HTML parser instead?
> >
>
> Stating it differently: in order to correctly recognize HTML
> tags, you must use an HTML parser.  Trying to write an HTML
> parser in a single RE is probably not practical.
>

s/practical/possible

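It isn't *possible* to grok HTML with regular expressions. Individual
tags--yes. But not a full element that can nest, such as a TABLE inside
another TABLE: pairing each closing tag with its own opening tag means
tracking nesting depth, and a regular expression can't count. At least
not properly.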
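
To make that concrete, here is a rough sketch, not anything from the
thread. It assumes a current Python 3 with the standard library's
html.parser module; the sample markup and the names HTML, NAIVE and
TableSpans are made up for illustration. A non-greedy pattern stops at
the first </table> it sees, which belongs to the inner table, while a
parser that keeps a stack of open tables pairs each close with the right
open.

```python
import re
from html.parser import HTMLParser

# Made-up sample: one table nested inside another.
HTML = (
    "<table id='outer'><tr><td>"
    "<table id='inner'><tr><td>x</td></tr></table>"
    "</td></tr></table>"
)

# Naive attempt: non-greedy match from "<table" to the nearest "</table>".
NAIVE = re.compile(r"<table\b.*?</table>", re.S | re.I)


class TableSpans(HTMLParser):
    """Collect (start, end) offsets of every complete <table> element."""

    def __init__(self, text):
        super().__init__()
        self.text = text
        # Absolute offset where each line begins, to convert getpos() output.
        self.line_starts = [0] + [m.end() for m in re.finditer("\n", text)]
        self.open_tables = []  # stack of start offsets of unclosed tables
        self.spans = []        # finished tables, innermost first
        self.feed(text)
        self.close()

    def _here(self):
        # getpos() reports (1-based line, 0-based column) of the current tag.
        line, col = self.getpos()
        return self.line_starts[line - 1] + col

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.open_tables.append(self._here())

    def handle_endtag(self, tag):
        if tag == "table" and self.open_tables:
            start = self.open_tables.pop()
            end = self.text.index(">", self._here()) + 1
            self.spans.append((start, end))


if __name__ == "__main__":
    print("regex: ", NAIVE.search(HTML).group(0))
    for start, end in TableSpans(HTML).spans:
        print("parser:", HTML[start:end])
```

The regex result ends at the inner table's closing tag, so the outer
table is cut short. Making the pattern greedy only moves the problem: it
would then swallow everything between two sibling tables. The stack is
the part a plain regular expression has no equivalent for.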

Maybe we need some notes on the limits of regular expressions in the
re documentation for people who haven't taken the computer science
courses on parsing and grammars. Then we could explain the necessity
of real parsers and grammars, at least in layman's terms.


