Python 3 regex?

Jussi Piitulainen jpiitula at ling.helsinki.fi
Tue Jan 13 14:58:11 CET 2015


alister <alister.nospam.ware at ntlworld.com> writes:

> On Tue, 13 Jan 2015 04:36:38 +0000, Steven D'Aprano wrote:
> 
> > On Mon, 12 Jan 2015 19:48:18 +0000, Ian wrote:
> > 
> >> My recommendation would be to write a recursive descent parser for
> >> your files.
> >> 
> >> That way will be easier to write,
> > 
> > I know that writing parsers is a solved problem in computer
> > science, and that doing so is allegedly one of the more trivial
> > things computer scientists are supposed to be able to do, but the
> > learning curve to write parsers is if anything even higher than
> > the learning curve to write a regex.
> > 
> > I wish that Python made it as easy to use EBNF to write a parser as it
> > makes to use a regex :-(
> > 
> > http://en.wikipedia.org/wiki/Extended_Backus–Naur_Form
> 
> I would not say that writing parsers is a solved problem.  There may
> be solutions for a number of specific cases, but many cases still
> cause difficulty.  As an example, I do not think there is a 100%
> complete parser for English (even native English speakers don't
> always get it).

There is no complete characterization of English as a set of character
strings, nor will there ever be. Linguists have a slogan for this: All
Grammars Leak. (They used to write formal grammars to characterize
"all and only the well-formed sentences" of a language, or to capture
"necessary and sufficient conditions", and those grammars turned out
to both "over-generate" and "under-generate".)

Ambiguity doesn't help. In practice, it's not enough to find a parse.
One wants a contextually appropriate parse. Sometimes this requires
genuine understanding and knowledge. Also in practice, one may not be
in the business of rejecting ill-formed sentences: one wants to make
partial sense of even those. So, no, never 100 percent complete or 100
percent correct :)
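
A small formal analogue of the ambiguity point, as a rough sketch only.
The grammar E = E "-" E | NUM is a made-up example, and the names
parses and evaluate are hypothetical helpers. Every tree it yields is
a valid parse, yet the trees disagree about what the input means, so
finding "a parse" settles nothing.

```python
def parses(nums):
    """Yield every parse tree for nums joined by "-" under the
    deliberately ambiguous toy grammar E = E "-" E | NUM."""
    if len(nums) == 1:
        yield nums[0]
        return
    # Try every position of the top-level "-" operator.
    for i in range(1, len(nums)):
        for left in parses(nums[:i]):
            for right in parses(nums[i:]):
                yield (left, right)


def evaluate(tree):
    """Compute the value that one particular parse tree denotes."""
    if isinstance(tree, tuple):
        left, right = tree
        return evaluate(left) - evaluate(right)
    return tree


trees = list(parses([8, 3, 2]))            # the input "8 - 3 - 2"
values = sorted(evaluate(t) for t in trees)
print(values)                              # [3, 7]
```

Only the convention that "-" associates to the left picks (8 - 3) - 2
over 8 - (3 - 2). A formal grammar can build that convention in;
English has no such rule to consult.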

The solved problem is the unambiguous parsing of formal languages that
are defined by a formal grammar to begin with, like the configuration
file format at hand.
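
To make that concrete, here is a minimal sketch of a recursive descent
parser. The grammar is an assumption invented for illustration, not the
actual file format from this thread, and the names tokenize, Parser and
parse_config are hypothetical. Each grammar rule becomes one method,
which is why such a parser can be read off the EBNF almost line by line.

```python
import re

# Hypothetical grammar (EBNF), NOT the format discussed in the thread:
#   config = { entry } ;
#   entry  = WORD "=" value ";" ;
#   value  = WORD | "{" { entry } "}" ;

PUNCT = {"=", "{", "}", ";"}
TOKEN = re.compile(r"\s*(?:(?P<word>[A-Za-z0-9_.]+)|(?P<punct>[={};]))")


def tokenize(text):
    """Split text into a flat list of words and punctuation tokens."""
    text = text.rstrip()
    tokens, pos = [], 0
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character near offset {pos}")
        tokens.append(m.group("word") or m.group("punct"))
        pos = m.end()
    return tokens


class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        # Next token, or None at end of input.
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def expect(self, tok):
        if self.peek() != tok:
            raise SyntaxError(f"expected {tok!r}, got {self.peek()!r}")
        self.pos += 1

    def word(self):
        tok = self.peek()
        if tok is None or tok in PUNCT:
            raise SyntaxError(f"expected a word, got {tok!r}")
        self.pos += 1
        return tok

    def entries(self, closer):
        # { entry } : repeat until the closing token (None = end of input).
        result = {}
        while self.peek() != closer:
            name, value = self.entry()
            result[name] = value
        return result

    def entry(self):
        # entry = WORD "=" value ";"
        name = self.word()
        self.expect("=")
        value = self.value()
        self.expect(";")
        return name, value

    def value(self):
        # value = WORD | "{" { entry } "}"
        if self.peek() == "{":
            self.expect("{")
            block = self.entries("}")
            self.expect("}")
            return block
        return self.word()


def parse_config(text):
    """Parse a whole config text into nested dicts of strings."""
    return Parser(tokenize(text)).entries(None)
```

With this toy grammar, parse_config("a = 1; b = { c = x; };") gives
{"a": "1", "b": {"c": "x"}}, and malformed input such as "a = ;" raises
SyntaxError. One token of lookahead is always enough to choose the next
rule, so there is never more than one parse to find.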


