a little parsing challenge ☺

Thomas 'PointedEars' Lahn PointedEars at web.de
Sun Jul 17 10:15:46 EDT 2011


Chris Angelico wrote:

> On Sun, Jul 17, 2011 at 5:47 PM, Xah Lee <xahlee at gmail.com> wrote:
>> the problem is to write a script that can check a dir of text files
>> (and all subdirs) and reports if a file has any mismatched matching
>> brackets.
> 
> I wonder will it be possible to code the whole thing as a single
> regular expression... I'm pretty sure it could be done as a sed or awk
> script, but I'm insufficiently expert in either to do the job.

Did you notice the excessive crosspost?  Please do not feed the troll.

In the classical sense is not possible, as classical regular expressions 
have no concept of recursion.  Indeed, matching brackets are a textbook 
example for a non-regular¹ context-free language L = {a^n b^n; n > 0} that 
can only be recognized by a pushdown automaton (PDA).  (Finite automata 
"cannot count".)

However, it is possible (and done) to use classical regular expressions or 
non-recursive Regular Expressions (note the different character case) to 
match tokens more efficiently with a PDA implementation.  This is commonly 
called a parser.  (Programming languages tend to be specified in terms of a 
context-free grammar – they tend to be context-free languages –, which is 
why a parser is a part of a compiler or interpreter.  See for example 
Python.²)

It is possible, with Perl-compatible Regular Expressions (PCRE), provided 
that you have enough memory, to use such an extended Regular Expression (not 
to be confused with EREs³)⁴:

  \((([^()]*|(?R))*)\)

However, even Python 3.2 does not support those expressions (although it 
supports some other PCRE patterns, like named subexpressions)⁵, neither do 
standard and forked versions of sed(1) (BREs, EREs, using an NFA) nor awk 
(EREs, using a DFA or NFA).  [That is not to say it would not be possible 
with Python, or sed or awk (both of which are off-topic here), but that more 
than a Regular Expression would be required.]

On this subject, I recommend reading the preview chapters of the second and 
third editions, respectively, of Jeffrey E. F. Friedl's "Mastering Regular 
Expressions", which are available online for free at O'Reilly.com⁶.

HTH.

______
¹ because it can be proved that the pumping lemma for regular languages
  does not apply to it; see also
  <http://en.wikipedia.org/wiki/Chomsky_hierarchy> pp.
² <http://docs.python.org/reference/>
³ <http://en.wikipedia.org/wiki/Regular_expression>
⁴ Cf. <http://php.net/manual/en/regexp.reference.recursive.php>
⁵ <http://docs.python.org/library/re.html>
⁶ <http://oreilly.com/catalog/regex/chapter/ch04.html>,
  <http://oreilly.com/catalog/9780596528126/preview#preview>
-- 
PointedEars

Bitte keine Kopien per E-Mail. / Please do not Cc: me.



More information about the Python-list mailing list