a little parsing challenge ?

sln at netherlands.com
Mon Jul 18 21:34:45 CEST 2011


On Sun, 17 Jul 2011 00:47:42 -0700 (PDT), Xah Lee <xahlee at gmail.com> wrote:

>2011-07-16
>
>folks, this one will be an interesting one.
>
>the problem is to write a script that can check a dir of text files
>(and all subdirs) and report whether any file has mismatched
>brackets.
>
[snip]
>i hope you'll participate. Just post your solution here. Thanks.
>

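I'm job hunting, so I won't write a full solution for you.
Here is a thin regex framework that may get you started. It checks
a single string for balanced brackets using recursive named
subpatterns (Perl 5.10+); it does not walk directories.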

-sln

---------------------

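use 5.010;    # (?&NAME) recursion and (?(DEFINE)...) need Perl 5.10+
use strict;
use warnings;
use utf8;     # the samples and the regex contain non-ASCII brackets
binmode STDOUT, ':encoding(UTF-8)';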

 my @samples = qw(
  A98(y[(np)r]x)tp[kk]a.exeb
  A98(y[(np)r]x)tp[kk]a}.exeb
  A98(‹ynprx)tpk›ka.mpeg
  ‹A98(ynprx)tpk›ka
  “A9«8(yn«pr{{[g[x].}*()+}»)tpkka».”
  “A9«8(yn«pr{{[g[x].]}*()+}»)tpkka».”
  “A9«8(yn«pr»)tpkka».”
  “A9«8(yn«pr»)»”t(()){}[a[b[d]{}]pkka.]“«‹“**^”{[()]}›»”
  “A9«8(yn«pr»)”t(()){}[a[b[d]{}]pkka.]“«‹“**^”{[()]}›»”
 );

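 # Balanced-bracket grammar built from recursive named subpatterns.
 my $regex = qr/

  ^ (?&FileName) $

  (?(DEFINE)

      # An opening bracket, balanced content, then the matching closer.
      (?<Delim>
            \( (?&Content) \)
          | \{ (?&Content) \}
          | \[ (?&Content) \]
          | \“ (?&Content) \”
          | \‹ (?&Content) \›
          | \« (?&Content) \»
             # add more pairs here ..
      )

      # Any mix of non-bracket runs and complete bracketed groups.
      (?<Content>
           (?:  (?> [^(){}\[\]“”‹›«»]+ ) # add new brackets to this class too
              | (?&Delim)
           )*
      )

      # A name passes only if the whole string is balanced content.
      (?<FileName>
           (?&Content)
      )
    )
 /x;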


 for (@samples)
 {
    print "$_ - ";
    if ( /$regex/ ) {
       print "passed \n";
    }
    else {
       print "failed \n";
    }
 }

__END__

Output:

A98(y[(np)r]x)tp[kk]a.exeb - passed 
A98(y[(np)r]x)tp[kk]a}.exeb - failed 
A98(‹ynprx)tpk›ka.mpeg - failed 
‹A98(ynprx)tpk›ka - passed 
“A9«8(yn«pr{{[g[x].}*()+}»)tpkka».” - failed 
“A9«8(yn«pr{{[g[x].]}*()+}»)tpkka».” - passed 
“A9«8(yn«pr»)tpkka».” - passed 
“A9«8(yn«pr»)»”t(()){}[a[b[d]{}]pkka.]“«‹“**^”{[()]}›»” - passed 
“A9«8(yn«pr»)”t(()){}[a[b[d]{}]pkka.]“«‹“**^”{[()]}›»” - failed 
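The regex above only answers pass or fail for a whole string and leaves out the directory-walk half of the original request. Below is a minimal sketch of both halves. It assumes every regular file under the given directories is UTF-8 text and swaps the regex for a plain stack scan, so it can report where the first problem is. The helper names `first_mismatch` and `check_file` are hypothetical; the bracket pairs are the same six the regex handles.

```perl
use 5.010;
use strict;
use warnings;
use utf8;          # this source file contains non-ASCII brackets
use File::Find;    # core module for walking directory trees

binmode STDOUT, ':encoding(UTF-8)';

# Each closing bracket mapped to its opening partner
# (the same six pairs the regex framework handles).
my %open_for = (
    ')' => '(',  '}' => '{',  ']' => '[',
    '”' => '“',  '›' => '‹',  '»' => '«',
);
my %is_open = map { $_ => 1 } values %open_for;

# Hypothetical helper: returns undef if all brackets in $text are matched,
# otherwise the 0-based character offset of the first problem: a closer
# that does not match the innermost open bracket, or the earliest
# opener that is never closed.
sub first_mismatch {
    my ($text) = @_;
    my @stack;                                  # [bracket, offset] pairs
    while ($text =~ /([()\[\]{}“”‹›«»])/g) {
        my ($c, $i) = ($1, $-[1]);
        if ($is_open{$c}) {
            push @stack, [$c, $i];
        }
        elsif (!@stack || $stack[-1][0] ne $open_for{$c}) {
            return $i;                          # unexpected closer
        }
        else {
            pop @stack;                         # closer matches, discard pair
        }
    }
    return @stack ? $stack[0][1] : undef;       # earliest unclosed opener
}

# Hypothetical helper: read one file as UTF-8 and report the first mismatch.
sub check_file {
    my ($path) = @_;
    open my $fh, '<:encoding(UTF-8)', $path
        or do { warn "cannot open $path: $!\n"; return };
    my $text = do { local $/; <$fh> } // '';    # slurp the whole file
    close $fh;
    my $pos = first_mismatch($text);
    print "$path: mismatched bracket at character $pos\n" if defined $pos;
}

# Walk every directory named on the command line; does nothing without arguments.
find({ wanted => sub { check_file($_) if -f $_ }, no_chdir => 1 }, @ARGV)
    if @ARGV;
```

Run it with one or more directories as arguments, e.g. `perl check_brackets.pl some/dir` (the script name is only an example). A stack is used because the offending position falls out naturally: an unexpected closer is reported where it occurs, and an opener that is never closed is reported at its own position.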





More information about the Python-list mailing list