Re: a little parsing challenge ☺

Mark Tarver dr.mtarver at gmail.com
Wed Jul 20 01:43:42 EDT 2011


On Jul 17, 8:47 am, Xah Lee <xah... at gmail.com> wrote:
> 2011-07-16
>
> folks, this one will be interesting one.
>
> the problem is to write a script that can check a dir of text files
> (and all subdirs) and reports if a file has any mismatched matching
> brackets.
>
> • The files will be utf-8 encoded (unix style line ending).
>
> • If a file has mismatched matching-pairs, the script will display the
> file name, and the  line number and column number of the first
> instance where a mismatched bracket occures. (or, just the char number
> instead (as in emacs's “point”))
>
> • the matching pairs are all single unicode chars. They are these and
> nothing else: () {} [] “” ‹› «» 【】 〈〉 《》 「」 『』
> Note that ‘single curly quote’ is not consider matching pair here.
>
> • You script must be standalone. Must not be using some parser tools.
> But can call lib that's part of standard distribution in your lang.
>
> Here's a example of mismatched bracket: ([)], (“[[”), ((, 】etc. (and
> yes, the brackets may be nested. There are usually text between these
> chars.)
>
> I'll be writing a emacs lisp solution and post in 2 days. Ι welcome
> other lang implementations. In particular, perl, python, php, ruby,
> tcl, lua, Haskell, Ocaml. I'll also be able to eval common lisp
> (clisp) and Scheme lisp (scsh), Java. Other lang such as Clojure,
> Scala, C, C++, or any others, are all welcome, but i won't be able to
> eval it. javascript implementation will be very interesting too, but
> please indicate which and where to install the command line version.
>
> I hope you'll find this a interesting “challenge”. This is a parsing
> problem. I haven't studied parsers except some Wikipedia reading, so
> my solution will probably be naive. I hope to see and learn from your
> solution too.
>
> i hope you'll participate. Just post solution here. Thanks.
>
>  Xah

Parsing technology based on BNF enables an elegant solution.  First
take a basic bracket balancing program which parenthesises the
contents of the input.  e.g. in Shen-YACC

(defcc <br>
   "(" <br> ")" <br$> := [<br> | <br$>];
   <item> <br>;
   <e> := [];)

(defcc <br$>
  <br>;)

(defcc <item>
  -*- := (if (element? -*- ["(" ")"]) (fail) [-*-]);)

Given (compile <br> ["(" 1 2 3 ")" 4]) the program produces [[1 2 3]
4]. When this program is used to parse the input, whatever residue is
left indicates where the parse has failed.  In Shen-YACC

(define tellme
  Stuff -> (let Br (<br> (@p Stuff []))
                Residue (fst Br)
                (if (empty? Residue)
                    (snd Br)
                    (error "parse failure at position ~A~%"
                          (- (length Stuff) (length Residue))))))

e.g.

(tellme ["(" 1 2 3 ")" "(" 4])
parse failure at position 5

(tellme ["(" 1 2 3 ")" "(" ")" 4])
[[1 2 3] [] 4]

The extension of this program to the case described is fairly simple.
Qi-YACC is very similar.

Nice problem.

I do not have further time to correspond right now.

Mark



More information about the Python-list mailing list