Re: a little parsing challenge ☺

Xah Lee xahlee at
Thu Jul 21 20:53:26 CEST 2011

On Jul 21, 9:43 am, pyt... at wrote:
> Xah,
> 1. Is the following string considered legal?
> [ { ( ] ) }
> Note: Each type of brace opens and closes in the proper sequence. But
> inter-brace opening and closing does not make sense.


> Or must a closing brace always balance out with the most recent opening
> brace like so?
> [ { ( ) } ]


> 2. If there are multiple unclosed braces at EOF, is the answer you're
> looking for the position of the first open brace that hasn't been closed
> out yet?

well, as many pointed out, i really haven't thought it out well.

originally, i just want to know the position of a un-matched char.

i haven't taken the time to think about what really should be the
desired behavior. For me, the problem started because i wanted to use
the script to check my 5k html files, in particular, classic novels
that involves double curly quotes and french quotes. So, the desired
behavior is one based on the question of what would best for the user
to see in order to correct a bracket mismatch error in a file. (which,
can get quite complex for nested syntax, because, usually, once you
have one missed, it's all hell from there. I think this is similar to
the problem when a compiler/interpreter encounters a bad syntax in
source code, and thus the poplar situation where error code of
computer programs are hard to understand...)

but anyway, just for this exercise, the requirement needn't be
stringent. I still think that at least the reported position should be
a matching char in the file. (and if we presume this, then only my
code works. LOL)

PS this is a warmup problem for writing a HTML tag validator. I looked
high and lo in past years, but just couldn't find a script that does
simple validation in batch. The w3c one is based on SGML, really huge
amount of un-unstandable irregular historical baggage. XML lexical
validator is much closer, but still not regular. I simply wanted one
just like the match-pair validator in our problem, except the opening
char is not a single char but string of the form <xyz …> and the
*matching* closing one is of the form </xyz>, and with just one
exception: when a tag has “/>” in ending such as <br/> then it is
skipped (i.e. not considered as opening or closing).

I'll be writing this soon in elisp… since i haven't studied parsers, i
had hopes that parser expert would show some proper parser solutions…
in particular i think such can be expressed in Parsing Expression
Grammar in just a few lines… but so far no deity came forward to show
the light. lol

getting ranty… it's funny, somehow the tech geekers all want regex to
solve the problem. Regex, regex, regex, a 40 years old deviant bastard
that by some twist of luck became a tool for matching text patterns.
One bloke came forward to show-off a perl regex obfuscation. That's
like, lol. But it might be good for the lulz if his code is actually
complete and worked. Then, you have a few who'd nonchalantly remark
“O, you just need push-down automata”. LOL, unless they show actual
working code, its Automata their asses.

folks, don't get angry with me. I'm a learner. I'm curious. I always
am eager to learn. And there's always things we can learn. Don't get
into a fit and do the troll dance in a pit with me. Nobody's gonna
give a shit if you think u knew it all. If u are not the master of one
thousand and one languages yet, you can learn with me. ☺ troll!!!!


More information about the Python-list mailing list