Re: a little parsing challenge ☺

Xah Lee xahlee at gmail.com
Thu Jul 21 14:58:48 CEST 2011


On Jul 19, 11:07 am, Thomas Jollans <t... at jollybox.de> wrote:
> On 19/07/11 18:54, Xah Lee wrote:
>
> > On Sunday, July 17, 2011 2:48:42 AM UTC-7, Raymond Hettinger wrote:
> >> On Jul 17, 12:47 am, Xah Lee <xah... at gmail.com> wrote:
> >>> i hope you'll participate. Just post solution here. Thanks.
>
> >>http://pastebin.com/7hU20NNL
>
> > just installed py3.
> > there seems to be a bug.
> > in this file
>
> >http://xahlee.org/p/time_machine/tm-ch04.html
>
> > there's a mismatched double curly quote. at position 28319.
>
> > the python code above doesn't seem to spot it?
>
> > here's the elisp script output when run on that dir:
>
> > Error file: c:/Users/h3/web/xahlee_org/p/time_machine/tm-ch04.html
> > ["“" 28319]
> > Done deal!
>
> That script doesn't check that the balance is zero at the end of file.
>
> Patch:
>
> --- ../xah-raymond-old.py       2011-07-19 20:05:13.000000000 +0200
> +++ ../xah-raymond.py   2011-07-19 20:03:14.000000000 +0200
> @@ -16,6 +16,8 @@
>          elif c in closers:
>              if not stack or c != stack.pop():
>                  return i
> +    if stack:
> +        return i
>      return -1
>
>  def scan(directory, encoding='utf-8'):

Thanks a lot for the fix Raymond.

The patched code still seems to have a minor problem, though.
It detects the error, but the reported position is wrong.
Example output:

30068: c:/Users/h3/web/xahlee_org/p/time_machine\tm-ch04.html

That position, 30068, is the last char in the file.
The correct position should be 28319 (or the report should at least
point at a bracket char in the file that doesn't match).
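
The patch returns i after the loop has finished, so i is simply the
index of the last char scanned. One way to report the unmatched opener
instead is to keep each opener's position on the stack. A minimal
sketch, not the actual pastebin code: the name first_mismatch and the
bracket table PAIRS are assumptions made for illustration, and
reporting the earliest unclosed opener is just one reasonable choice.

```python
# Hypothetical sketch: PAIRS and first_mismatch are illustrative names,
# not taken from the original script.
PAIRS = {'(': ')', '[': ']', '{': '}', '“': '”', '‹': '›', '«': '»'}
CLOSERS = set(PAIRS.values())

def first_mismatch(text):
    """Return the index of the first bracket error, or -1 if balanced."""
    stack = []  # entries are (expected closer, index of the opener)
    for i, c in enumerate(text):
        if c in PAIRS:
            stack.append((PAIRS[c], i))
        elif c in CLOSERS:
            # A closer with no opener, or the wrong closer: report it here.
            if not stack or c != stack.pop()[0]:
                return i
    if stack:
        # Openers left over at end of file: report the earliest one,
        # not the position of the last char scanned.
        return stack[0][1]
    return -1

print(first_mismatch('a “b (c) d'))
```

Python indexes from 0, so the number may still be off by one from an
Emacs position, which counts from 1.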

Today, I tried 3 more scripts: 2 fixed Python 3 versions and 1 Ruby.
All failed again. I've reported the problems I encountered on the Python
and Ruby newsgroups. If you are the author, a fix would be very much
appreciated. I'll get back to your code and eventually write a blog
summary of all the different language versions.

Am off to test that elaborate Perl regex now... fingers crossed.

 Xah. Mood: quite discouraged.


