searching backwards in a string

Wed Feb 13 04:14:06 EST 2002

Paul Rubin <phr-n2002a at nightsong.com> wrote in message news:<7x1yfq2gdf.fsf at ruckus.brouhaha.com>...
> "Steve Holden" <sholden at holdenweb.com> writes:
> > Paul, this thread's probably now old enough for you to tell us what the real
> > problem is! Why exactly do you need to search backwards from the 50,000th
> > character to find the beginning of an HTML tag?
> 
> Suppose I'm parsing the file and I see a </table> tag and I want to
> find the matching <table> tag.  It could be pretty far back in the file.
> That's what I was doing when I encountered this question.

Paul, I really admire your energy, writing your own HTML parser. I'm a
lazy old so-and-so; if I had the slightest interest in HTML I'd be
looking at the HTMLParser and htmllib modules. What was it about
them that didn't suit your purpose?

... and if I had the energy to write an HTML parser I probably would
have used some dumb old technique like reading the file forwards. If
I met <foo> I'd push it onto a stack (or some other data structure) of
"unclosed" tags, together with the position in the file where I found
it. When I met the corresponding </foo> I'd take the appropriate
"foo" action (which would probably involve the use of the data that
I'd found between <foo> and </foo>) and then rip "foo" out of the
pending bag ... using a regex backwards to find the opening tag is so
innovative that I'm totally gobsmacked.
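
For the record, here is roughly what that dumb old technique looks like.
This is only a minimal sketch, not a real HTML parser: the names TAG_RE
and match_tags are made up for illustration, and the pattern is
deliberately naive (it knows nothing about comments, scripts, or a '>'
inside an attribute value). A tag that is never closed is simply dropped
when an enclosing tag closes.

```python
import re

# Naive tag pattern (an assumption for this sketch): group 1 is "/" for a
# closing tag and empty for an opening tag, group 2 is the tag name.
TAG_RE = re.compile(r"<(/?)([A-Za-z][A-Za-z0-9]*)[^>]*>")


def match_tags(text):
    """Return (name, open_pos, close_pos) for each matched open/close pair."""
    pending = []  # stack of (tag name, position of the opening tag)
    pairs = []
    for m in TAG_RE.finditer(text):
        closing, name = m.group(1), m.group(2).lower()
        if not closing:
            pending.append((name, m.start()))
            continue
        # Walk down the stack to the nearest unclosed tag with this name.
        for i in range(len(pending) - 1, -1, -1):
            if pending[i][0] == name:
                pairs.append((name, pending[i][1], m.start()))
                # Rip it out of the pending bag, along with anything that
                # was opened inside it and never closed.
                del pending[i:]
                break
    return pairs
```

Each </table> is paired with its <table> in a single forward pass, so
nothing ever has to be searched for backwards, however far back the
opening tag is.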

> But searching
> backwards is a normal thing to want to do in general--for example it's
> a standard command in any decent text editor.

... and in many indecent text editors. However, AFAIK the usual
implementation is to step back one line at a time and do a forward
regex search within each line.
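
A minimal sketch of that line-at-a-time approach, under the assumption
that a match never spans a newline. The function name search_backwards
is hypothetical; the only library feature relied on is the pos/endpos
arguments that compiled patterns accept in finditer. Within a line it
keeps the last of the non-overlapping matches found by a forward scan.

```python
import re


def search_backwards(pattern, text, pos):
    """Return the last match that ends at or before pos, or None.

    Works one line at a time, so it cannot find a match that spans
    a newline.
    """
    regex = re.compile(pattern)
    end = pos
    while end >= 0:
        # Start of the line being examined (rfind returns -1 when there
        # is no earlier newline, which gives start == 0).
        start = text.rfind("\n", 0, end) + 1
        last = None
        # Ordinary forward search, restricted to text[start:end].
        for m in regex.finditer(text, start, end):
            last = m
        if last is not None:
            return last
        end = start - 1  # continue just before the preceding newline
    return None
```

Calling search_backwards(r"<table\b", text, pos) with pos set to the
offset of a </table> tag returns the nearest earlier "<table" on some
line above, though unlike the stack approach it knows nothing about
nesting.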

> 
> Anyway, I just entered a sourceforge bug about it being missing from
> Python's re module.

Looks like the effbot gets to be gobsmacked too.

> Thanks

No, Paul, thank *you* -- you've made my day.