searching backwards in a string
Steve Holden
sholden at holdenweb.com
Wed Feb 13 07:42:11 EST 2002
"Paul Rubin" <phr-n2002a at nightsong.com> wrote in ...
> "Steve Holden" <sholden at holdenweb.com> writes:
> > Paul, this thread's probably now old enough for you to tell us what the
> > real problem is! Why exactly do you need to search backwards from the
> > 50,000th character to find the beginning of an HTML tag?
>
> Suppose I'm parsing the file and I see a </table> tag and I want to
> find the matching <table> tag. It could be pretty far back in the file.
> That's what I was doing when I encountered this question. But searching
> backwards is a normal thing to want to do in general--for example it's
> a standard command in any decent text editor.
>
> Anyway, I just entered a sourceforge bug about it being missing from
> Python's re module.
>
I'd be very surprised if this meets with any response other than "this is
not a bug".

Frankly, if you see a </table> tag and have no idea where the matching
<table> tag appears, then whatever you are doing to the HTML file, you
certainly aren't parsing it!
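
To make that concrete, here is a minimal sketch (the sample markup and the
variable names are made up for illustration) of what a plain backwards search
does with nested tables. str.rfind can already search backwards for a literal
string, but it finds the nearest "<table", not necessarily the matching one:

```python
# Hypothetical sample: one table nested inside another.
html = "<table><tr><td><table></table></td></tr></table>"

# Position of the outer (last) closing tag.
close_pos = html.rfind("</table>")

# Search backwards from there for the nearest opening tag.
nearest_open = html.rfind("<table", 0, close_pos)

# The tag that really matches the outer </table> is the first one.
matching_open = html.find("<table")

print(close_pos, nearest_open, matching_open)
```

The backwards search lands on the inner table, which has already been closed.
Pairing the tags correctly means keeping track of nesting, and that is the
parser's job.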

I don't know whether this will help, but here is an example from "Python Web
Programming" that shows how to extract the table structure from an HTML
file:

import htmllib, urllib, formatter, sys

def Usage():
    print """
Usage: python showtbls.py URL
"""

class myHTMLParser(htmllib.HTMLParser):

    def __init__(self, f):
        htmllib.HTMLParser.__init__(self, f)
        self.tblindent = 0      # current table nesting depth

    def start_table(self, attrs):
        # Called for each <table>: echo it, indented by nesting depth.
        sys.stdout.write("%s<table" % (" " * self.tblindent, ))
        for k, v in attrs:
            if k in ("width", "cellspacing"):
                sys.stdout.write(' %s="%s"' % (k, v),)
        print ">"
        self.tblindent += 1

    def end_table(self):
        # Called for each </table>: step back out one level.
        self.tblindent -= 1
        print "%s</table>" % (" " * self.tblindent, )

def parse(url, formatter):
    # Fetch the page and feed the whole document to the parser.
    f = urllib.urlopen(url)
    data = f.read()
    f.close()
    p = myHTMLParser(formatter)
    p.feed(data)
    p.close()

if len(sys.argv) != 2:
    Usage()
else:
    fmt = formatter.NullFormatter()
    parse(sys.argv[1], fmt)

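
For anyone reading this on Python 3, where htmllib no longer exists, the same
idea can be sketched with html.parser from the standard library. This is only
a rough illustration, not the book's code: the names TableMatcher and
match_tables and the sample markup are invented here. It keeps a stack of open
<table> positions so that each </table> is paired with its own opening tag:

```python
from html.parser import HTMLParser

class TableMatcher(HTMLParser):
    """Pair each </table> with its <table> using a stack of open tags."""

    def __init__(self):
        super().__init__()
        self.open_tables = []   # (line, column) of each unclosed <table>
        self.pairs = []         # (open position, close position) per table

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.open_tables.append(self.getpos())

    def handle_endtag(self, tag):
        # Ignore a stray </table> that has no opening tag on the stack.
        if tag == "table" and self.open_tables:
            self.pairs.append((self.open_tables.pop(), self.getpos()))

def match_tables(html):
    """Return a list of (open position, close position) tuples."""
    parser = TableMatcher()
    parser.feed(html)
    parser.close()
    return parser.pairs

# Hypothetical sample: one table nested inside another.
sample = "<table><tr><td><table></table></td></tr></table>"
for start, end in match_tables(sample):
    print("<table> at line %d, column %d closes at line %d, column %d"
          % (start + end))
```

Because the stack always holds the innermost open table on top, there is never
any need to search backwards: by the time a </table> arrives, the position of
its partner is already known.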
regards
Steve
--
Consulting, training, speaking: http://www.holdenweb.com/
Author, Python Web Programming: http://pydish.holdenweb.com/pwp/
"This is Python. We don't care much about theory, except where it
intersects with useful practice." Aahz Maruch on c.l.py