Is it possible to combine handle_data and regular expressions?
gshepherd281281 at yahoo.com
Thu Jan 19 18:44:20 EST 2006
I've experimented with regular expressions to solve my problems in the
past but I have seen so many comments about HTMLParser and sgmllib that
I thought I would try a different approach this time so I tried using
I want to search through my SGML file for various strings of text and
find out what section they're in. What I have here does this to a
certain extent but I was wondering if I could make handle_data and
regular expressions work together to make this work a little better.
For instance, when I search for "above" as I am here, I just get
something like this: '174.114':'above'. This isn't very useful
because I want to know the context of "above" (i.e., the information on
either side of it) and maybe even use a regular expression to filter
the search a little more.
As always, I'd appreciate feedback on my efforts.
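The "context on either side" idea can be tried on a plain string before the parser gets involved. This is only a rough sketch: the helper name find_with_context, the default 30-character window, and the sample sentence are made up for illustration and are not part of the program in this post.

```python
import re

def find_with_context(text, word, width=30):
    """Return (before, match, after) tuples for each whole-word hit of `word`."""
    # \b keeps "above" from matching inside a longer word such as "aboveboard"
    pattern = re.compile(r'\b' + re.escape(word) + r'\b', re.IGNORECASE)
    hits = []
    for m in pattern.finditer(text):
        before = text[max(0, m.start() - width):m.start()]
        after = text[m.end():m.end() + width]
        hits.append((before, m.group(0), after))
    return hits

sample = "Parts listed above the line must be replaced."
for before, hit, after in find_with_context(sample, "above", width=10):
    print(repr(before), repr(hit), repr(after))
```

The match object's start() and end() give the position of the hit, so the text on either side is just a slice of the same string.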
from HTMLParser import HTMLParser
import os, re

root = raw_input("Enter the path where the program should run: ")
fname = raw_input("Enter name of the file: ")
given, ext = os.path.splitext(fname)
inputFile = open(os.path.join(root, fname), 'r')
data = inputFile.read()

class PartFinder(HTMLParser):
    _full = None         # label of the section currently being parsed
    _secDict = dict()    # section label -> list of matching text chunks

    def handle_starttag(self, tag, attrs):
        # Build the section label from the 'no' attribute of each level
        if tag == "sec-main":
            self._main = dict(attrs).get('no')
            self._full = self._main
        if tag == "sec-sub1":
            self._subone = dict(attrs).get('no')
            self._full = self._main + '[' + self._subone + ']'
        if tag == "sec-sub2":
            self._subtwo = dict(attrs).get('no')
            self._full = (self._main + '[' + self._subone + ']' + '['
                          + self._subtwo + ']')

    def handle_data(self, data):
        # Record text chunks that contain the search string
        if "Pt" in data:
            if not self._secDict.has_key(self._main):
                self._secDict[self._full] = [data]

if __name__ == "__main__":
    parser = PartFinder()
    x = parser.found()
    output_part = given + '.parts'
    outputFile = file(os.path.join(root, output_part), 'w')
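
One way handle_data and regular expressions might work together is to keep tracking the section in handle_starttag, as above, and then run a compiled pattern over each chunk of text inside handle_data, storing a slice of the surrounding text instead of the bare match. The following is a minimal sketch under several assumptions: the class name ContextFinder, its pattern and width parameters, the hits dictionary, and the sample markup are all hypothetical. It reuses only the sec-main and sec-sub1 tag names and the 'no' attribute from the code above, and it tries the Python 3 module name first before falling back to the HTMLParser module used in this post.

```python
import re
try:
    from html.parser import HTMLParser   # Python 3 module name
except ImportError:
    from HTMLParser import HTMLParser    # Python 2 module name, as in the post

class ContextFinder(HTMLParser):
    """Record each regex hit together with its section and nearby text."""

    def __init__(self, pattern, width=30):
        HTMLParser.__init__(self)
        self.pattern = re.compile(pattern)
        self.width = width      # characters of context kept on each side
        self._main = None
        self.section = None     # label of the section being parsed
        self.hits = {}          # section label -> list of context strings

    def handle_starttag(self, tag, attrs):
        no = dict(attrs).get('no')
        if tag == "sec-main":
            self._main = no
            self.section = no
        elif tag == "sec-sub1":
            self.section = "%s[%s]" % (self._main, no)

    def handle_data(self, data):
        # Search the text chunk and keep a window around every match
        for m in self.pattern.finditer(data):
            start = max(0, m.start() - self.width)
            end = m.end() + self.width
            self.hits.setdefault(self.section, []).append(data[start:end])

# Hypothetical sample markup, not taken from the real SGML file
sample = ('<sec-main no="174"><sec-sub1 no="114">'
          'Use the bracket described above when fitting Pt 12.'
          '</sec-sub1></sec-main>')
finder = ContextFinder(r'\babove\b', width=12)
finder.feed(sample)
finder.close()
print(finder.hits)
```

Because the pattern is passed in as an argument, the same class can be reused with a stricter expression to filter the search further. One caveat: the parser may deliver a long run of text to handle_data in more than one piece, so a match that straddles two chunks would be missed by this sketch.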