Newbie ? -- SGML metadata extraction

Adonis adonisv at
Mon Jan 16 19:01:44 EST 2006

ProvoWallis wrote:


From what I gather, here is a quick solution. There are probably better
ones on the way, but I think this accomplishes the idea.

Some helpful links:


from HTMLParser import HTMLParser

data = """<main-section no="1">

<form id="graphic_1.tif">
<form id="graphic_2.tif">

<main-section no="2">

<form id="graphic_3.tif">

<main-section no="3">

<form id="graphic_4.tif">
<form id="graphic_5.tif">
<form id="graphic_6.tif">
"""

class ParseForms(HTMLParser):

     def handle_starttag(self, tag, attrs):
         if tag == "form":
             # attrs is a list of (attribute, value) tuples; convert it
             # to a dictionary so attributes are easier to look up
             print "form id: %s" % dict(attrs).get('id')

if __name__ == "__main__":
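     parser = ParseForms()
     # feed the sample data to the parser, then flush any buffered input
     parser.feed(data)
     parser.close()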
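
For anyone reading this on Python 3, where the module lives at html.parser, the same idea can be sketched as below. This is only a rough adaptation, not the code from the post: it assumes the goal is to know which graphics belong to which main-section, and the names SAMPLE, SectionFormCollector and collect_forms are made up for illustration.

```python
from html.parser import HTMLParser

# Shortened sample in the same shape as the data above (assumed input format).
SAMPLE = """<main-section no="1">
<form id="graphic_1.tif">
<form id="graphic_2.tif">
<main-section no="2">
<form id="graphic_3.tif">
"""


class SectionFormCollector(HTMLParser):
    """Hypothetical helper: groups form ids under the preceding main-section."""

    def __init__(self):
        super().__init__()
        self.current_section = None   # "no" attribute of the last main-section seen
        self.forms_by_section = {}    # section number -> list of form ids

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (attribute, value) tuples
        attributes = dict(attrs)
        if tag == "main-section":
            self.current_section = attributes.get("no")
            self.forms_by_section.setdefault(self.current_section, [])
        elif tag == "form":
            # forms seen before any main-section are filed under None
            section_forms = self.forms_by_section.setdefault(self.current_section, [])
            section_forms.append(attributes.get("id"))


def collect_forms(text):
    """Parse text and return a dict mapping section number to form ids."""
    collector = SectionFormCollector()
    collector.feed(text)
    collector.close()
    return collector.forms_by_section


if __name__ == "__main__":
    for section, forms in collect_forms(SAMPLE).items():
        print("section %s: %s" % (section, ", ".join(forms)))
```

Returning a dictionary instead of printing inside handle_starttag keeps the parsing separate from whatever you want to do with the ids afterwards.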
