Parsing an HTML a tag

beza1e1 andreas.zwinkau at
Sat Sep 24 20:03:53 CEST 2005

I do not really know what you want to do. Getting the URLs from the a
tags of an HTML file? I think the easiest method would be a regular
expression:

>>> import urllib, re
>>> html = urllib.urlopen("").read()
>>> # URLs in double-quoted href attributes; [^"]+ stops at the closing quote
>>> re.findall('href="([^"]+)"', html)
>>> # link text between the opening tag and </a>
>>> re.findall('href=[^>]+>([^<]+)</a>', html)
['Bilder', 'Groups', 'Verzeichnis', 'News', 'Froogle',
'Mehr&nbsp;&raquo;', 'Erweiterte Suche', 'Einstellungen',
'Sprachtools', 'Werbung', 'Unternehmensangebote', 'Alles \xfcber Google',
' in English']
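If you want to try the two patterns without fetching a page, here is a
minimal self-contained sketch in Python 3. The sample markup is made up
for illustration, and the URL pattern uses [^"]+ rather than [^>]+ so the
capture ends at the closing quote instead of running on into any later
attributes.

```python
import re

# Hypothetical sample markup; the session above fetched a live page instead.
html = (
    '<a href="/images">Bilder</a> '
    '<a href="/groups" class="q">Groups</a> '
    '<a href=/ncr>in English</a>'
)

# URLs from double-quoted href attributes. [^"]+ stops at the closing quote,
# so a following attribute such as class="q" is not swallowed.
# Unquoted hrefs (the third link) are not matched by this pattern.
urls = re.findall(r'href="([^"]+)"', html)

# Link text: everything between the end of the opening tag and </a>,
# as long as it contains no nested tags.
texts = re.findall(r'href=[^>]+>([^<]+)</a>', html)

print(urls)   # ['/images', '/groups']
print(texts)  # ['Bilder', 'Groups', 'in English']
```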

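Google has some strange HTML: an href without quotation marks, e.g.
<a href=> in English</a>
The first pattern requires double quotes and skips such links, while the
second only needs "href=" followed by anything up to ">", which is why
' in English' still shows up in its output.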
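One way to catch the unquoted case as well is a single pattern that makes
the quote optional and uses a backreference so the value must end with the
same quote it started with (or with none). This is a rough sketch that
assumes URLs contain no whitespace or quote characters; it is still a regex
heuristic, not an HTML parser, and the sample markup is hypothetical.

```python
import re

# Hypothetical snippet with double-quoted, single-quoted and unquoted hrefs.
html = '<a href="/a">A</a> <a href=\'/b\'>B</a> <a href=/c>C</a>'

# Group 1 captures an optional opening quote; \1 requires the same quote
# (or nothing, if unquoted) after the value. The value itself stops at a
# quote, whitespace or '>'.
HREF = re.compile(r'''href\s*=\s*(["']?)([^"'\s>]+)\1''', re.IGNORECASE)

urls = [m.group(2) for m in HREF.finditer(html)]
print(urls)  # ['/a', '/b', '/c']
```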

More information about the Python-list mailing list