What's the best way to write this regular expression?

Chris Rebert clp2 at rebertia.com
Tue Mar 6 23:52:10 CET 2012


On Tue, Mar 6, 2012 at 2:43 PM, John Salerno <johnjsal at gmail.com> wrote:
> I sort of have to work with what the website gives me (as you'll see below), but today I encountered an exception to my RE. Let me just give all the specific information first. The point of my script is to go to the specified URL and extract song information from it.
>
> This is my RE:
>
> song_pattern = re.compile(r'([0-9]{1,2}:[0-9]{2} [a|p].m.).*?<a.*?>(.*?)</a>.*?<a.*?>(.*?)</a>', re.DOTALL)

I would advise against using regular expressions to "parse" HTML:
http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags

lxml is a popular choice for parsing HTML in Python: http://lxml.de

Cheers,
Chris



More information about the Python-list mailing list