What's the best way to write this regular expression?
Chris Rebert
clp2 at rebertia.com
Tue Mar 6 17:52:10 EST 2012
On Tue, Mar 6, 2012 at 2:43 PM, John Salerno <johnjsal at gmail.com> wrote:
> I sort of have to work with what the website gives me (as you'll see below), but today I encountered an exception to my RE. Let me just give all the specific information first. The point of my script is to go to the specified URL and extract song information from it.
>
> This is my RE:
>
> song_pattern = re.compile(r'([0-9]{1,2}:[0-9]{2} [a|p].m.).*?<a.*?>(.*?)</a>.*?<a.*?>(.*?)</a>', re.DOTALL)
I would advise against using regular expressions to "parse" HTML:
http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags
lxml is a popular choice for parsing HTML in Python: http://lxml.de
Cheers,
Chris
More information about the Python-list
mailing list