What's the best way to parse this HTML tag?
Roy Smith
roy at panix.com
Sun Mar 11 20:28:33 EDT 2012
In article
<239c4ad7-ac93-45c5-98d6-71a434e1c5aa at r21g2000yqa.googlegroups.com>,
John Salerno <johnjsal at gmail.com> wrote:
> Getting the time that the song is played is easy, because the time is
> wrapped in a <div> tag all by itself with a class attribute that has a
> specific value I can search for. But the actual song title and artist
> information is harder, because the HTML isn't quite as precise. Here's
> a sample:
>
> <div class="cmPlaylistContent">
> <strong>
> <a href="/lsp/t2995/">
> Love Without End, Amen
> </a>
> </strong>
> <br/>
> <a href="/lsp/a436/">
> George Strait
> </a>
> [...]
> Therefore, I appeal to your greater wisdom in these matters. Given
> this HTML, is there a "best practice" for how to refer to the song
> title and artist?
Obviously, any attempt at screen scraping is fraught with peril.
Beautiful Soup is a great tool but it doesn't negate the fact that
you've made a pact with the devil. That being said, if I had to guess,
here's your puppy:
> <a href="/lsp/t2995/">
> Love Without End, Amen
> </a>
The thing to look for is an "a" element whose href starts with
"/lsp/t", where the "t" presumably stands for "track". Likewise:
> <a href="/lsp/a436/">
> George Strait
> </a>
An href starting with "/lsp/a" is probably an artist link.
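Here's a rough sketch of that heuristic. It uses the standard library's html.parser rather than Beautiful Soup, so it runs without third-party packages. PlaylistLinkParser is a made-up name. The "/lsp/t" and "/lsp/a" prefixes are only the guesses above, not anything the site documents. Pairing tracks with artists via zip() also assumes every entry has exactly one track link followed by one artist link.

```python
from html.parser import HTMLParser


class PlaylistLinkParser(HTMLParser):
    """Collect the text of <a> links whose href looks like a track or artist.

    Hypothetical helper: the prefixes are guesses based on one sample.
    """

    def __init__(self):
        super().__init__()
        self.tracks = []
        self.artists = []
        self._current = None  # list the currently open <a> feeds, or None
        self._buf = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href") or ""
        if href.startswith("/lsp/t"):    # guessed: "t" for track
            self._current = self.tracks
        elif href.startswith("/lsp/a"):  # guessed: "a" for artist
            self._current = self.artists
        else:
            self._current = None
        self._buf = []

    def handle_data(self, data):
        if self._current is not None:
            self._buf.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._current is not None:
            # Collapse the newlines and indentation around the link text.
            self._current.append(" ".join("".join(self._buf).split()))
            self._current = None


# The sample from the original post, up to the elided part.
SAMPLE = """
<div class="cmPlaylistContent">
<strong>
<a href="/lsp/t2995/">
Love Without End, Amen
</a>
</strong>
<br/>
<a href="/lsp/a436/">
George Strait
</a>
"""

parser = PlaylistLinkParser()
parser.feed(SAMPLE)
parser.close()
print(list(zip(parser.tracks, parser.artists)))
```

On the sample this should print [('Love Without End, Amen', 'George Strait')]. The same caveat applies as for any scraper: if the site ever adds another link type starting with "/lsp/a" or "/lsp/t", this will happily misfile it.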
You owe the Oracle three helpings of tag soup.