What's the best way to parse this HTML tag?
roy at panix.com
Mon Mar 12 14:27:02 CET 2012
In article <bb1a55fa-3dcf-4480-ae87-be30a1a65bf7 at h9g2000yqe.googlegroups.com>,
John Salerno <johnjsal at gmail.com> wrote:
> Well, I had considered exactly that method, but I don't know for sure
> if the titles and names will always have links like that, so I didn't
> want to tie my programming to something so specific. But perhaps it's
> still better than just taking the first two strings.
Such is the nature of screen scraping. For the most part, web pages are
not meant to be parsed. If you decide to go down the road of trying to
extract data from them, all bets are off. You look at the markup, take
your best guess, and go for it.
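As a rough illustration of "take your best guess" (the markup under discussion is not reproduced in this message, so the sample fragments in the usage note, the class `_TextCollector`, and the function `extract_title_and_name` are all made up for the example), one possible sketch tries the link-based guess first, falls back to the first two strings, and complains loudly if neither guess fits. It uses only the standard library's `html.parser`:

```python
from html.parser import HTMLParser


class _TextCollector(HTMLParser):
    """Collect non-empty text strings, noting which ones sit inside <a> tags."""

    def __init__(self):
        super().__init__()
        self._link_depth = 0   # > 0 while we are inside an <a> element
        self.link_texts = []   # strings found inside links
        self.all_texts = []    # every non-empty string, in document order

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._link_depth += 1

    def handle_endtag(self, tag):
        if tag == "a" and self._link_depth:
            self._link_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return
        self.all_texts.append(text)
        if self._link_depth:
            self.link_texts.append(text)


def extract_title_and_name(fragment):
    """Return (title, name) from a hypothetical HTML fragment.

    Guess 1: the first two link texts.
    Guess 2: the first two strings of any kind.
    Raises ValueError when neither guess fits, so that a site redesign
    shows up as an error instead of silently producing wrong data.
    """
    parser = _TextCollector()
    parser.feed(fragment)
    parser.close()
    for candidates in (parser.link_texts, parser.all_texts):
        if len(candidates) >= 2:
            return candidates[0], candidates[1]
    raise ValueError("markup no longer matches either guess")
```

With an invented fragment such as `<td><a href="/t/1">Some Title</a> by <a href="/p/2">Some Name</a></td>`, the link-based guess wins; with `<td><b>Some Title</b><br>Some Name</td>`, the code falls back to the first two strings. Neither guess is any more "right" than the other; the point is only that the fallback and the failure are explicit.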
There's no magic here. Nobody can look at this HTML and come up with
some hard-and-fast rule for how you're supposed to parse it. And, even
if they could, it's all likely to change tomorrow when the site rolls
out their next UI makeover.
More information about the Python-list