What's the best way to parse this HTML tag?

Roy Smith roy at panix.com
Mon Mar 12 14:27:02 CET 2012


In article 
<bb1a55fa-3dcf-4480-ae87-be30a1a65bf7 at h9g2000yqe.googlegroups.com>,
 John Salerno <johnjsal at gmail.com> wrote:

> Well, I had considered exactly that method, but I don't know for sure
> if the titles and names will always have links like that, so I didn't
> want to tie my programming to something so specific. But perhaps it's
> still better than just taking the first two strings.

Such is the nature of screen scraping.  For the most part, web pages are 
not meant to be parsed.  If you decide to go down the road of trying to 
extract data from them, all bets are off.  You look at the markup, take 
your best guess, and go for it.

There's no magic here.  Nobody can look at this HTML and come up with 
some hard and fast rule for how you're supposed to parse it.  And, even 
if they could, it's all likely to change tomorrow when the site rolls 
out their next UI makeover.
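
A minimal sketch of the "best guess, with a fallback" approach, using only
the standard library's html.parser.  The markup in the usage note is
invented for illustration (the real page is not reproduced here), and the
names TextCollector and guess_title_and_name are hypothetical.  It prefers
the two linked strings and, if the links are missing, falls back to the
first two strings.

```python
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Collect non-empty text chunks, noting which ones sat inside <a> tags."""

    def __init__(self):
        super().__init__()
        self.link_depth = 0      # > 0 while we are inside an <a> element
        self.link_texts = []     # text found inside links
        self.all_texts = []      # every non-empty text chunk, in order

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.link_depth += 1

    def handle_endtag(self, tag):
        if tag == "a" and self.link_depth > 0:
            self.link_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return
        self.all_texts.append(text)
        if self.link_depth:
            self.link_texts.append(text)


def guess_title_and_name(fragment):
    """Return a (title, name) guess, or None if the fragment has too little text."""
    parser = TextCollector()
    parser.feed(fragment)
    parser.close()
    # Assumption: title and name are usually the first two links.  If the
    # page does not link them, fall back to the first two strings.
    if len(parser.link_texts) >= 2:
        candidates = parser.link_texts
    else:
        candidates = parser.all_texts
    if len(candidates) < 2:
        return None
    return candidates[0], candidates[1]
```

With a made-up fragment such as
'<td><a href="/t/1">Some Title</a> by <a href="/u/2">Some Name</a></td>'
this returns ('Some Title', 'Some Name').  Returning None when there are
fewer than two strings at least makes the failure visible when the layout
changes, rather than silently producing wrong data.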


