Trying to parse (lxml, SGMLParser, urlparse)

Jerry Rocteur jerry.rocteur at
Sun Jan 18 13:07:37 CET 2015


I'm trying to parse

The body source I'm interested in contains blocks exactly like this:

<tr class='friend'>
<td class='text--left'>
<a href="/players/mizucci0"><img alt="mizucci0" class="media__avatar"
<div class='friend__info'>
<td class='delta-alt'>
<td class='delta-alt'>
<td class='delta-alt'>

I wanted to do it in Python as I'm learning, and I looked at the different
modules, but it isn't easy for me to work out the best way to do this:
most tutorials I see use complicated classes, and I just want to
parse this one paragraph at a time (as I would do in Perl) and print:

1 mizuho 26648 35315
2 xxxxxx  99999 99999
3 xxxxxx 99999 99999

etc. (In the above case I'm ignoring 818.7 and Miles.)

The best way I found so far is this:

from lxml import html
import requests
page = requests.get("")
tree = html.fromstring(page.text)
a = tree.xpath('//span/text()')
b = tree.xpath('//td/text()')

And then manipulating the indices:

    # body of a loop over the rows (loop header and start values not shown)
    print "%s %s %s %s" % (a[usern], a[users], b[tots], b[weekb])
    tots += 4
    weekb += 4
    usern += 2
    users += 2

But it isn't very scientific ;-)
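
A per-row variant of the same lxml approach might look roughly like the
sketch below. Instead of stepping through two flat lists, it selects each
<tr class='friend'> and runs relative XPath queries inside it. The tag and
class names are the ones from the block above; everything else is an
assumption, since the block is cut off: the SAMPLE markup, the <span>
holding the name, the order of the 'delta-alt' cells, and the helper name
friend_rows are all hypothetical. It is written for Python 3.

```python
from lxml import html

# Hypothetical sample markup. Only the tag and class names come from the
# post; the cell contents and their order are assumed for illustration.
SAMPLE = """
<table>
<tr class='friend'>
<td class='text--left'>
<a href="/players/mizucci0"><img alt="mizucci0" class="media__avatar"></a>
<div class='friend__info'><span>mizuho</span></div>
</td>
<td class='delta-alt'>26648</td>
<td class='delta-alt'>35315</td>
<td class='delta-alt'>818.7</td>
</tr>
<tr class='friend'>
<td class='text--left'>
<a href="/players/xxxxxx"><img alt="xxxxxx" class="media__avatar"></a>
<div class='friend__info'><span>xxxxxx</span></div>
</td>
<td class='delta-alt'>99999</td>
<td class='delta-alt'>99999</td>
<td class='delta-alt'>0.0</td>
</tr>
</table>
"""


def friend_rows(tree):
    """Return one (rank, name, first_value, second_value) tuple per row."""
    rows = []
    for rank, tr in enumerate(tree.xpath("//tr[@class='friend']"), start=1):
        # A leading '.' keeps the query inside the current row.
        name = tr.xpath("string(.//div[@class='friend__info']//span)").strip()
        cells = [td.text_content().strip()
                 for td in tr.xpath("./td[@class='delta-alt']")]
        # Assumption: the first two 'delta-alt' cells are the wanted numbers.
        rows.append((rank, name, cells[0], cells[1]))
    return rows


tree = html.fromstring(SAMPLE.strip())
for row in friend_rows(tree):
    print("%d %s %s %s" % row)
```

With the real page, tree would come from html.fromstring(page.text) as
before, and the two inner XPath expressions would need adjusting to
wherever the name and the numbers actually sit in each row.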

Which module would you use, and what would you suggest is the best way to do it?

Thanks very much in advance. I haven't done a lot of HTML parsing. I
would much prefer using web services and an API, but unfortunately they
don't have one.
Jerry Rocteur
