[Tutor] Problem using lxml

Anthony Papillion papillion at gmail.com
Sun Aug 23 01:16:34 CEST 2015


Many thanks, Martin! I had indeed skipped creating the tree object and a
few other things you pointed out. Here is my finished simple code that
actually works:

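from lxml import html
import requests

# Fetch the search results page and parse it into an element tree.
page = requests.get("http://joplin.craigslist.org/search/w4m")
tree = html.fromstring(page.text)

# Select the text of every <a> element whose class is "hdrlnk".
titles = tree.xpath('//a[@class="hdrlnk"]/text()')
try:
    for title in titles:
        print(title)
except Exception:
    # Any error raised while printing silently ends the loop.
    pass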

Pretty simple. Thanks for the help!
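
For anyone who wants to try the same approach without hitting the network, here is a rough sketch that runs the parse-then-xpath steps against an inline snippet. The markup in SAMPLE_HTML is made up for illustration (the first anchor is modeled on the one quoted further down in this thread), and extract_titles is a hypothetical helper name, not part of lxml.

```python
from lxml import html

# Hypothetical sample markup; a real page would come from requests.get(...).
SAMPLE_HTML = """
<html><body>
  <a href="/reb/5185592209.html" data-id="5185592209" class="hdrlnk">FIRST OPEN HOUSE TOMORROW 2:00pm-4:00pm!!! (8-23-15)</a>
  <a href="/about" class="other">About</a>
</body></html>
"""


def extract_titles(markup):
    """Return the text of every <a> whose class attribute is exactly "hdrlnk"."""
    # Build the element tree first; xpath() is a method of the parsed tree.
    tree = html.fromstring(markup)
    return tree.xpath('//a[@class="hdrlnk"]/text()')


titles = extract_titles(SAMPLE_HTML)
for title in titles:
    print(title)
```

Only the first anchor should be printed, since the second one has a different class value.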


On Sat, Aug 22, 2015 at 4:20 PM Martin A. Brown <martin at linux-ip.net> wrote:

>
> Hi there Anthony,
>
> > I'm pretty new to lxml but I pretty much thought I'd understood
> > the basics. However, for some reason, my first attempt at using it
> > is failing miserably.
> >
> > Here's the deal:
> >
> > I'm parsing a specific page on Craigslist
> > (http://joplin.craigslist.org/search/rea) and trying to retrieve the
> > text of each link on that page. When I do an "inspect element" in
> > Firefox, a sample anchor link looks like this:
> >
> > <a href="/reb/5185592209.html" data-id="5185592209" class="hdrlnk">FIRST
> > OPEN HOUSE TOMORROW 2:00pm-4:00pm!!! (8-23-15)</a>
> >
> > The code I'm using to try to get the link text is this:
> >
> > from lxml import html
> > import requests
> >
> > page = requests.get("http://joplin.craigslist.org/search/rea")
>
> You are missing a step here: something that takes page.content,
> parses it, and creates a variable called tree.
>
> > titles = tree.xpath('//a[@title="hdrlnk"]/text()')
>
> And, your xpath is incorrect.  Play with this in the interactive
> browser and you will be able to correct your xpath.  I think you
> will notice from the example anchor link above that the attribute of
> the <a/> HTML elements you want to grab is "class", not "title".
> Therefore:
>
>    titles = tree.xpath('//a[@class="hdrlnk"]/text()')
>
> Is probably closer.
>
> > print titles
> >
> > The last line, which is supposed to print the text of each anchor,
> > returns [].
> >
> > I can't seem to figure out what I'm doing wrong. lxml seems pretty
> > straightforward but I can't seem to get this down.
>
> Again, I'd recommend playing with the data in an interactive console
> session.  You will be able to figure out exactly which xpath gets
> you the data you would like, and then you can drop it into your
> script.
>
> Good luck,
>
> -Martin
>
> --
> Martin A. Brown
> http://linux-ip.net/
>
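
The kind of interactive exploration Martin suggests might look roughly like the sketch below. It is only an illustration: the snippet string is a hypothetical stand-in for the downloaded page (with the link text shortened), and the variable names by_title and by_class are invented here. The idea is to look at the attributes of each anchor first, then compare candidate xpath expressions before putting one into a script.

```python
from lxml import html

# Hypothetical stand-in for the downloaded page; in a real session this
# would be something like requests.get(url).content.
snippet = (
    '<div><a href="/reb/5185592209.html" data-id="5185592209" '
    'class="hdrlnk">FIRST OPEN HOUSE</a></div>'
)
tree = html.fromstring(snippet)

# Inspect every anchor's attributes to see which one holds "hdrlnk".
for anchor in tree.xpath('//a'):
    print(dict(anchor.attrib))

# Compare the two candidate expressions side by side.
by_title = tree.xpath('//a[@title="hdrlnk"]/text()')
by_class = tree.xpath('//a[@class="hdrlnk"]/text()')
print(by_title)
print(by_class)
```

In this sketch the title-based expression matches nothing, because the anchor has no title attribute, while the class-based one finds the link text.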

