[Tutor] Problem using lxml
papillion at gmail.com
Sun Aug 23 01:16:34 CEST 2015
Many thanks, Martin! I had indeed skipped creating the tree object and a
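few other things you pointed out. Here is my finished simple code:

import requests
from lxml import html

page = requests.get("http://joplin.craigslist.org/search/w4m")
# Parse the response into an element tree before running XPath queries.
tree = html.fromstring(page.text)
# Select the text of each link whose class attribute is "hdrlnk".
titles = tree.xpath('//a[@class="hdrlnk"]/text()')
for title in titles:
    print(title)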
Pretty simple. Thanks for the help!
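For anyone reading the archive later, here is a minimal offline sketch of the two points from this thread: parse the markup into a tree first, then match on the class attribute rather than title. It is only an illustration. The first anchor is copied from the message quoted below, while the wrapping <div>, the second link, and the names SAMPLE, by_title and by_class are made up for the example.

```python
from lxml import html

# First anchor copied from the thread; the <div> wrapper and the second
# link are hypothetical, added only so there is something that should NOT match.
SAMPLE = """
<div>
  <a href="/reb/5185592209.html" data-id="5185592209" class="hdrlnk">FIRST OPEN HOUSE TOMORROW 2:00pm-4:00pm!!! (8-23-15)</a>
  <a href="/about" title="about">about craigslist</a>
</div>
"""

# Parse the markup into an element tree. Without this step there is no
# "tree" object to call xpath() on.
tree = html.fromstring(SAMPLE)

# "hdrlnk" is the value of the class attribute, so a test on @title finds nothing.
by_title = tree.xpath('//a[@title="hdrlnk"]/text()')

# Testing @class selects the link and returns its text.
by_class = tree.xpath('//a[@class="hdrlnk"]/text()')

print(by_title)
print(by_class)
```

Swapping SAMPLE for page.text (or page.content) from a requests response should give the same behaviour against a live page, assuming the site still uses that class name.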
On Sat, Aug 22, 2015 at 4:20 PM Martin A. Brown <martin at linux-ip.net> wrote:
> Hi there Anthony,
> > I'm pretty new to lxml but I pretty much thought I'd understood
> > the basics. However, for some reason, my first attempt at using it
> > is failing miserably.
> > Here's the deal:
> > I'm parsing a specific page on Craigslist
> > (http://joplin.craigslist.org/search/rea) and trying to retrieve the
> > text of each link on that page. When I do an "inspect element" in
> > Firefox, an anchor link looks like this:
> > <a href="/reb/5185592209.html" data-id="5185592209" class="hdrlnk">FIRST
> > OPEN HOUSE TOMORROW 2:00pm-4:00pm!!! (8-23-15)</a>
> > The code I'm using to try to get the link text is this:
> > from lxml import html
> > import requests
> > page = requests.get("http://joplin.craigslist.org/search/rea")
> You are missing something here that takes the page.content, parses
> it and creates a variable called tree.
> > titles = tree.xpath('//a[@title="hdrlnk"]/text()')
> And, your xpath is incorrect. Play with this in the interactive
> browser and you will be able to correct your xpath. I think you
> will notice from the example anchor link above that the attribute of
> the <a/> HTML elements you want to grab is "class", not "title".
> titles = tree.xpath('//a[@class="hdrlnk"]/text()')
> Is probably closer.
> > print titles
> > The last line, where it supposedly will print the text of each anchor
> > returns .
> > I can't seem to figure out what I'm doing wrong. lxml seems pretty
> > straightforward but I can't seem to get this down.
> Again, I'd recommend playing with the data in an interactive console
> session. You will be able to figure out exactly which xpath gets
> you the data you would like, and then you can drop it into your
> Good luck,
> Martin A. Brown
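The advice above about exploring the data in an interactive session could look roughly like the sketch below, which lists each anchor's attributes so you can see which one holds the value you want to match. This is one possible approach, not something taken from the thread. The markup is a hypothetical stand-in for a downloaded page (only the first anchor is quoted from the thread), and SAMPLE and matching are names invented for the example.

```python
from lxml import html

# Hypothetical stand-in for a downloaded page; only the first anchor
# comes from the thread.
SAMPLE = """
<div>
  <a href="/reb/5185592209.html" data-id="5185592209" class="hdrlnk">FIRST OPEN HOUSE TOMORROW 2:00pm-4:00pm!!! (8-23-15)</a>
  <a href="/about" title="about">about craigslist</a>
</div>
"""

tree = html.fromstring(SAMPLE)

# Print every anchor's attributes next to its text, the way you might
# poke at the page from an interactive prompt.
for link in tree.iter("a"):
    print(dict(link.attrib), link.text_content())

# Collect the names of the attributes whose value is "hdrlnk".
matching = sorted(
    {
        name
        for link in tree.iter("a")
        for name, value in link.attrib.items()
        if value == "hdrlnk"
    }
)
print(matching)
```

Once the attribute name is known, it can go straight into the XPath predicate, for example [@class="hdrlnk"].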