Mailman 3 Extracting titles from links - lxml - The Python XML Toolkit

Dec. 24, 2014

I want to grab the titles of articles from a web page. I wrote my script
using the following XPath:

    en_tree.xpath('//a[@class="pubSectionTitle"]')

It should grab the title from markup like the following example:

    <a href='/en/publications/magazines/wp20141201/ancient-city-timgad/'
       class="pubSectionTitle"
       title="Timgad—A Buried City Reveals Its Secrets">
        Timgad<wbr />—A Buried City Reveals Its Secrets
    </a>

When my script (see the code below) encounters the example above, I get
only 'Timgad', not the entire title.

Thanks for any help. I'm very inexperienced with this!

    en_titles = []
    en_toc = en_tree.xpath('//a[@class="pubSectionTitle"]')
    for title in en_toc:
        # For the example above this yields only 'Timgad'
        entry = title.text.strip()
        en_titles.append(entry)
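
One likely explanation, offered as a sketch rather than a definitive answer: in the ElementTree model that lxml follows, `.text` holds only the text before an element's first child, and the text after `<wbr />` is stored on that child as its `.tail`. The example below joins every text fragment with `itertext()` instead. `SAMPLE`, the wrapping `<div>`, and `link_title` are hypothetical names used for illustration. The stdlib fallback import is an assumption made so the sketch runs without lxml, and `findall` stands in for `.xpath()` because both libraries support it.

```python
try:
    from lxml import etree  # the library used in the post
except ImportError:
    # Assumption: fall back to the stdlib module, which shares the
    # text / tail / itertext API used below.
    import xml.etree.ElementTree as etree

# Hypothetical stand-in for the page; only the <a> element comes from the post.
# \u2014 is the dash character that appears in the original title.
SAMPLE = (
    '<div>'
    '<a href="/en/publications/magazines/wp20141201/ancient-city-timgad/" '
    'class="pubSectionTitle" '
    'title="Timgad\u2014A Buried City Reveals Its Secrets">'
    '      Timgad<wbr />\u2014A Buried City Reveals Its Secrets      '
    '</a>'
    '</div>'
)


def link_title(link):
    """Return all text inside the element, including text after child tags."""
    # itertext() yields the element's text, then each descendant's text
    # and tail in document order.
    joined = "".join(link.itertext())
    # Collapse runs of whitespace (indentation, line breaks) to single spaces.
    return " ".join(joined.split())


en_tree = etree.fromstring(SAMPLE)
en_toc = en_tree.findall('.//a[@class="pubSectionTitle"]')

first = en_toc[0]
text_only = first.text.strip()       # the part before <wbr />
tail_of_wbr = first[0].tail.strip()  # the part after <wbr />, stored on the child

en_titles = [link_title(link) for link in en_toc]
print(en_titles)
```

Since the link in the example also carries the full title in its `title` attribute, `link.get('title')` may be the simplest option when that attribute is reliably present.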


Participants (2): Jason Williams, Burak Arslan