Web Scraping - Output File

Jon Clements joncle at googlemail.com
Thu Apr 26 15:32:04 EDT 2012


 <SMac2347 <at> comcast.net> writes:

> 
> Hello,
> 
> I am having some difficulty generating the output I want from web
> scraping. Specifically, the script I wrote, while it runs without any
> errors, is not writing to the output file correctly. It runs, and
> creates the output .txt file; however, the file is blank (ideally it
> should be populated with a list of names).
> 
> I took the base of a program that I had before for a different data
> gathering task, which worked beautifully, and edited it for my
> purposes here. Any insight as to what I might be doing wrong would be
> highly appreciated. Code is included below. Thanks!

I would approach it like this...

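import lxml.html

# Match the shaded result rows that contain "body_cols" cells
QUERY = '//tr[@bgcolor="#F1F3F4"][td[starts-with(@class, "body_cols")]]'

url = 'http://www.skadden.com/Index.cfm?contentID=44&alphaSearch=A'

# Fetch and parse the page, then select the matching rows
tree = lxml.html.parse(url).getroot()
trs = tree.xpath(QUERY)
for tr in trs:
    # Collect the text of each cell in the row
    tds = [el.text_content() for el in tr.iterfind('td')]
    print(tds)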
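
If the names need to end up in a text file rather than on the console, something along these lines might do it. This is only a sketch: the write_rows helper, the tab separator, the names.txt filename and the sample rows are my own assumptions, not anything from your script. Opening the file in a with block guarantees it is flushed and closed, and a file that never gets closed is one common reason for blank output.

```python
import io


def write_rows(rows, path):
    """Write one line per row, with cells stripped and joined by tabs."""
    # The with block closes (and so flushes) the file, even on error.
    with io.open(path, 'w', encoding='utf-8') as out:
        for cells in rows:
            cleaned = [cell.strip() for cell in cells]
            out.write(u'\t'.join(cleaned) + u'\n')


# Hypothetical sample data standing in for the tds lists built in the loop.
sample_rows = [[u' Jane Doe ', u'Partner'], [u'John Roe', u' Associate ']]
write_rows(sample_rows, 'names.txt')
```

In the loop above you would append each tds list to a rows list and call write_rows(rows, 'names.txt') once at the end.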


hth

Jon.



