HTML Code - Line Number

Jon Clements joncle at googlemail.com
Sat Apr 28 02:45:57 EDT 2012


On Friday, 27 April 2012 18:09:57 UTC+1, SMac... at comcast.net  wrote:
> Hello,
> 
> For scraping purposes, I am having a bit of trouble writing a block
> of code to define, and find, the relative position (line number) of a
> string of HTML code. I can pull out one string that I want, and then
> there is always a line of code, directly beneath the one I can pull
> out, that begins with the following:
> <td align="left" valign="top" class="body_cols_middle">
> 
> However, because this string of HTML code above is not unique to just
> the information I need (which I cannot currently pull out), I was
> hoping there is a way to effectively say "if you find the html string
> _____ in the line of HTML code above, and the string <td align="left"
> valign="top" class="body_co
<SMac2347 <at> comcast.net> writes:

> 
> Hello,
> 
> I am having some difficulty generating the output I want from web
> scraping. Specifically, the script I wrote, while it runs without any
> errors, is not writing to the output file correctly. It runs, and
> creates the output .txt file; however, the file is blank (ideally it
> should be populated with a list of names).
> 
> I took the base of a program that I had before for a different data
> gathering task, which worked beautifully, and edited it for my
> purposes here. Any insight as to what I might be doing wrong would be
> highly appreciated. Code is included below. Thanks!

[quoting reply to first thread]
I would approach it like this...

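import lxml.html

# Select <tr> rows with bgcolor #F1F3F4 that contain at least one <td>
# whose class starts with "body_cols".
QUERY = '//tr[@bgcolor="#F1F3F4"][td[starts-with(@class, "body_cols")]]'

url = 'http://www.skadden.com/Index.cfm?contentID=44&alphaSearch=A'

# Fetch and parse the page, then print the text of every cell in each
# matching row.
tree = lxml.html.parse(url).getroot()
trs = tree.xpath(QUERY)
for tr in trs:
    tds = [el.text_content() for el in tr.iterfind('td')]
    print(tds)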


hth

Jon.
[/quote]
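
To see what the query in the reply above selects without fetching the live page, here is a minimal offline sketch. It is an approximation, not the lxml code itself: it uses the standard library's xml.etree.ElementTree, which has no starts-with(), so the class-prefix test is done in Python. The SAMPLE markup and the matching_rows() helper are made up for illustration, and ElementTree only copes with well-formed markup, which real HTML often is not.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample markup; the real page is not reproduced here.
SAMPLE = """
<table>
  <tr bgcolor="#F1F3F4">
    <td align="left" valign="top" class="body_cols_left">Smith, Jane</td>
    <td align="left" valign="top" class="body_cols_middle">Partner</td>
  </tr>
  <tr bgcolor="#FFFFFF">
    <td align="left" valign="top" class="body_cols_middle">ignored row</td>
  </tr>
  <tr bgcolor="#F1F3F4">
    <td class="other">no matching cell class</td>
  </tr>
</table>
"""


def matching_rows(markup):
    """Return the cell texts of each row the query would select."""
    root = ET.fromstring(markup)
    rows = []
    # ElementTree supports the attribute predicate but not starts-with(),
    # so the "body_cols" prefix check happens in Python instead.
    for tr in root.iterfind('.//tr[@bgcolor="#F1F3F4"]'):
        tds = tr.findall('td')
        if any(td.get('class', '').startswith('body_cols') for td in tds):
            rows.append([''.join(td.itertext()).strip() for td in tds])
    return rows


rows = matching_rows(SAMPLE)
print(rows)
```

Only the first row passes both tests: the second has a different bgcolor, and the third has no cell whose class starts with "body_cols".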





> following, then pull everything that follows this second string.
> 
> Any thoughts as to how to define a function to do this, or do this
> some other way? All insight is much appreciated! Thanks.
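
For what the question literally describes (a marker on one line, the <td ...> prefix at the start of the next, then everything after that prefix), a rough line-based sketch might look like the following. The function name, the marker string and the sample lines are all hypothetical. It also assumes the HTML keeps one cell per line, which is fragile; parsing the document with an HTML parser is usually more robust.

```python
def find_following_cells(lines, marker, prefix):
    """Yield (line_number, remainder) for each line that starts with
    `prefix` and directly follows a line containing `marker`."""
    previous = ""
    for number, line in enumerate(lines, start=1):
        stripped = line.strip()
        if marker in previous and stripped.startswith(prefix):
            # Keep everything that follows the prefix on this line.
            yield number, stripped[len(prefix):]
        previous = line


PREFIX = '<td align="left" valign="top" class="body_cols_middle">'

# Hypothetical sample lines standing in for the downloaded page.
sample = [
    '<td class="name">Smith, Jane</td>',
    '<td align="left" valign="top" class="body_cols_middle">Partner</td>',
    '<td class="unrelated">x</td>',
    '<td align="left" valign="top" class="body_cols_middle">skipped</td>',
]

matches = list(find_following_cells(sample, 'class="name"', PREFIX))
print(matches)
```

Line numbers are 1-based here. The last sample line is skipped because the line before it does not contain the marker.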

 <SMac2347 <at> comcast.net> writes:

> 
> Hello,
> 
[snip]
> Any thoughts as to how to define a function to do this, or do this
> some other way? All insight is much appreciated! Thanks.
> 

[quote in reply to second thread]
Did you not see my reply to your previous thread?

And why do you want the line number?
[/quote]

I'm trying this on GG, as the mailing list gateway one or t'other does nee seem to work (mea culpa no doubt).

So may have obscured the issue more with my quoting and snipping, or what not.

Jon.