How can I use lxml with win32com?
elca
highcar at gmail.com
Mon Oct 26 02:57:45 EDT 2009
motoom wrote:
>
> elca wrote:
>
>> http://news.search.naver.com/search.naver?sm=tab_hty&where=news&query=korea+times&x=0&y=0
>> That is a Korean portal site. I searched it for the keyword 'korea times',
>> and I want to scrape the results into a text file named 'blogscrap_save.txt'.
>
> Aha, now we're getting somewhere.
>
> Getting and parsing that page is no problem, and doesn't need JavaScript
> or Internet Explorer.
>
> import urllib2                 # Python 2 standard library
> import BeautifulSoup           # BeautifulSoup 3
> doc = urllib2.urlopen("http://news.search.naver.com/search.naver?sm=tab_hty&where=news&query=korea+times&x=0&y=0")
> soup = BeautifulSoup.BeautifulSoup(doc)
>
>
> By analyzing the structure of that page you can see that the articles
> are presented in an unordered list which has class "type01". The
> interesting bit in each list item is encapsulated in a <dd> tag with
> class "sh_news_passage". So, to parse the articles:
>
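> ul = soup.find("ul", "type01")              # the list holding the results
> for li in ul.findAll("li"):
>     dd = li.find("dd", "sh_news_passage")   # the article snippet, if any
>     if dd is not None:                      # skip items without a snippet
>         print dd.renderContents()
>         print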
>
> This example prints them, but you could also save them to a file (or a
> database, whatever).
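The quoted reply prints each passage and suggests saving them to a file instead. Since urllib2 and BeautifulSoup 3 are Python 2-era libraries, here is a minimal Python 3 sketch of the same extraction plus the save step, under stated assumptions: it swaps in the standard library's `html.parser.HTMLParser` for BeautifulSoup, it assumes the markup is well formed and still uses `ul.type01` and `dd.sh_news_passage` as described above, it returns plain text rather than the inner HTML that `renderContents()` gives, and the helper names (`NewsPassageParser`, `extract_passages`, `format_passages`, `save_passages`) are hypothetical.

```python
from html.parser import HTMLParser


class NewsPassageParser(HTMLParser):
    """Collect the text of <dd class="sh_news_passage"> elements nested inside
    <ul class="type01">. Hypothetical helper; assumes well-formed markup."""

    def __init__(self):
        super().__init__()
        self.ul_depth = 0        # > 0 while inside <ul class="type01">
        self.in_passage = False  # True while inside a matching <dd>
        self.current = []        # text fragments of the passage being read
        self.passages = []       # finished passages, in document order

    def handle_starttag(self, tag, attrs):
        # A class attribute can hold several names, e.g. class="type01 foo".
        classes = (dict(attrs).get("class") or "").split()
        if tag == "ul" and (self.ul_depth or "type01" in classes):
            self.ul_depth += 1   # count nested lists so the depth stays balanced
        elif tag == "dd" and self.ul_depth and "sh_news_passage" in classes:
            self.in_passage = True
            self.current = []

    def handle_endtag(self, tag):
        if tag == "ul" and self.ul_depth:
            self.ul_depth -= 1
        elif tag == "dd" and self.in_passage:
            self.passages.append("".join(self.current).strip())
            self.in_passage = False

    def handle_data(self, data):
        # Text inside inline tags such as <b> is kept; the tags are dropped.
        if self.in_passage:
            self.current.append(data)


def extract_passages(html):
    """Return the passage texts found in an HTML string."""
    parser = NewsPassageParser()
    parser.feed(html)
    parser.close()
    return parser.passages


def format_passages(passages):
    """One passage per paragraph, separated by a blank line."""
    return "\n\n".join(passages) + "\n"


def save_passages(passages, path="blogscrap_save.txt"):
    """Write the passages as UTF-8 text (file name taken from the thread)."""
    with open(path, "w", encoding="utf-8") as f:
        f.write(format_passages(passages))


if __name__ == "__main__":
    # Made-up markup shaped like the structure described in the thread.
    sample = """
    <ul class="type01">
      <li><dl><dt>Headline</dt>
        <dd class="sh_news_passage">First <b>passage</b>.</dd></dl></li>
      <li><dl><dd class="sh_news_passage">Second passage.</dd></dl></li>
    </ul>
    <dd class="sh_news_passage">Outside the list, so ignored.</dd>
    """
    for passage in extract_passages(sample):
        print(passage)
```

To run it against the live page, fetch and decode the HTML first (for example with `urllib.request.urlopen` in Python 3), then pass the result of `extract_passages` to `save_passages`. The page's character encoding is not stated in this thread, so check it before decoding.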
>
> Greetings,
Hi, thanks for your help.
This thread is getting too long, so I will open a new post.
Thanks a lot,
Paul