how can i use lxml with win32com?

elca highcar at gmail.com
Sun Oct 25 04:22:49 EDT 2009


Hello,
thanks for your reply.
Actually, the website I want to parse is in a different language,
so I quoted a common English website to make it easier to understand.   :)
By the way, is it possible to use PAMIE and BeautifulSoup
together?
Thanks a lot



motoom wrote:
> 
> elca wrote:
> 
>> yes i want to extract this text 'CNN Shop' and linked page
>> 'http://www.turnerstoreonline.com'.
> 
> Well then.
> First, we'll get the page using urllib2:
> 
>      doc=urllib2.urlopen("http://www.cnn.com")
> 
> Then we'll feed it into the HTML parser:
> 
>      soup=BeautifulSoup(doc)
> 
> Next, we'll look at all the links in the page:
> 
>      for a in soup.findAll("a"):
> 
> and when a link has the text 'CNN Shop', we have a hit,
> and print the URL:
> 
>          if a.renderContents()=="CNN Shop":
>              print a["href"]
> 
> 
> The complete program is thus:
> 
> import urllib2
> from BeautifulSoup import BeautifulSoup
> 
> doc=urllib2.urlopen("http://www.cnn.com")
> soup=BeautifulSoup(doc)
> for a in soup.findAll("a"):
>      if a.renderContents()=="CNN Shop":
>          print a["href"]
> 
> 
> The example above can be condensed because BeautifulSoup's find function 
> can also search by text:
> 
>      print soup.find("a",text="CNN Shop")
> 
> and since that's a navigable string, we can ascend to its parent and 
> display the href attribute:
> 
>      print soup.find("a",text="CNN Shop").findParent()["href"]
> 
> So eventually the whole program could be collapsed into one line:
> 
>      print BeautifulSoup(urllib2.urlopen("http://www.cnn.com")).find("a",text="CNN Shop").findParent()["href"]
> 
> ...but I think this is very ugly!
> 
> 
>  > im very sorry my english.
> 
> Your English is quite understandable.  The hard part is figuring out what 
> exactly you wanted to achieve ;-)
> 
> I have a question too.  Why did you think JavaScript was necessary to 
> arrive at this result?
> 
> Greetings,
> 
> 
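
For readers on Python 3, where urllib2 and the BeautifulSoup 3 import
quoted above are not available, the same idea (walk over every link,
compare its text, print the href) can be sketched with only the standard
library. This is a rough sketch and not the method from the thread: the
names LinkTextFinder, find_hrefs and SAMPLE are made up for illustration,
and the sample markup (including the non-English link) stands in for a
page that would normally be downloaded and decoded to text first.

```python
from html.parser import HTMLParser


class LinkTextFinder(HTMLParser):
    """Collect (href, text) pairs for every <a> element in a document."""

    def __init__(self):
        super().__init__()
        self.links = []        # finished (href, text) pairs
        self._in_link = False  # True while inside an open <a> element
        self._href = None      # href of the <a> currently open, if any
        self._text = []        # text fragments seen inside the open <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._in_link = True
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._in_link:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._in_link:
            self.links.append((self._href, "".join(self._text).strip()))
            self._in_link = False


def find_hrefs(html, link_text):
    """Return the href of every link whose visible text equals link_text."""
    finder = LinkTextFinder()
    finder.feed(html)
    finder.close()
    return [href for href, text in finder.links if text == link_text]


# Hypothetical sample markup, standing in for a downloaded page.
SAMPLE = (
    '<html><body>'
    '<a href="http://www.turnerstoreonline.com">CNN Shop</a>'
    '<a href="http://example.com/ko">\uc1fc\ud551</a>'
    '</body></html>'
)

print(find_hrefs(SAMPLE, "CNN Shop"))
```

Because find_hrefs only needs the page as a text string, link text in
another language is matched the same way, provided the downloaded bytes
were decoded with the page's real encoding before parsing.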
