how can i use lxml with win32com?
elca
highcar at gmail.com
Sun Oct 25 04:22:49 EDT 2009
Hello,
Thanks for your reply.
Actually, the website I want to parse is in a different language,
so I quoted a common English website to make it easier to understand. :)
By the way, is it possible to use PAMIE and BeautifulSoup
together?
Thanks a lot.
motoom wrote:
>
> elca wrote:
>
>> Yes, I want to extract the text 'CNN Shop' and its linked page,
>> 'http://www.turnerstoreonline.com'.
>
> Well then.
> First, we'll get the page using urllib2:
>
> doc=urllib2.urlopen("http://www.cnn.com")
>
> Then we'll feed it into the HTML parser:
>
> soup=BeautifulSoup(doc)
>
> Next, we'll look at all the links in the page:
>
> for a in soup.findAll("a"):
>
> and when a link has the text 'CNN Shop', we have a hit,
> and print the URL:
>
>     if a.renderContents()=="CNN Shop":
>         print a["href"]
>
>
> The complete program is thus:
>
> import urllib2
> from BeautifulSoup import BeautifulSoup
>
> doc=urllib2.urlopen("http://www.cnn.com")
> soup=BeautifulSoup(doc)
> for a in soup.findAll("a"):
>     if a.renderContents()=="CNN Shop":
>         print a["href"]
>
>
> The example above can be condensed because BeautifulSoup's find function
> can also look for texts:
>
> print soup.find("a",text="CNN Shop")
>
> and since that's a navigable string, we can ascend to its parent and
> display the href attribute:
>
> print soup.find("a",text="CNN Shop").findParent()["href"]
>
> So eventually the whole program could be collapsed into one line:
>
> print BeautifulSoup(urllib2.urlopen("http://www.cnn.com")).find("a",text="CNN Shop").findParent()["href"]
>
> ...but I think this is very ugly!
>
>
> > I'm very sorry about my English.
>
> Your English is quite understandable. The hard part is figuring out what
> exactly you wanted to achieve ;-)
>
> I have a question too. Why did you think JavaScript was necessary to
> arrive at this result?
>
> Greetings,
>
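For readers on Python 3, where the urllib2 and BeautifulSoup 3 imports in the quoted reply are not available, the same "loop over the links and compare the text" idea can be sketched with only the standard library. This is a rough illustration, not the method from the quoted reply: the class name LinkTextFinder, the helper find_link_hrefs, and the SAMPLE_HTML string are all made up for the example, and the sample markup stands in for the real page so that nothing has to be downloaded.

```python
from html.parser import HTMLParser


class LinkTextFinder(HTMLParser):
    """Collect href values of <a> tags whose text equals a target string.

    Hypothetical helper for illustration; not part of any library.
    """

    def __init__(self, target_text):
        super().__init__()
        self.target_text = target_text
        self.matches = []        # hrefs of links whose text matched
        self._in_link = False    # True while inside an open <a> tag
        self._href = None        # href of the <a> currently open
        self._text_parts = []    # text chunks seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._in_link = True
            self._href = dict(attrs).get("href")
            self._text_parts = []

    def handle_data(self, data):
        if self._in_link:
            self._text_parts.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._in_link:
            text = "".join(self._text_parts).strip()
            if text == self.target_text and self._href is not None:
                self.matches.append(self._href)
            self._in_link = False


def find_link_hrefs(html, target_text):
    """Return the href of every link in html whose text is target_text."""
    parser = LinkTextFinder(target_text)
    parser.feed(html)
    parser.close()
    return parser.matches


# Hypothetical sample markup standing in for the downloaded page.
SAMPLE_HTML = """
<ul>
  <li><a href="http://www.cnn.com/weather">Weather</a></li>
  <li><a href="http://www.turnerstoreonline.com">CNN Shop</a></li>
</ul>
"""

print(find_link_hrefs(SAMPLE_HTML, "CNN Shop"))
```

To run this against a live page, the HTML string could come from urllib.request.urlopen(url).read().decode(...), with the encoding chosen to match the site. A standard-library parser is used here only to keep the sketch self-contained; it is less forgiving of malformed markup than BeautifulSoup.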