[python-win32] how to use win32com with beautifulsoup or lxml?

elca highcar at gmail.com
Mon Oct 26 01:46:25 CET 2009




Roberto Aguilar wrote:
> 
> On Oct 24, 2009, at 6:17 PM, elca wrote:
>> hello!
>> thanks for your reply
>> for example, i want to extract some text from the cnn website,
>> such as the 'Sponsored links' and 'Money' text.
>> the following is a sample of the script i want to make.
>> i want to add a function to my script that can extract
>> that kind of text.
>> thanks in advance ! :)
> 
> Unless I'm missing something, why do you need Internet Explorer at  
> all?  You can get the HTML using urllib2:
> 
> import urllib2
> response = urllib2.urlopen('http://cnn.com/')
> html = response.read()
> 
> then extract what you're looking for with beautiful soup:
> 
> from BeautifulSoup import BeautifulSoup
> soup = BeautifulSoup(html)
> 
> for content in soup.findAll('div', {'class': 'cnn_sectbincntnt2'}):
>      if ' /money?cnn=yes
> 
>> import win32com.client
>> from time import sleep
>> from win32com.client import Dispatch
>> import urllib,urllib2
>> from BeautifulSoup import BeautifulSoup
>> ie = Dispatch("InternetExplorer.Application")
>> ie.Visible = 1
>> ie.Navigate("http://www.cnn.com")
>> sleep(15)
>> ie.Quit()
>>
>>
>> ccurvey wrote:
>>>
>>> you can definitely use IE and innerHTML() to get the HTML, then use
>>> BeautifulSoup to parse the HTML.  What are you having trouble with?
>>>
>>>
>>>
>>> On Sat, Oct 24, 2009 at 8:34 PM, elca <highcar at gmail.com> wrote:
>>>
>>>>
>>>> hello...
>>>> if anyone knows, please help me!
>>>> i really want to know. i have searched google for a long time
>>>> but couldn't find a clear solution, partly because of my lack of
>>>> python knowledge.
>>>> i want to use the IE.navigate function with beautifulsoup or lxml.
>>>> if anyone knows how to do this or has a sample,
>>>> please help me!
>>>> thanks in advance
>>>> --
>>>> View this message in context:
>>>> http://www.nabble.com/how-to-use-win32com-with-beautifulsoup-or-lxml--tp26044332p26044332.html
>>>> Sent from the Python - python-win32 mailing list archive at Nabble.com.
>>>>
>>>> _______________________________________________
>>>> python-win32 mailing list
>>>> python-win32 at python.org
>>>> http://mail.python.org/mailman/listinfo/python-win32
>>>>
>>>
>>>
>>>
>>> -- 
>>> The source of your stress might be a moron
>>>
>>>
>>>
>>
>> -- 
>> View this message in context:
>> http://www.nabble.com/how-to-use-win32com-with-beautifulsoup-or-lxml--tp26044332p26044523.html
>> Sent from the Python - python-win32 mailing list archive at Nabble.com.
>>
> 
> 
> 


Hello,
Sorry for the late reply.
I'm actually writing a web scraper,
and so far JavaScript has not been a problem for the scraping.
Once the scraper is done I will add some other functions, and at that point I will
run into a lot of JavaScript,
which is why I am trying to use PAMIE or IE.
http://elca.pastebin.com/m52e7d8e0
I have posted my current scraper script at the link above.
In particular, I want to change 'thepage = urllib.urlopen(theurl).read()' to
the PAMIE method.
If possible, could you check it and correct me?
Thanks in advance.
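
To make the request concrete, here is a rough sketch of the kind of
replacement I mean. It is only an assumption about how this could work, not
something tested against my script. The helper names wait_until_loaded and
fetch_html_with_ie are made up for illustration, and it drives IE through plain
win32com rather than PAMIE itself. Instead of a fixed sleep(15), it polls IE's
Busy and ReadyState properties until the page reports that it is loaded.

```python
import time


def wait_until_loaded(browser, timeout=30.0, poll=0.5):
    """Poll `browser` until it reports the page as loaded.

    `browser` is any object with IE-style `Busy` and `ReadyState`
    attributes. Returns True if loading finished within `timeout`
    seconds, otherwise False.
    """
    READYSTATE_COMPLETE = 4  # value IE reports once the document is loaded
    deadline = time.time() + timeout
    while True:
        if not browser.Busy and browser.ReadyState == READYSTATE_COMPLETE:
            return True
        if time.time() >= deadline:
            return False
        time.sleep(poll)


def fetch_html_with_ie(url, timeout=30.0):
    """Load `url` in Internet Explorer and return the rendered HTML.

    Hypothetical helper: requires Windows with pywin32 installed.
    """
    # Imported here so wait_until_loaded can be used without pywin32.
    from win32com.client import Dispatch

    ie = Dispatch("InternetExplorer.Application")
    ie.Visible = 0
    try:
        ie.Navigate(url)
        if not wait_until_loaded(ie, timeout):
            raise RuntimeError("page did not finish loading: %s" % url)
        # HTML of the root element, read after the page has loaded.
        return ie.Document.documentElement.outerHTML
    finally:
        # Always close the browser window, even on errors.
        ie.Quit()
```

With something like this, the line in my script would become
'thepage = fetch_html_with_ie(theurl)', and the result could be passed to
BeautifulSoup(thepage) as before.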

Regards

-- 
View this message in context: http://www.nabble.com/how-to-use-win32com-with-beautifulsoup-or-lxml--tp26044332p26053433.html
Sent from the Python - python-win32 mailing list archive at Nabble.com.


