parse html rendered by js

yanghq yanghq at neusoft.com
Fri Feb 11 22:01:10 EST 2011


Thank you for your reply.

Yes, my end goal is something like screen scraping a web site.

Duplicating the JavaScript behaviour in my Python code would be a huge
burden; I'm afraid I don't have the time for it.

I have heard that WebKit, PAMIE and other browser engines can render
JavaScript to HTML, but PAMIE only works on Windows, and WebKit is hard
for me to obtain. I am looking for another browser engine.

On Fri, 2011-02-11 at 06:46 -0800, Alex Willmer wrote:
> On Feb 11, 8:20 am, yanghq <yan... at neusoft.com> wrote:
> > Hi,
> >     I want to get attribute values such as href and src from HTML.
> >
> >     For simple HTML pages, libxml2dom can parse the page into a DOM so
> > that I can get what I want;
> >
> >     but for some pages rendered by JavaScript, like:
> >
> > document.write(
> > '<frameset border="0" frameborder="no" rows="0,*,0" onLoad="start()"
> > onUnload="end()" onResize="change()">'+
> >   '<frameset border="0" frameborder="no" cols="*,*,*,*,*,0">'+
> > '<frame name="cfgFrame" noresize scrolling="no"
> > src="../frame.html?rtfPossible=' + rtfPossibleString + '">'+
> > ...
> > )
> > how can I get the attribute value of 'src'? Thank you for any help.
> 
> You can
> - Duplicate the Javascript behaviour in your Python code (i.e. rewrite
> it yourself). Whenever the javascript changes you will need to update
> your duplicate code.
> - Use or write a Python module that uses a web browser to download/
> execute the page. I'm not aware of any that exist.
> 
> Neither option is very good, and that is one reason why such
> Javascript is considered bad practise. What is your end goal (e.g.
> testing a web application, screen scraping a web site)? There may be a
> better way.
> 
> Alex
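
For the specific document.write() case quoted above, a lighter-weight
workaround than a full browser engine might be to pull the string literals
out of the script and parse whatever HTML they add up to. The following is
only a rough sketch under several assumptions: Python 3, single-quoted
literals joined with '+', and no other logic in the script. The names
attrs_from_document_write and AttrCollector are made up for illustration.
It does not execute any JavaScript, so the value of a variable such as
rtfPossibleString is simply missing from the result.

```python
import re
from html.parser import HTMLParser

# Matches single-quoted JavaScript string literals, allowing backslash escapes.
JS_SINGLE_QUOTED = re.compile(r"'((?:[^'\\]|\\.)*)'")


class AttrCollector(HTMLParser):
    """Records (tag, attribute, value) for every wanted attribute seen."""

    def __init__(self, wanted=("src", "href")):
        super().__init__()
        self.wanted = wanted
        self.found = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in self.wanted:
                self.found.append((tag, name, value))


def attrs_from_document_write(js_source, wanted=("src", "href")):
    """Hypothetical helper: join the string literals and parse them as HTML.

    JavaScript variables between the literals are dropped, not evaluated.
    """
    html = "".join(JS_SINGLE_QUOTED.findall(js_source))
    collector = AttrCollector(wanted)
    collector.feed(html)
    collector.close()
    return collector.found


# A shortened stand-in for the quoted script, not the real page.
sample = """document.write(
'<frameset border="0" frameborder="no" rows="0,*,0">'+
'<frame name="cfgFrame" noresize scrolling="no" src="../frame.html?rtfPossible=' + rtfPossibleString + '">'+
'</frameset>'
)"""

result = attrs_from_document_write(sample)
print(result)
```

This only helps while the script stays a plain concatenation of literals;
anything that computes markup in loops or functions would still need a real
JavaScript engine, as discussed above.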





More information about the Python-list mailing list