parse html rendered by js

yanghq yanghq at neusoft.com
Sun Feb 13 02:14:35 CET 2011


There seems to be no Rhino for Linux.

SpiderMonkey doesn't provide document, window, or the other browser
objects in JS, so it won't help me much.
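
For the specific document.write(...) call quoted below, a full JavaScript
engine may not be needed at all. The following is only a minimal sketch in
Python 3, under several assumptions: the markup is built purely from
single-quoted string literals joined with "+", the JS expressions between
them (such as rtfPossibleString) cannot be evaluated and are kept as {...}
placeholders, and escape sequences inside literals are left as-is. The names
flatten_document_write, SrcCollector and extract_srcs are made up for this
illustration; only the standard re and html.parser modules are used.

```python
import re
from html.parser import HTMLParser

# A single-quoted JavaScript string literal. Literals that wrap across
# lines are tolerated; escape sequences are matched but not decoded.
JS_STRING = re.compile(r"'((?:[^'\\]|\\.)*)'", re.S)


def flatten_document_write(js_args):
    """Join the string literals in js_args into one HTML string.

    Any JS expression found between two literals is kept as a
    {expression} placeholder, because nothing here evaluates JS.
    """
    parts = []
    pos = 0
    for match in JS_STRING.finditer(js_args):
        between = js_args[pos:match.start()].strip("+ \t\r\n")
        if between:
            parts.append("{" + between + "}")
        # Newlines inside a literal become spaces.
        parts.append(match.group(1).replace("\n", " "))
        pos = match.end()
    return "".join(parts)


class SrcCollector(HTMLParser):
    """Collect (name, src) pairs from every tag that has a src attribute."""

    def __init__(self):
        super().__init__()
        self.srcs = []

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        if "src" in attr_map:
            self.srcs.append((attr_map.get("name"), attr_map["src"]))


def extract_srcs(script):
    """Return (name, src) pairs from the first document.write(...) call."""
    call = re.search(r"document\.write\s*\((.*)\)", script, re.S)
    if call is None:
        return []
    collector = SrcCollector()
    collector.feed(flatten_document_write(call.group(1)))
    collector.close()
    return collector.srcs


# Shortened stand-in for the script in the quoted message.
SAMPLE = """document.write(
'<frameset rows="0,*">'+
'<frame name="cfgFrame" noresize
src="../frame.html?rtfPossible=' + rtfPossibleString + '">'+
'<frame name="cnFrame" src="../frame.html?' + main.clientargs + '">'+
''
)"""

if __name__ == "__main__":
    for name, src in extract_srcs(SAMPLE):
        print(name, src)
```

This only recovers the static part of each src value. Where the real value
depends on a JS variable, the placeholder shows which variable would still
have to be resolved some other way.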

On Sat, 2011-02-12 at 05:57 -0800, john wrote:

> Even though I've never tried it, you may want to look into running the HTML through a separate JavaScript engine, like SpiderMonkey or Rhino, and then parsing the results of that.
> 
> On Friday, February 11, 2011 2:20:32 AM UTC-6, yanghq wrote:
> > Hi,
> >     I want to get attribute values such as href and src from HTML.
> > 
> >     For simple HTML pages, libxml2dom can parse them into a DOM, and I
> > can get what I want.
> > 
> >     But some pages are rendered by JS, like:
> > 
> > document.write(
> > '<frameset border="0" frameborder="no" rows="0,*,0" onLoad="start()" onUnload="end()" onResize="change()">'+
> >   '<frameset border="0" frameborder="no" cols="*,*,*,*,*,0">'+
> > '<frame name="cfgFrame" noresize scrolling="no" src="../frame.html?rtfPossible=' + rtfPossibleString + '">'+
> > '<frame name="mboxFrame" noresize scrolling="no" src="../frame.html?rtfPossible=' + rtfPossibleString + '">'+
> > '<frame name="cmdFrame" noresize scrolling="no" src="../frame.html?rtfPossible=' + rtfPossibleString + '">'+
> > '<frame name="msgFrame" noresize scrolling="no" src="../frame.html?rtfPossible=' + rtfPossibleString + '">'+
> > '<frame name="pabFrame" noresize scrolling="no" src="../frame.html?rtfPossible=' + rtfPossibleString + '">'+
> > '<frame name="cnFrame" noresize scrolling="no" src="../frame.html?' + main.clientargs + '">'+
> >   ''+
> >   '<frame name="mailFrame" marginwidth="0" marginheight="0" noresize src="../frame.html?rtfPossible=' + rtfPossibleString + '">'+
> > '<frame name="appletFrame" marginwidth="0" marginheight="0" noresize src="../frame.html?rtfPossible=' + rtfPossibleString + '">'+
> > ''
> > )
> > How can I get the attribute value of 'src'? Thank you for any help.
> > 
> 
> 


