
It consists of a pre-filter:

http://users.commspeed.net/tbabbitt/tom2/HTMLparse.py

Usage:

    import HTMLparse, urllib2

    TomFilter = HTMLparse.TomFilter()
    webtext = urllib2.urlopen('http://www.yahoo.com')
    formtxt = TomFilter.formfilter(webtext, 'http://www.yahoo.com')

    # available after feeding the page through formfilter()
    imagelist = TomFilter.imagelist
    linklist = TomFilter.linklist
    scripttext = TomFilter.scripttext
    linktextlist = TomFilter.linktextlist
    background = TomFilter.background
    backgroundcolor = TomFilter.backgroundcolor

The parser completes URLs, changes form tags into custom wxControl tags, and mangles frames into rows of tables. (Note the encoding('utf_8','ignore').)

The next part is a custom module for web controls:

http://users.commspeed.net/tbabbitt/tom2/webwig.py

The controls post their events to a custom event and are available as a class called WebEvent. (Note: radio buttons are implemented as check boxes, so you have to uncheck the other buttons yourself in the event handler.)

The only thing left to do is link the events to the program running the wx.html window:

    import wx
    import wx.html as html
    import wx.lib.wxpTag
    import webwig

    # Hypothetical host window name; in your program these are methods
    # of the class that owns the wx.html.HtmlWindow
    class BrowserFrame(wx.Frame):
        def __init__(self, *args, **kwargs):
            wx.Frame.__init__(self, *args, **kwargs)
            self.WebObjects = {}
            self.Bind(webwig.EVT_UPDATE_WEBFORM, self.OnPyEvent)

        def OnPyEvent(self, event):
            self.WebEvent = event.GetVal()
            # Some event types arrive with an 'int_' prefix; strip it
            if self.WebEvent.Typ[:4] == 'int_':
                self.WebEvent.Typ = self.WebEvent.Typ[4:]
            # Keep the latest event object for each control, keyed by Id
            self.WebObjects[self.WebEvent.Id] = self.WebEvent

This gives you a WebEvent object and a dictionary of all web objects on the page. WebEvent objects have the following properties:

    self.WebEvent.Name      # the control name
    self.WebEvent.Value     # the control value
    self.WebEvent.WebValue  # in the form u'&%s="%s"' % (Name, Value)
    self.WebEvent.URL       # the full URL
    self.WebEvent.FormData  # the full form data as a dictionary
    self.WebEvent.Data      # extra data from select and hidden controls, in the form 'action'|'http://'|'method'|'POST', etc.
    self.WebEvent.Id        # the control Id
    self.WebEvent.Win       # the control itself

Here is the code for the demo (love the new demo):

http://users.commspeed.net/tbabbitt/tom2/demowebwig.py

I have also written a program to show off its features. It requires:

PIL: http://www.pythonware.com/products/pil/
image_view (a ScrolledWindow PIL viewer): http://users.commspeed.net/tbabbitt/tom/image_view.py

The browser itself:

http://users.commspeed.net/tbabbitt/tom2/TomBrowse.py

Inspired by the demo, it separates the web page components into notebook pages and runs the content of a StyledTextCtrl (STC) through the above event. Errors are displayed on a log page.

I know how much people have wanted something like this, so I just want to get the code out there.

The tag handler complains on pages with hidden tags, but I can find nothing wrong with my tag syntax. There is also an unhandled exception when I call SetPage from the event handler, but when the page is saved it loads back OK, so I probably need to destroy the web objects in the event handler.

Infinite Abundance,
Tom Babbitt
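The URL completion and link/image collection that the pre-filter performs can be sketched with the standard library alone. This is a minimal Python 3 sketch, not HTMLparse.TomFilter: the class name LinkCollector is hypothetical, only the attribute names (imagelist, linklist, linktextlist, background) mirror the ones listed above, and it does not rewrite form tags or frames. Relative URLs are completed with urllib.parse.urljoin against the page URL.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Hypothetical sketch: collect absolute image/link URLs from a page.

    Not HTMLparse.TomFilter; only the attribute names mirror it.
    """

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.imagelist = []      # absolute <img src> URLs
        self.linklist = []       # absolute <a href> URLs
        self.linktextlist = []   # visible text of each <a> element
        self.background = None   # absolute <body background> URL, if any
        self._link_text = None   # text parts of the <a> being read, else None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        # Complete relative URLs against the page URL
        if tag == 'img' and attrs.get('src'):
            self.imagelist.append(urljoin(self.base_url, attrs['src']))
        elif tag == 'a' and attrs.get('href'):
            self.linklist.append(urljoin(self.base_url, attrs['href']))
            self._link_text = []
        elif tag == 'body' and attrs.get('background'):
            self.background = urljoin(self.base_url, attrs['background'])

    def handle_data(self, data):
        # Only gather text while inside an <a> element
        if self._link_text is not None:
            self._link_text.append(data)

    def handle_endtag(self, tag):
        if tag == 'a' and self._link_text is not None:
            self.linktextlist.append(''.join(self._link_text).strip())
            self._link_text = None


page = ('<body background="bg.gif">'
        '<a href="/news">News</a> <img src="img/logo.png">'
        '</body>')
collector = LinkCollector('http://www.yahoo.com/index.html')
collector.feed(page)
collector.close()
print(collector.linklist)      # ['http://www.yahoo.com/news']
print(collector.imagelist)     # ['http://www.yahoo.com/img/logo.png']
print(collector.linktextlist)  # ['News']
```

Using urljoin rather than string concatenation handles root-relative paths ("/news"), document-relative paths ("img/logo.png") and already-absolute URLs the same way.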
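The OnPyEvent bookkeeping can be exercised without wx by feeding it stand-in event objects. FakeWebEvent, WebObjectTracker and parse_data are hypothetical names, the Typ values 'int_text' and 'submit' are made up for the example, and the assumption that Data alternates keys and values separated by '|' is only a guess from the 'action'|'http://'|'method'|'POST' example, not something webwig confirms.

```python
from dataclasses import dataclass


@dataclass
class FakeWebEvent:
    """Hypothetical stand-in for the object returned by event.GetVal().

    Models only Id, Typ, Name, Value and Data; webwig's real WebEvent
    also has URL, FormData and Win.
    """
    Id: int
    Typ: str
    Name: str
    Value: str
    Data: str = ''

    @property
    def WebValue(self):
        # Same shape as described for WebValue: '&%s="%s"' % (Name, Value)
        return '&%s="%s"' % (self.Name, self.Value)


class WebObjectTracker:
    """The OnPyEvent bookkeeping without wx (hypothetical class)."""

    def __init__(self):
        self.WebObjects = {}
        self.WebEvent = None

    def on_event(self, web_event):
        self.WebEvent = web_event
        # Strip the 'int_' prefix some event types carry
        if web_event.Typ.startswith('int_'):
            web_event.Typ = web_event.Typ[4:]
        # Keep the latest event object per control Id
        self.WebObjects[web_event.Id] = web_event


def parse_data(data):
    """Split an assumed 'key|value|key|value' Data string into a dict."""
    parts = data.split('|') if data else []
    return dict(zip(parts[0::2], parts[1::2]))


tracker = WebObjectTracker()
tracker.on_event(FakeWebEvent(1, 'int_text', 'q', 'python'))
tracker.on_event(FakeWebEvent(2, 'submit', 'go', 'Search',
                              Data='action|http://search.example/|method|GET'))
print(tracker.WebObjects[1].Typ)       # text
print(tracker.WebObjects[1].WebValue)  # &q="python"
print(parse_data(tracker.WebEvent.Data))
# {'action': 'http://search.example/', 'method': 'GET'}
```

Keying the dictionary by control Id means a later event from the same control simply replaces the earlier one, so WebObjects always holds the most recent state of each control.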
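Because radio buttons arrive as check boxes, the handler has to clear the other members of a group itself. The sketch below assumes the HTML name attribute identifies the group; StubCheckBox and enforce_radio_group are hypothetical, and the stub only copies the GetValue/SetValue method names of wx.CheckBox so it runs without wx. In a real handler the boxes would presumably be the Win attributes of the entries in self.WebObjects, grouped by WebEvent.Name.

```python
class StubCheckBox:
    """Hypothetical stand-in exposing wx.CheckBox's GetValue/SetValue."""

    def __init__(self, name, checked=False):
        self.name = name
        self._checked = checked

    def GetValue(self):
        return self._checked

    def SetValue(self, state):
        self._checked = bool(state)


def enforce_radio_group(boxes, clicked):
    """Uncheck every other box sharing clicked's group name.

    Assumes radio buttons are check boxes and the HTML name attribute
    identifies the group.
    """
    if not clicked.GetValue():
        return  # a box was unchecked; nothing to enforce
    for box in boxes:
        if box is not clicked and box.name == clicked.name:
            box.SetValue(False)


size_s = StubCheckBox('size', checked=True)
size_m = StubCheckBox('size')
colour = StubCheckBox('colour', checked=True)
boxes = [size_s, size_m, colour]

size_m.SetValue(True)                 # user clicks "M"
enforce_radio_group(boxes, size_m)
print([b.GetValue() for b in boxes])  # [False, True, True]
```

Boxes in other groups ('colour' here) are left alone, which matches how a browser treats separate radio groups on one form.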