Modification of a urllib2 object ?

vincehofmeister at gmail.com vincehofmeister at gmail.com
Fri Oct 10 18:12:31 EDT 2008


On Oct 10, 1:02 pm, George Sakkis <george.sak... at gmail.com> wrote:
> On Oct 10, 2:32 pm, vincehofmeis... at gmail.com wrote:
>
>
>
> > I have tried several approaches to the following problem.
>
> > This is what I have:
>
> > ...
> > import ClientForm
> > import BeautifulSoup from BeautifulSoup
>
> > request = urllib2.Request('http://form.com/)
>
> > self.first_object = urllib2.open(request)
>
> > soup = BeautifulSoup(self.first_object)
>
> > forms = ClienForm.ParseResponse(self.first_object)
>
> > Now, when I do this, forms raises an IndexError because no forms
> > are returned, but the BeautifulSoup object is populated fine.
>
> First off, please copy and paste working code; the above has several
> syntax errors, so it can't raise IndexError (or anything else for that
> matter).
>
>
>
> > Now, when I switch the order to this:
>
> > import ClientForm
> > import BeautifulSoup from BeautifulSoup
>
> > request = urllib2.Request('http://form.com/)
>
> > self.first_object = urllib2.open(request)
>
> > forms = ClienForm.ParseResponse(self.first_object)
>
> > soup = BeautifulSoup(self.first_object)
>
> > Now, the form is returned correctly, but the BeautifulSoup object
> > comes back empty.
>
> > So what I can draw from this is both methods erase the properties of
> > the object,
>
> No, that's not the case. What happens is that the HTTP response object
> returned by urllib2.urlopen() is read by ClientForm.ParseResponse or
> BeautifulSoup, whichever is called first, and the second call has
> nothing left to read.
>
> The easiest solution is to save the request object and call
> urllib2.urlopen twice. Alternatively, check whether ClientForm has a parse
> method that accepts strings instead of urllib2 responses, and then read
> and save the HTML text explicitly:
>
> >>> text = urllib2.urlopen(request).read()
> >>> soup = BeautifulSoup(text)
> >>> forms = ClientForm.ParseString(text)
>
> HTH,
> George
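
To check my understanding of the explanation above, here is a minimal sketch of the read-once behaviour. It uses io.BytesIO as a stand-in for the urllib2 response (an assumption for illustration, so no network access is involved), and the HTML is made up:

```python
import io

# Stand-in for the file-like object returned by urllib2.urlopen();
# the HTML content is invented for illustration.
response = io.BytesIO(b"<html><form action='/register'></form></html>")

first_read = response.read()   # consumes the whole stream
second_read = response.read()  # the stream is already at its end

print(len(first_read) > 0)
print(second_read == b"")
```

Whichever parser reads the response first gets the data, and the second one sees an empty stream.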

request = urllib2.Request(settings.register_page)

self.url_obj = urllib2.urlopen(request).read()

soup = BeautifulSoup(self.url_obj)

forms = ClientForm.ParseResponse(self.url_obj, backwards_compat=False)

print forms

images = HtmlHelper.getCaptchaImages(soup)

self.webView.setHtml(str(soup))

# here we generate the popup dialog
Dialog = QtGui.QDialog()
ui = captcha_popup.Ui_Dialog()
ui.setupUi(Dialog, self)
ui.webView.setHtml(str(images[0]))
ui.webView_2.setHtml(str(images[1]))
Dialog.raise_()
Dialog.activateWindow()
Dialog.exec_()
Dialog.show()


Now I am getting this error:

Traceback (most recent call last):
  File "C:\Python25\Lib\site-packages\PyQt4\POS Pounder\Oct7\oct.py", line 1251, in createAccounts
    forms = ClientForm.ParseResponse(self.url_obj, backwards_compat=False)
  File "C:\Python25\lib\site-packages\clientform-0.2.9-py2.5.egg\ClientForm.py", line 1054, in ParseResponse
AttributeError: 'str' object has no attribute 'geturl'
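
My reading of the traceback is that ParseResponse expects the response object itself (it calls geturl() on it), while I am now passing the string returned by read(). A rough sketch of one possible workaround, assuming a hypothetical wrapper (BufferedResponse is my own name, not part of urllib2 or ClientForm) that keeps the body in memory and can be rewound between parsers. The URL and body below are placeholders, and I have not verified that ParseResponse needs nothing beyond read() and geturl():

```python
import io


class BufferedResponse(object):
    """Hypothetical helper: holds a response body in memory so it can be re-read."""

    def __init__(self, body, url):
        self._buffer = io.BytesIO(body)
        self._url = url

    def read(self, *args):
        # Accepts an optional size argument, like a regular file object.
        return self._buffer.read(*args)

    def geturl(self):
        # Mirrors the method the traceback says is missing on 'str'.
        return self._url

    def rewind(self):
        # Move back to the start so another consumer can read the body.
        self._buffer.seek(0)


# Placeholder data; in the real code the body would come from
# urllib2.urlopen(request).read() and the URL from the response's geturl().
body = b"<html><form action='/register'></form></html>"
resp = BufferedResponse(body, "http://example.com/register")

first = resp.read()
resp.rewind()
second = resp.read()
```

The idea would be to hand the wrapper to one parser, call rewind(), and then hand it to the other.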


