Problem accessing a web page

Tim Chase python.list at
Mon Dec 15 21:55:59 CET 2008

> I'm able to grab the problem webpage via Python just fine, albeit with
> a bit of a delay. So, don't know what your exact problem is, maybe
> your connection?

When you get the second page, are you getting the same content 
back that you get if you do a search in your favorite browser?

Using just

   content = urllib.urlopen(url2).read()
   'Error' in content # True
   'Friedrich' in content # False

However, when you browse to the page, those two should be inverted:

   'Error' in content # False
   'Friedrich' in content # True

I've tried passing the parameters via POST:

   params = urllib.urlencode([
     ('params.forzaQuery', 'N'),
     ('layout', 'busquedaisbn'),
     ])
   content = urllib.urlopen(url2, params).read()

However, this too fails because the underlying engine expects a 
session ID in the URL.  I finally got it to work with the code below:

   import urllib

   data = [
     ('params.forzaQuery', 'N'),
     ('params.cdispo', 'A'),
     ('params.cisbnExt', '8484031128'),
     ('params.liConceptosExt[0].texto', ''),
     ('params.orderByFormId', '1'),
     ('action', 'Buscar'),
     ('language', 'es'),
     ('prev_layout', 'busquedaisbn'),
     ('layout', 'busquedaisbn'),
     ]

   params = urllib.urlencode(data)

   url = 

   fp = urllib.urlopen(url, params)
   content = fp.read()

but I had to hard-code the jsessionid parameter in the URL.  It 
would have to be extracted from the response to the initial 
request (the initial URL returns a <FORM> element with the URL 
to POST to, including this magic jsessionid parameter).
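
For what it's worth, here is a rough sketch of how that URL might be 
pulled out of the initial response instead of being hard-coded.  It 
is untested against the real site and written for Python 3, where 
the urllib.urlopen and urllib.urlencode calls used above live in 
urllib.request and urllib.parse.  The names FormActionParser and 
find_post_url, the sample HTML, and the example.invalid host are all 
made up for illustration, and it assumes the first <FORM> on the 
page is the one you want.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class FormActionParser(HTMLParser):
    """Collect the action attribute of every <form> in a page."""

    def __init__(self):
        super().__init__()
        self.actions = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names before calling this.
        if tag == 'form':
            action = dict(attrs).get('action')
            if action:
                self.actions.append(action)


def find_post_url(page_url, html):
    """Return the absolute URL of the first form's action, or None."""
    parser = FormActionParser()
    parser.feed(html)
    if not parser.actions:
        return None
    # The action may be relative, so resolve it against the page URL.
    return urljoin(page_url, parser.actions[0])


# Made-up page and host (assumptions), only to show the shape of the result.
sample_html = (
    '<FORM METHOD="post" ACTION="/search.do;jsessionid=ABC123">'
    '<INPUT NAME="layout"></FORM>'
)
post_url = find_post_url('http://example.invalid/start', sample_html)
print(post_url)
```

In real use you would fetch the initial URL first, pass its decoded 
HTML to find_post_url, and then POST the urlencoded params to 
whatever URL comes back.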

Hope this helps nudge you (the OP) in the right direction to get 
what you're looking for.

