Problem when fetching page using urllib2.urlopen

jitu nair.jitendra at
Tue Aug 11 07:15:31 CEST 2009

Yes Piet, you were right, this works. But it does not seem to work on Google
App Engine, since it appends its own agent info, as seen below:

'User-Agent': 'Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.5; en-US;
rv: Gecko/2009073021 Firefox/3.0.13 AppEngine-Google;

Anyway, thanks. Good to know about the User-Agent field.
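
For checking which User-Agent a request object carries before it leaves
your code, here is a minimal sketch. The URL and the agent string are
hypothetical placeholders, not the ones from this thread, and the import
fallback is only there so the snippet also runs on Python 3, where
urllib2 became urllib.request. It only shows what urllib2 itself will
send; anything a hosting platform appends afterwards (as App Engine does
above) will not show up here.

```python
# urllib2 is the Python 2 name; Python 3 moved the same classes
# to urllib.request.
try:
    import urllib2
except ImportError:
    import urllib.request as urllib2

# Placeholder URL and agent string (hypothetical, for illustration only).
url = 'http://example.com/page;_x=1'
hdrs = {'User-Agent': 'Mozilla/5.0 (hypothetical test agent)'}

# Building the Request does not touch the network.
request = urllib2.Request(url, headers=hdrs)

# Request stores header names via str.capitalize(), so the lookup
# key is 'User-agent', not 'User-Agent'.
sent_agent = request.get_header('User-agent')
print(sent_agent)
```

No network access is needed for this check, since the request is never
opened.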


On Aug 11, 12:36 am, Piet van Oostrum <p... at> wrote:
> >>>>> jitu <nair.jiten... at> (j) wrote:
> >j> Hi,
> >j> An HTML page contains 'anchor' elements whose 'href' attribute has
> >j> a semicolon in the URL. When fetching the page using
> >j> urllib2.urlopen, all such hrefs containing semicolons are
> >j> truncated.
> >j> For example the href;_ylt...
> >j> get truncated to
> >j> The page I am talking about can be fetched from
> >j>;_...
> It's not python that causes this. It is the server that sends you the
> URLs without these parameters (that's what they are).
> To get them you have to tell the server that you are a respectable
> browser. E.g.
> import urllib2
> url = ';_ylt...
> url = ';_...
> hdrs = {'User-Agent': 'Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.5; en-US; rv: Gecko/2009073021 Firefox/3.0.13',
>        'Accept': 'image/*'}
> request = urllib2.Request(url=url, headers=hdrs)
> page = urllib2.urlopen(request).read()
> --
> Piet van Oostrum <p... at>
> URL:[PGP 8DAE142BE17999C4]
> Private email: p... at
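
The parameters Piet refers to are the part of a URL path that follows a
semicolon. A minimal sketch of how the standard library's urlparse
separates them from the path, using a made-up URL (hypothetical, not the
one from this thread):

```python
try:
    from urlparse import urlparse        # Python 2
except ImportError:
    from urllib.parse import urlparse    # Python 3

# Hypothetical URL with a ';' parameter on the last path segment.
url = 'http://example.com/page;_x=1?q=test'
parts = urlparse(url)

print(parts.path)    # path without the parameter part
print(parts.params)  # text between ';' and '?'
print(parts.query)   # text after '?'
```

A server that drops the params component would hand back links that look
like parts.path alone, which matches the truncation described above.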
