Problem when fetching page using urllib2.urlopen

Piet van Oostrum piet at
Mon Aug 10 21:36:55 CEST 2009

>>>>> jitu <nair.jitendra at> (j) wrote:

>j> Hi,
>j> An HTML page contains 'anchor' elements whose 'href' attribute has
>j> a semicolon in the URL. While fetching the page using
>j> urllib2.urlopen, all such hrefs containing semicolons are
>j> truncated.

>j> For example the href;_ylt=AlWSqpkpqhICp1lMgChtJkCdGWoL
>j> get truncated to

>j> The page I am talking about can be fetched from

It's not Python that causes this. It is the server that sends you the
URLs without these parameters (that's what the part after the
semicolon is: a URL parameter).
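
To see that the part after the semicolon really is a separate URL
component, here is a minimal sketch. It uses the Python 3 spelling
(urllib.parse; in Python 2 the same function lives in the urlparse
module), and the host and path are made up for illustration, they are
not the page from the question.

```python
from urllib.parse import urlparse

# Hypothetical URL for illustration; only the ';_ylt=...' shape matters.
example_url = 'http://www.example.com/some/page;_ylt=AlWSqpkp?x=1'

parts = urlparse(example_url)

# The parser reports the text after ';' in the last path segment
# as 'params', separate from both the path and the query string.
print(parts.path)
print(parts.params)
print(parts.query)
```

So nothing on the client side strips these; if they are missing from
the HTML you receive, the server left them out.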

To get them you have to tell the server that you are a respectable
browser, by sending browser-like request headers. E.g.

import urllib2

# Two example URLs; the second assignment overrides the first.
url = ';_ylt=AlWSqpkpqhICp1lMgChtJkCdGWoL'

url = ';_ylc=X3oDMTFka28zOGNuBF9TAzI3NjY2NzkEX3MDOTY5NTUzMjUEc2VjA3NzcC1kZXN0BHNsawN0aXRsZQ--'

# Headers that make the request look like it comes from a browser.
hdrs = {'User-Agent': 'Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.5; en-US; rv: Gecko/2009073021 Firefox/3.0.13',
        'Accept': 'image/*'}

request = urllib2.Request(url=url, headers=hdrs)
page = urllib2.urlopen(request).read()
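
For anyone on Python 3, where urllib2 became urllib.request, the same
idea might look roughly like the sketch below. The User-Agent string,
the Accept value, the example URL and the helper names
(build_browser_request, fetch) are all assumptions made up for
illustration; whether a given server then includes the parameters
depends entirely on that server.

```python
import urllib.request

# Hypothetical header values; substitute a real browser string as needed.
BROWSER_HEADERS = {
    'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) ExampleBrowser/1.0',
    'Accept': 'text/html',
}


def build_browser_request(url, headers=BROWSER_HEADERS):
    """Return a Request that carries browser-like headers."""
    return urllib.request.Request(url=url, headers=dict(headers))


def fetch(url):
    """Fetch the page body as bytes (performs a real network request)."""
    request = build_browser_request(url)
    with urllib.request.urlopen(request) as response:
        return response.read()


# Building the request does not touch the network, so it is safe to inspect.
request = build_browser_request('http://www.example.com/page;_ylt=abc')
print(request.full_url)
# Request stores header names capitalized, hence 'User-agent' here.
print(request.get_header('User-agent'))
```

Calling fetch(url) with the real page address then does what the
urllib2 snippet above does.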

Piet van Oostrum <piet at>
URL: [PGP 8DAE142BE17999C4]
Private email: piet at
