URLs and ampersands

Steven D'Aprano steven at REMOVE.THIS.cybersource.com.au
Tue Aug 5 11:59:20 CEST 2008


On Mon, 04 Aug 2008 23:16:46 -0300, Gabriel Genellina wrote:

> On Mon, 04 Aug 2008 20:43:45 -0300, Steven D'Aprano
> <steve at REMOVE-THIS-cybersource.com.au> wrote:
> 
>> I'm using urllib.urlretrieve() to download HTML pages, and I've hit a
>> snag with URLs containing ampersands:
>>
>> http://www.example.com/parrot.php?x=1&y=2
>>
>> Somewhere in the process, urls like the above are escaped to:
>>
>> http://www.example.com/parrot.php?x=1&amp;y=2
>>
>> which naturally fails to exist.
>>
>> I could just do a string replace, but is there a "right" way to escape
>> and unescape URLs? I've looked through the standard lib, but I can't
>> find anything helpful.
> 
> This works fine for me:
> 
> py> import urllib
> py> fn = urllib.urlretrieve("http://c7.amazingcounters.com/counter.php?i=1516903&c=4551022")[0]
> py> open(fn,"rb").read()
> '\x89PNG\r\n\x1a\n\x00\x00\x00\rIHDR\x00\x00...
> 
> So it's not urlretrieve escaping the url, but something else in your
> code...

I didn't say urlretrieve was escaping the URL. I actually think the
URLs are pre-escaped when I scrape them from an HTML file. I have searched
for, but been unable to find, standard library functions that escape or
unescape URLs. Are there any such functions?
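
For reference, a minimal sketch of what such functions might look like in
use. It assumes the stray "&amp;" comes from HTML entity escaping in the
scraped page, which is a different thing from URL (percent) escaping. The
names "scraped" and "url" are made up for illustration; the functions are
from the standard library (xml.sax.saxutils for entities, urllib's
quote/unquote for percent-encoding).

```python
from xml.sax.saxutils import escape, unescape

try:
    from urllib.parse import quote, unquote  # Python 3 location
except ImportError:
    from urllib import quote, unquote  # Python 2 location

# Hypothetical URL as it appears inside an HTML attribute.
scraped = "http://www.example.com/parrot.php?x=1&amp;y=2"

# Undo the HTML entity escaping (&amp; &lt; &gt;) to get the real URL.
url = unescape(scraped)

# escape() goes the other way, e.g. before writing the URL back into HTML.
round_trip = escape(url)

# Percent-encoding is separate: it protects characters inside one URL part.
encoded = quote("a b&c")
decoded = unquote(encoded)
```

Note that saxutils.unescape() only handles the three basic entities by
default, so a page using other named or numeric entities would need more
than this.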



-- 
Steven


