URLs and ampersands
steven at REMOVE.THIS.cybersource.com.au
Tue Aug 5 11:59:20 CEST 2008
On Mon, 04 Aug 2008 23:16:46 -0300, Gabriel Genellina wrote:
> On Mon, 04 Aug 2008 20:43:45 -0300, Steven D'Aprano
> <steve at REMOVE-THIS-cybersource.com.au> wrote:
>> I'm using urllib.urlretrieve() to download HTML pages, and I've hit a
>> snag with URLs containing ampersands:
>> Somewhere in the process, urls like the above are escaped to:
>> which naturally fails to exist.
>> I could just do a string replace, but is there a "right" way to escape
>> and unescape URLs? I've looked through the standard lib, but I can't
>> find anything helpful.
> This works fine for me:
> py> import urllib
> py> fn =
> py> open(fn,"rb").read()
> So it's not urlretrieve escaping the url, but something else in your
I didn't say that urlretrieve was escaping the URL. I actually think the
URLs are pre-escaped when I scrape them from an HTML file. I have searched
for, but have been unable to find, standard library functions that escape
or unescape URLs. Are there any such functions?