URLs and ampersands

Wojtek Walczak gminick at nie.ma.takiego.adresu.w.sieci.pl
Tue Aug 5 13:01:12 CEST 2008

On 05 Aug 2008 09:59:20 GMT, Steven D'Aprano wrote:

> I didn't say urlretrieve was escaping the URL. I actually think the 
> URLs are pre-escaped when I scrape them from an HTML file. I have searched 
> for, but been unable to find, standard library functions that escape or 
> unescape URLs. Are there any such functions?

$ cd /usr/lib/python2.5/
$ grep "\&amp\;" *.py
BaseHTTPServer.py:    return html.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;")
cgi.py:    s = s.replace("&", "&amp;") # Must be done first!
cgitb.py:                doc = doc.replace('&', '&amp;').replace('<',
HTMLParser.py:        s = s.replace("&amp;", "&") # Must be last
pydoc.py:        return replace(text, '&', '&amp;', '<', '&lt;', '>',
xmlrpclib.py:    s = replace(s, "&", "&amp;")

So it could be BaseHTTPServer, cgi, cgitb, difflib, HTMLParser,
pydoc or xmlrpclib. Do you use any of these? Or maybe some other
external module?
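
For what it's worth, the replace chains that grep turned up are easy to
reproduce by hand. Below is a rough sketch, not code taken from any of
those modules: the function names escape_amp and unescape_amp and the
example.com URL are made up for illustration, and only the three
entities seen in the grep output are handled.

```python
def escape_amp(s):
    # "&" must be replaced first; otherwise the "&" introduced by
    # "&lt;" and "&gt;" would itself be escaped a second time.
    return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;")


def unescape_amp(s):
    # The reverse direction: "&amp;" must be replaced last, so that an
    # escaped "&amp;lt;" comes back as the literal text "&lt;".
    return s.replace("&lt;", "<").replace("&gt;", ">").replace("&amp;", "&")


# Hypothetical URL as it might look after being scraped from HTML.
scraped = "http://example.com/page?a=1&amp;b=2"
print(unescape_amp(scraped))  # http://example.com/page?a=1&b=2
```

If the scraped URLs contain only "&amp;", running them through
something like unescape_amp before passing them to urlretrieve should
be enough. On Python 3.4 and later, html.unescape from the standard
library handles the full set of entities.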

Wojtek Walczak,
