URLs and ampersands

Wojtek Walczak gminick at nie.ma.takiego.adresu.w.sieci.pl
Tue Aug 5 13:01:12 CEST 2008


On 05 Aug 2008 09:59:20 GMT, Steven D'Aprano wrote:

> I didn't say urlretrieve was escaping the URL. I actually think the
> URLs are pre-escaped when I scrape them from an HTML file. I have searched
> for, but been unable to find, standard library functions that escape or
> unescape URLs. Are there any such functions?

$ cd /usr/lib/python2.5/
$ grep "\&amp\;" *.py
BaseHTTPServer.py:    return html.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;")
cgi.py:    s = s.replace("&", "&amp;") # Must be done first!
cgitb.py:                doc = doc.replace('&', '&amp;').replace('<', '&lt;')
difflib.py:        text=text.replace("&","&amp;").replace(">","&gt;").replace("<","&lt;")
HTMLParser.py:        s = s.replace("&amp;", "&") # Must be last
pydoc.py:        return replace(text, '&', '&amp;', '<', '&lt;', '>', '&gt;')
xmlrpclib.py:    s = replace(s, "&", "&amp;")

So it could be BaseHTTPServer, cgi, cgitb, difflib, HTMLParser,
pydoc or xmlrpclib. Do you use any of these? Or maybe some other
external module?
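
If the URLs really are pre-escaped in the HTML you scrape, one possible way to undo that is xml.sax.saxutils, which ships with the standard library. The snippet below is only a sketch: the URL is a made-up example, and it assumes the only escaping present is the HTML kind (&amp;, &lt;, &gt;). Percent-encoding such as %20 is a separate matter, handled by urllib.quote and urllib.unquote in Python 2.

```python
from xml.sax.saxutils import escape, unescape

# Hypothetical URL, as it might appear inside an href attribute in an HTML file.
scraped = "http://example.com/search?q=python&amp;page=2"

# unescape() turns &amp;, &lt; and &gt; back into &, < and >.
url = unescape(scraped)
print(url)

# escape() goes the other way, e.g. before writing the URL back into HTML.
roundtrip = escape(url)
print(roundtrip == scraped)
```

Note the "Must be done first!" and "Must be last" comments in the grep output above: when escaping, "&" has to be replaced before "<" and ">", otherwise the ampersands of the freshly inserted "&lt;" and "&gt;" would be escaped a second time. When unescaping, "&amp;" has to be handled last for the mirror-image reason.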

-- 
Regards,
Wojtek Walczak,
http://www.stud.umk.pl/~wojtekwa/


