URLs and ampersands

Wojtek Walczak gminick at nie.ma.takiego.adresu.w.sieci.pl
Tue Aug 5 07:01:12 EDT 2008


On 05 Aug 2008 09:59:20 GMT, Steven D'Aprano wrote:

> I didn't say urlretrieve was escaping the URL. I actually think the 
> URLs are pre-escaped when I scrape them from a HTML file. I have searched 
> for, but been unable to find, standard library functions that escape or 
> unescape URLs. Are there any such functions?

$ cd /usr/lib/python2.5/
$ grep "\&amp\;" *.py
BaseHTTPServer.py:    return html.replace("&", "&amp;").replace("<",
"&lt;").replace(">", "&gt;")
cgi.py:    s = s.replace("&", "&amp;") # Must be done first!
cgitb.py:                doc = doc.replace('&', '&amp;').replace('<',
'&lt;')
difflib.py:
text=text.replace("&","&amp;").replace(">","&gt;").replace("<","&lt;")
HTMLParser.py:        s = s.replace("&amp;", "&") # Must be last
pydoc.py:        return replace(text, '&', '&amp;', '<', '&lt;', '>',
'&gt;')
xmlrpclib.py:    s = replace(s, "&", "&amp;")

So it could be BaseHTTPServer, cgi, cgitb, difflib, HTMLParser,
pydoc or xmlrpclib. Do you use any of these? Or maybe some other
external module?
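
If the URLs really do arrive pre-escaped from the HTML source, one way to undo that is sketched below. It is only a minimal example, assuming the escaping is plain HTML entity escaping (&amp;, &lt;, &gt;) and using xml.sax.saxutils from the standard library, which the grep above does not list because it is a package rather than a top-level .py file. The example.com URL is made up for illustration.

```python
from xml.sax.saxutils import escape, unescape

# Hypothetical URL as it might appear inside an href attribute
# in scraped HTML source (note the escaped ampersand).
scraped = "http://example.com/search?q=python&amp;page=2"

# unescape() turns &amp;, &lt; and &gt; back into &, < and >.
url = unescape(scraped)

# escape() goes the other way, e.g. before writing the URL
# back into an HTML document.
round_trip = escape(url)
```

Note that this is HTML entity escaping, which is a different thing from the percent-encoding (%20 and friends) that urllib's quote and unquote deal with.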

-- 
Regards,
Wojtek Walczak,
http://www.stud.umk.pl/~wojtekwa/


