URLs and ampersands
duncan.booth at invalid.invalid
Tue Aug 5 14:07:39 CEST 2008
Steven D'Aprano <steven at REMOVE.THIS.cybersource.com.au> wrote:
> I didn't say urlretrieve was escaping the URL. I actually think the
> URLs are pre-escaped when I scrape them from a HTML file. I have
> searched for, but been unable to find, standard library functions that
> escape or unescape URLs. Are there any such functions?
Whenever you put a URL into an HTML file you need to escape it, so
naturally you will also need to unescape it when it is retrieved from the
file. However, whatever you use to parse the HTML ought to be unescaping
text and attributes as part of the parsing process, so you shouldn't need a
separate function for this.
>>> from BeautifulSoup import BeautifulSoup
>>> soup = BeautifulSoup('''<a href="http://www.example.com/parrot.php?x=1
Even Python's builtin HTMLParser class will do this for you. What parser
are you using?
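
For what it's worth, here is a minimal sketch of the parser doing the unescaping, assuming Python 3, where the module is called html.parser (in Python 2 it is HTMLParser). The LinkCollector class and the sample markup are made up for illustration, not taken from your page. The last line shows html.unescape (also Python 3), which is probably the closest thing to the standalone function you were asking about.

```python
from html import unescape
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect href values from <a> tags (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; the parser has already
        # replaced character references such as &amp; in each value.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.links.append(value)


# Made-up sample markup; the URL and its query string are illustrative only.
page = '<a href="http://www.example.com/page?a=1&amp;b=2">link</a>'

collector = LinkCollector()
collector.feed(page)
collector.close()
print(collector.links)  # ['http://www.example.com/page?a=1&b=2']

# For a string scraped without a parser, html.unescape does the same job.
print(unescape("http://www.example.com/page?a=1&amp;b=2"))
```

The href in collector.links comes back with a plain "&", ready to pass to urlretrieve.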
Duncan Booth http://kupuguy.blogspot.com