URLs and ampersands
Steven D'Aprano
steve at REMOVE-THIS-cybersource.com.au
Tue Aug 5 19:21:22 EDT 2008
On Tue, 05 Aug 2008 12:07:39 +0000, Duncan Booth wrote:
> Whenever you put a URL into an HTML file you need to escape it, so
> naturally you will also need to unescape it when it is retrieved from
> the file. However, whatever you use to parse the HTML ought to be
> unescaping text and attributes as part of the parsing process, so you
> shouldn't need a separate function for this.
...
> Even Python's builtin HTMLParser class will do this for you. What parser
> are you using?
A regex.
I know, I know, now I have two problems :-)
It's a quick and dirty hack, not a production piece of code, and I have a
quick and dirty fix by just using url.replace('&amp;', '&').
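A single replace only covers the named ampersand entity; numeric forms such as &#38; slip through. A minimal sketch of a more general fix, assuming a current Python 3 where the standard library provides html.unescape (it did not exist in 2008). The helper name unescape_url and the example URL are made up for illustration:

```python
import html

# Hypothetical href value as it might appear in raw HTML source.
raw_href = "http://example.com/search?q=python&amp;page=2"

def unescape_url(escaped):
    """Decode HTML character references (named and numeric) in one pass."""
    return html.unescape(escaped)

print(unescape_url(raw_href))
```

This still assumes the regex has already pulled out the attribute value correctly, which is the fragile part.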
Thanks to everybody who replied. I guess I really have to bite the bullet
and learn how to use a proper HTML parser.
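For reference, a rough sketch of what the parser route Duncan describes could look like, assuming Python 3, where the module is html.parser (it was called HTMLParser in Python 2). The names LinkCollector and extract_links are hypothetical, not part of any library:

```python
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    """Collect href values from <a> tags; the parser unescapes attributes."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs with entities already decoded.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.links.append(value)

def extract_links(markup):
    """Return the href of every <a> tag found in markup."""
    parser = LinkCollector()
    parser.feed(markup)
    parser.close()
    return parser.links

print(extract_links('<a href="http://example.com/?a=1&amp;b=2">link</a>'))
```

Because the parser decodes the attribute value itself, no separate unescaping step is needed.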
--
Steven