Unicode to HTML entities

Clodoaldo clodoaldo.pinto at gmail.com
Wed May 30 08:49:39 EDT 2007


On May 30, 8:53 am, Tommy Nordgren <tommy.nordg... at comhem.se> wrote:
> On 29 May 2007, at 17:52, Clodoaldo wrote:
>
> > I was looking for a function to transform a unicode string into
> > HTML entities: not only the usual HTML escaping, but all non-ASCII
> > characters.
>
> > As I didn't find one, I wrote my own:
>
> > # -*- coding: utf-8 -*-
> > from htmlentitydefs import codepoint2name
>
> > def unicode2htmlentities(u):
>
> >    htmlentities = []
>
> >    for c in u:
> >       if ord(c) < 128:
> >          # ASCII passes through unchanged
> >          htmlentities.append(c)
> >       elif ord(c) in codepoint2name:
> >          htmlentities.append('&%s;' % codepoint2name[ord(c)])
> >       else:
> >          # no named entity: use a numeric character reference
> >          htmlentities.append('&#%d;' % ord(c))
>
> >    return ''.join(htmlentities)
>
> > print unicode2htmlentities(u'São Paulo')
>
> > Is there a function like that in one of Python's built-in modules? If
> > not, is there a better way to do it?
>
> > Regards, Clodoaldo Pinto Neto
>
> In many cases, the need to use HTML/XHTML entities can be avoided by
> generating UTF-8 encoded pages.

Sure. All my pages are UTF-8 encoded. The case I'm dealing with is an
email link whose subject contains non-ASCII characters, as in:

<a href=mailto:example at sample.com?subject=Dúvidas>Mail to</a>

Somehow, when the user clicks on the link, the subject reaches their email
client with the non-ASCII characters turned into garbage.
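
One thing that may be worth trying here, as a rough sketch rather than a
confirmed fix: percent-encode the subject as UTF-8 before putting it in
the mailto URL, instead of converting it to HTML entities. The snippet
below assumes Python 3 (urllib.parse.quote and html.escape); the function
name mailto_link and the address user@example.com are placeholders made
up for illustration. Whether a given mail client decodes the subject
correctly still depends on that client.

```python
from html import escape
from urllib.parse import quote


def mailto_link(address, subject, text):
    """Build an <a href="mailto:..."> tag with a percent-encoded subject.

    Hypothetical helper for illustration only.
    """
    # Percent-encode the subject as UTF-8 bytes; safe='' means that
    # reserved characters such as '&', '?' and '/' are encoded too.
    encoded_subject = quote(subject, safe='', encoding='utf-8')
    href = 'mailto:%s?subject=%s' % (address, encoded_subject)
    # Escape the URL for use inside a quoted HTML attribute.
    return '<a href="%s">%s</a>' % (escape(href, quote=True), escape(text))


# Placeholder address, not the one from the real page.
print(mailto_link('user@example.com', 'Dúvidas', 'Mail to'))
```

With this approach the href contains only ASCII characters, so the page
encoding no longer affects what the browser hands to the mail client.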

And before someone points out that I should not expose email addresses:
the email is only linked with the consent of the owner, and the source
is obfuscated to make it harder for a robot to harvest it.

Regards, Clodoaldo



