[Python-ideas] Add "htmlcharrefreplace" error handler
Serhiy Storchaka
storchaka at gmail.com
Fri Jun 14 17:37:10 CEST 2013
14.06.13 18:20, Steven D'Aprano wrote:
> On 14/06/13 19:22, Antoine Pitrou wrote:
>> It's not trivial, it's additional C code in an important part of the
>> language (unicode and codecs).
>
> Or, it's 17 lines of Python. Something like this is a good start:
>
>
> import codecs
> from html.entities import codepoint2name
>
> def htmlcharrefreplace_errors(exc):
>     c = exc.object[exc.start]
>     try:
>         entity = codepoint2name[ord(c)]
>     except KeyError:
>         n = ord(c)
>         if n <= 0xFFFF:
>             replace = "\\u%04x"
>         else:
>             replace = "\\U%08x"
>         replace = replace % n
Actually, the fallback should be '&#%d;' % n. See also the sample
implementation in my original post, which reuses xmlcharrefreplace_errors.
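
For illustration only, here is a minimal sketch of what such a handler could look like. It is not the sample implementation from the original post. It assumes named entities come from html.entities.codepoint2name and that every other character is delegated to codecs.xmlcharrefreplace_errors one character at a time; the handler name "htmlcharrefreplace" is the one proposed in this thread, not a name registered by the standard library.

```python
import codecs
from html.entities import codepoint2name


def htmlcharrefreplace_errors(exc):
    """Replace unencodable characters with HTML named entities,
    falling back to numeric references such as '&#9731;'."""
    if not isinstance(exc, UnicodeEncodeError):
        # Only encoding errors can be handled this way.
        raise exc
    pieces = []
    # The codec may report a run of several bad characters at once.
    for i in range(exc.start, exc.end):
        name = codepoint2name.get(ord(exc.object[i]))
        if name is not None:
            pieces.append("&%s;" % name)
        else:
            # Reuse the built-in handler for a single character
            # (assumption: a one-character error object is built by hand).
            single = UnicodeEncodeError(
                exc.encoding, exc.object, i, i + 1, exc.reason
            )
            pieces.append(codecs.xmlcharrefreplace_errors(single)[0])
    # Resume encoding right after the run that was replaced.
    return "".join(pieces), exc.end


# Register under the name proposed in this thread.
codecs.register_error("htmlcharrefreplace", htmlcharrefreplace_errors)
```

With the handler registered, "caf\u00e9".encode("ascii", "htmlcharrefreplace") gives b"caf&eacute;", while a character without a named entity, such as "\u2603", becomes b"&#9731;".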