[Python-ideas] Add "htmlcharrefreplace" error handler

Serhiy Storchaka storchaka at gmail.com
Fri Jun 14 17:37:10 CEST 2013


14.06.13 18:20, Steven D'Aprano wrote:
> On 14/06/13 19:22, Antoine Pitrou wrote:
>> It's not trivial, it's additional C code in an important part of the
>> language (unicode and codecs).
>
> Or, it's 17 lines of Python. Something like this is a good start:
>
>
> import codecs
> from html.entities import codepoint2name
>
> def htmlcharrefreplace_errors(exc):
>      c = exc.object[exc.start]
>      try:
>          entity = codepoint2name[ord(c)]
>      except KeyError:
>          n = ord(c)
>          if n <= 0xFFFF:
>              replace = "\\u%04x"
>          else:
>              replace = "\\U%08x"
>          replace = replace % n

Actually, the fallback should be '&#%d;' % n, a decimal character
reference, rather than a Python-style \u escape. See also my sample
implementation in the original post, which reuses xmlcharrefreplace_errors.
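Below is a minimal pure-Python sketch of the handler with that fix. It is not the implementation from the original post, which is not shown here. The sketch uses the named-entity table from html.entities (HTML 4.01 names only) and falls back to the same decimal form that the built-in "xmlcharrefreplace" produces. The name 'htmlcharrefreplace' is the one proposed in this thread. It is not a built-in handler, so the sketch registers it itself.

```python
import codecs
from html.entities import codepoint2name


def htmlcharrefreplace_errors(exc):
    """Error handler replacing unencodable characters with HTML references.

    Uses a named entity (e.g. '&eacute;') when html.entities.codepoint2name
    knows one, otherwise a decimal character reference (e.g. '&#20013;').
    """
    if not isinstance(exc, UnicodeEncodeError):
        # Like the built-in xmlcharrefreplace, only encoding is supported.
        raise TypeError("don't know how to handle %s in error callback"
                        % type(exc).__name__)
    parts = []
    # Handle the whole unencodable run, not just exc.object[exc.start].
    for c in exc.object[exc.start:exc.end]:
        n = ord(c)
        name = codepoint2name.get(n)
        if name is not None:
            parts.append('&%s;' % name)
        else:
            parts.append('&#%d;' % n)
    # Return the replacement text and the position to resume encoding from.
    return ''.join(parts), exc.end


# Hypothetical registration under the name proposed in this thread.
codecs.register_error('htmlcharrefreplace', htmlcharrefreplace_errors)
```

For example, 'café 中'.encode('ascii', 'htmlcharrefreplace') gives b'caf&eacute; &#20013;'. The 'é' has a named entity, and U+4E2D has none, so it falls back to a decimal reference.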



