[Python-ideas] Add "htmlcharrefreplace" error handler

Ethan Furman ethan at stoneleaf.us
Tue Jun 11 18:18:23 CEST 2013


On 06/11/2013 07:49 AM, Serhiy Storchaka wrote:
> I propose adding an "htmlcharrefreplace" error handler, which is similar to the "xmlcharrefreplace" error handler but uses HTML
> entity names where possible.
>
> >>> '∀ x∈ℜ'.encode('ascii', 'xmlcharrefreplace')
> b'&#8704; x&#8712;&#8476;'
> >>> '∀ x∈ℜ'.encode('ascii', 'htmlcharrefreplace')
> b'&forall; x&isin;&real;'
>
> Possible implementation:
>
> import codecs
> from html.entities import codepoint2name
>
> def htmlcharrefreplace_errors(exc):
>      if not isinstance(exc, UnicodeEncodeError):
>          raise exc
>      try:
>          replace = r'&%s;' % codepoint2name[ord(exc.object[exc.start])]
>      except KeyError:
>          return codecs.xmlcharrefreplace_errors(exc)
>      return replace, exc.start + 1
>
> codecs.register_error('htmlcharrefreplace', htmlcharrefreplace_errors)
>
> Even if we do not register this handler from the start, it may be worth providing htmlcharrefreplace_errors() in the html
> or html.entities module.

+1 for the idea and the name of 'htmlcharrefreplace'.
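
For anyone who wants to try the behaviour before anything lands in the stdlib, here is a minimal sketch of a variant of the handler above. It is not the proposed implementation: it assumes we want to process the whole unencodable run (exc.start to exc.end) in one call instead of one character at a time, and it spells out the numeric fallback by hand. The variable names `encoded` and `fallback` are only illustrative.

```python
import codecs
from html.entities import codepoint2name


def htmlcharrefreplace_errors(exc):
    """Replace unencodable characters with named HTML entities when one
    exists, and with numeric character references otherwise."""
    if not isinstance(exc, UnicodeEncodeError):
        raise exc
    pieces = []
    # Walk the whole unencodable run reported by the codec.
    for ch in exc.object[exc.start:exc.end]:
        name = codepoint2name.get(ord(ch))
        if name is not None:
            pieces.append('&%s;' % name)
        else:
            # No HTML 4 entity name: fall back to a numeric reference.
            pieces.append('&#%d;' % ord(ch))
    # Resume encoding right after the run we just replaced.
    return ''.join(pieces), exc.end


# Registering under this name is an assumption made for the demo;
# the stdlib does not ship a handler called 'htmlcharrefreplace'.
codecs.register_error('htmlcharrefreplace', htmlcharrefreplace_errors)

# '\u2200 x\u2208\u211c' is the '∀ x∈ℜ' string from the proposal.
encoded = '\u2200 x\u2208\u211c'.encode('ascii', 'htmlcharrefreplace')
# U+20BD has no entry in codepoint2name, so it takes the numeric path.
fallback = '\u20bd'.encode('ascii', 'htmlcharrefreplace')
```

Note that the handler only fires for characters the target codec cannot encode, so with 'latin-1' a character such as 'é' would be passed through as a byte and not turned into an entity.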

--
~Ethan~

