[Python-ideas] Add "htmlcharrefreplace" error handler
Ethan Furman
ethan at stoneleaf.us
Tue Jun 11 18:18:23 CEST 2013
On 06/11/2013 07:49 AM, Serhiy Storchaka wrote:
> I propose adding an "htmlcharrefreplace" error handler, which is similar to the "xmlcharrefreplace" error handler but uses HTML
> entity names where possible.
>
> >>> '∀ x∈ℜ'.encode('ascii', 'xmlcharrefreplace')
> b'&#8704; x&#8712;&#8476;'
> >>> '∀ x∈ℜ'.encode('ascii', 'htmlcharrefreplace')
> b'&forall; x&isin;&real;'
>
> Possible implementation:
>
> import codecs
> from html.entities import codepoint2name
>
> def htmlcharrefreplace_errors(exc):
>     if not isinstance(exc, UnicodeEncodeError):
>         raise exc
>     try:
>         replace = r'&%s;' % codepoint2name[ord(exc.object[exc.start])]
>     except KeyError:
>         return codecs.xmlcharrefreplace_errors(exc)
>     return replace, exc.start + 1
>
> codecs.register_error('htmlcharrefreplace', htmlcharrefreplace_errors)
>
> Even if we do not register this handler from the start, it may be worth providing htmlcharrefreplace_errors() in the html
> or html.entities module.
+1 for the idea and the name of 'htmlcharrefreplace'.
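
For what it's worth, here is a minimal sketch of a variant that walks the whole range exc.start:exc.end itself instead of handling one character and delegating to codecs.xmlcharrefreplace_errors(). As far as I can tell, the quoted version falls back to numeric references for the entire reported run whenever its first character has no named entity, so a named character later in the same run would also come out numeric. The function name htmlcharrefreplace_range and the handler name 'htmlcharrefreplace_sketch' are made up for this example and are not part of the proposal.

```python
import codecs
from html.entities import codepoint2name


def htmlcharrefreplace_range(exc):
    """Hypothetical variant: replace each character in the failing range."""
    if not isinstance(exc, UnicodeEncodeError):
        raise exc
    parts = []
    for ch in exc.object[exc.start:exc.end]:
        name = codepoint2name.get(ord(ch))
        if name is not None:
            # HTML defines a named entity for this code point.
            parts.append('&%s;' % name)
        else:
            # No name available: use a decimal numeric character reference.
            parts.append('&#%d;' % ord(ch))
    # Resume encoding right after the range we have just replaced.
    return ''.join(parts), exc.end


# Hypothetical handler name, chosen so it cannot clash with the proposed one.
codecs.register_error('htmlcharrefreplace_sketch', htmlcharrefreplace_range)

encoded = '∀ x∈ℜ ☃'.encode('ascii', 'htmlcharrefreplace_sketch')
print(encoded)  # b'&forall; x&isin;&real; &#9731;'
```

With this sketch, '☃∀' encodes to b'&#9731;&forall;': the snowman has no entry in codepoint2name and stays numeric, while ∀ still gets its name.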
--
~Ethan~