On 06/11/2013 07:49 AM, Serhiy Storchaka wrote:
I propose to add an "htmlcharrefreplace" error handler which is similar to the "xmlcharrefreplace" error handler but uses HTML entity names where possible.
>>> '∀ x∈ℜ'.encode('ascii', 'xmlcharrefreplace')
b'&#8704; x&#8712;&#8476;'
>>> '∀ x∈ℜ'.encode('ascii', 'htmlcharrefreplace')
b'&forall; x&isin;&real;'
Possible implementation:
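import codecs
from html.entities import codepoint2name

def htmlcharrefreplace_errors(exc):
    # Only encoding errors can be replaced with character references.
    if not isinstance(exc, UnicodeEncodeError):
        raise exc
    try:
        # Use the named HTML entity for the first unencodable character.
        replace = '&%s;' % codepoint2name[ord(exc.object[exc.start])]
    except KeyError:
        # No entity name: fall back to a numeric reference (&#NNNN;).
        return codecs.xmlcharrefreplace_errors(exc)
    # Resume right after this one character; the codec calls the handler
    # again for any further unencodable characters.
    return replace, exc.start + 1

codecs.register_error('htmlcharrefreplace', htmlcharrefreplace_errors)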
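A variant worth considering, sketched here as an assumption and not as part of the proposal: instead of replacing one character per call and letting the codec call back for the next one, the handler can replace the whole unencodable run exc.object[exc.start:exc.end] in a single call and resume at exc.end, which is how the built-in xmlcharrefreplace handler behaves. The name htmlcharrefreplace_run_errors is hypothetical; codepoint2name and codecs.register_error are the same standard-library pieces used above.

```python
import codecs
from html.entities import codepoint2name


def htmlcharrefreplace_run_errors(exc):
    """Hypothetical variant: replace the entire unencodable run
    exc.object[exc.start:exc.end] in one call."""
    if not isinstance(exc, UnicodeEncodeError):
        raise exc
    parts = []
    for ch in exc.object[exc.start:exc.end]:
        name = codepoint2name.get(ord(ch))
        if name is not None:
            # Named HTML 4 entity, e.g. &forall;
            parts.append('&%s;' % name)
        else:
            # Same decimal form that xmlcharrefreplace produces.
            parts.append('&#%d;' % ord(ch))
    # Resume encoding right after the whole run.
    return ''.join(parts), exc.end


codecs.register_error('htmlcharrefreplace', htmlcharrefreplace_run_errors)

print('\u2200 x\u2208\u211c'.encode('ascii', 'htmlcharrefreplace'))
print('\u2603'.encode('ascii', 'htmlcharrefreplace'))
```

Characters without an HTML 4 entity name, such as U+2603 SNOWMAN, come out exactly as xmlcharrefreplace would render them (b'&#9731;').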
Even if we do not register this handler from the start, it may be worth providing htmlcharrefreplace_errors() in the html or html.entities module.
+1 for the idea and for the name 'htmlcharrefreplace'.

--
~Ethan~