[Python-ideas] Add "htmlcharrefreplace" error handler
Serhiy Storchaka
storchaka at gmail.com
Tue Jun 11 16:49:51 CEST 2013
I propose adding an "htmlcharrefreplace" error handler, which is similar to
the "xmlcharrefreplace" error handler but uses HTML entity names where possible.
>>> '∀ x∈ℜ'.encode('ascii', 'xmlcharrefreplace')
b'&#8704; x&#8712;&#8476;'
>>> '∀ x∈ℜ'.encode('ascii', 'htmlcharrefreplace')
b'&forall; x&isin;&real;'
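The entity names in the second result come from the html.entities.codepoint2name table, which covers the HTML 4 entity set. As a rough sketch of the per-character choice between a named and a numeric reference (the helper name char_ref is hypothetical and not part of the proposal):

```python
from html.entities import codepoint2name

def char_ref(ch):
    """Return a named reference for ch if the table has one, else a numeric one."""
    name = codepoint2name.get(ord(ch))
    return '&%s;' % name if name is not None else '&#%d;' % ord(ch)

refs = [char_ref(ch) for ch in '∀∈ℜ']   # all three have HTML 4 entity names
fallback = char_ref('ℕ')                 # U+2115 has no entry in codepoint2name
```

Here refs holds the three named references and fallback is a numeric reference, which matches the "if possible" wording above.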
Possible implementation:
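import codecs
from html.entities import codepoint2name

def htmlcharrefreplace_errors(exc):
    # This handler only applies to encoding errors.
    if not isinstance(exc, UnicodeEncodeError):
        raise exc
    try:
        # Use the entity name of the first unencodable character.
        replace = r'&%s;' % codepoint2name[ord(exc.object[exc.start])]
    except KeyError:
        # No entity name: fall back to numeric character references.
        return codecs.xmlcharrefreplace_errors(exc)
    # Resume encoding right after the replaced character.
    return replace, exc.start + 1

codecs.register_error('htmlcharrefreplace', htmlcharrefreplace_errors)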
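One subtlety in the code above: when the first character has no entity name, codecs.xmlcharrefreplace_errors(exc) replaces the whole range from exc.start to exc.end, so a named character that directly follows an unnamed one in the same unencodable run may end up as a numeric reference. A possible variant, sketched on the assumption that handling the full range in one call is acceptable, decides character by character. The function name and the error name 'htmlcharrefreplace_range' are hypothetical, chosen only to avoid clashing with the registration above:

```python
import codecs
from html.entities import codepoint2name

def htmlcharrefreplace_range_errors(exc):
    # Hypothetical variant: handles every character in exc.start:exc.end.
    if not isinstance(exc, UnicodeEncodeError):
        raise exc
    parts = []
    for ch in exc.object[exc.start:exc.end]:
        name = codepoint2name.get(ord(ch))
        # Named entity when the table has one, numeric reference otherwise.
        parts.append('&%s;' % name if name is not None else '&#%d;' % ord(ch))
    # Resume encoding after the whole unencodable run.
    return ''.join(parts), exc.end

codecs.register_error('htmlcharrefreplace_range', htmlcharrefreplace_range_errors)

mixed = 'ℕ∀ x'.encode('ascii', 'htmlcharrefreplace_range')
named = '∀ x∈ℜ'.encode('ascii', 'htmlcharrefreplace_range')
```

With this variant, the '∀' in mixed should keep its entity name even though it follows 'ℕ', which has no entry in codepoint2name.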
Even if we do not register this handler from the start, it may be worth
providing htmlcharrefreplace_errors() in the html or html.entities module.