[Python-Dev] PEP 293, Codec Error Handling Callbacks
Martin v. Loewis
13 Aug 2002 00:12:31 +0200
Walter Dörwald <email@example.com> writes:
> Output is as follows:
> 1790000 chars, 2.330% unenc
> ignore: 0.022 (factor=1.000)
> xmlcharrefreplace: 0.044 (factor=1.962)
> xml2: 0.267 (factor=12.003)
> xml3: 0.723 (factor=32.506)
> workaround: 5.151 (factor=231.702)
> i.e. a 1.7MB string with 2.3% unencodable characters was
Those numbers are impressive. Can you please add
if isinstance(exc, UnicodeEncodeError):
if exc.end-exc.start == 1:
r = []
for c in exc.object[exc.start:exc.end]:
r.extend([u"&#", str(ord(c)), u";"])
raise TypeError("don't know how to handle %r" % exc)
and report how that performs (assuming I made no error)?
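For readers who want to try this today, here is a minimal sketch of such a callback for current Python 3. It assumes the codecs.register_error() API and the (replacement, new position) return convention; the handler name "xmlcharref_sketch" and the time_encode() helper are made up for this example and are not part of the patch under discussion. It only sets up a comparison against the built-in handler and makes no claim about the resulting timings.

```python
import codecs
import time


def xmlcharref_sketch(exc):
    """Replace each unencodable character with an XML character reference."""
    if isinstance(exc, UnicodeEncodeError):
        if exc.end - exc.start == 1:
            # Fast path: exactly one unencodable character in this run.
            replacement = "&#%d;" % ord(exc.object[exc.start])
        else:
            r = []
            for c in exc.object[exc.start:exc.end]:
                r.extend(["&#", str(ord(c)), ";"])
            replacement = "".join(r)
        # Return the replacement text and the index at which encoding resumes.
        return replacement, exc.end
    raise TypeError("don't know how to handle %r" % exc)


# Hypothetical handler name, chosen so it does not shadow the built-in one.
codecs.register_error("xmlcharref_sketch", xmlcharref_sketch)


def time_encode(text, errors, encoding="ascii"):
    """Return the encoded bytes and the wall-clock seconds the encode took."""
    started = time.perf_counter()
    encoded = text.encode(encoding, errors)
    return encoded, time.perf_counter() - started


# Sample text in which half of the characters cannot be encoded as ASCII.
sample = "a\u00e4b\u20ac" * 1000
builtin_bytes, builtin_secs = time_encode(sample, "xmlcharrefreplace")
sketch_bytes, sketch_secs = time_encode(sample, "xmlcharref_sketch")
```

Comparing builtin_secs with sketch_secs on a larger input gives a rough C-versus-Python figure for the same replacement strategy.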
> Using a callback instead of the inline implementation is a factor of
> 12 slower than ignore.
For the purpose of comparing C and Python, this isn't relevant, is it?
Only the C version of xmlcharrefreplace and a Python version should be
> It can't really be fixed for codecs implemented in Python. For codecs
> that use the C functions we could add the functionality that e.g.
> PyUnicodeEncodeError_SetReason(exc) sets exc.reason and exc.args,
> but AFAICT it can't be done easily for Python where attribute assignment
> directly goes to the instance dict.
You could add methods such as set_reason to the class, which error
handler authors would have to use.
Again, these methods could be added through Python code, so no C code
would be necessary to implement them.
You could even implement a __setattr__ method in Python - although you'd
have to look it up from C while initializing the class.
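To make the two suggestions concrete, here is a rough sketch in current Python 3. SketchEncodeError is a hypothetical stand-in class, not the real UnicodeEncodeError and not code from the patch; it only illustrates how a set_reason method, or a __setattr__ written in Python, could keep args in step with the attributes.

```python
class SketchEncodeError(Exception):
    """Hypothetical encode error whose args always mirror its attributes."""

    _FIELDS = ("encoding", "object", "start", "end", "reason")

    def __init__(self, encoding, obj, start, end, reason):
        super().__init__(encoding, obj, start, end, reason)
        for name, value in zip(self._FIELDS, self.args):
            # Bypass our own __setattr__ while the instance is being set up.
            Exception.__setattr__(self, name, value)

    def __setattr__(self, name, value):
        Exception.__setattr__(self, name, value)
        if name in self._FIELDS:
            # Rebuild args so it never disagrees with the attributes.
            synced = tuple(getattr(self, field) for field in self._FIELDS)
            Exception.__setattr__(self, "args", synced)

    def set_reason(self, reason):
        """Explicit setter that error handler authors could call instead."""
        self.reason = reason


err = SketchEncodeError("ascii", "a\u00e4b", 1, 2, "ordinal not in range(128)")
err.reason = "changed by assignment"   # goes through __setattr__
reason_after_assignment = err.args[4]
err.set_reason("changed by method")    # goes through set_reason
```

With only the explicit setter and no __setattr__, a plain assignment to err.reason would leave args stale, which is why handler authors would have to use the method.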