[Python-Dev] PEP 293, Codec Error Handling Callbacks

Martin v. Loewis martin@v.loewis.de
13 Aug 2002 00:12:31 +0200

Walter D=F6rwald <walter@livinglogic.de> writes:

> Output is as follows:
> 1790000 chars, 2.330% unenc
> ignore: 0.022 (factor=3D1.000)
> xmlcharrefreplace: 0.044 (factor=3D1.962)
> xml2: 0.267 (factor=3D12.003)
> xml3: 0.723 (factor=3D32.506)
> workaround: 5.151 (factor=3D231.702)
> i.e. a 1.7MB string with 2.3% unencodable characters was
> encoded.

Those numbers are impressive. Can you please add

def xml4(exc):
  if isinstance(exc, UnicodeEncodeError):
    if exc.end-exc.start =3D=3D 1:
      return u"&#"+str(ord(exc.object[exc.start]))+u";"
      r =3D []
      for c in exc.object[exc.start:exc.end]:
        r.extend([u"&#", str(ord(c)), u";"])
      return u"".join(r)
    raise TypeError("don't know how to handle %r" % exc)

and report how that performs (assuming I made no error)?

> Using a callback instead of the inline implementation is a factor of
> 12 slower than ignore.

For the purpose of comparing C and Python, this isn't relevant, is it?
Only the C version of xmlcharrefreplace and a Python version should be

> It can't really be fixed for codecs implemented in Python. For codecs
> that use the C functions we could add the functionality that e.g.
> PyUnicodeEncodeError_SetReason(exc) sets exc.reason and exc.args[3],
> but AFAICT it can't be done easily for Python where attribute assignment
> directly goes to the instance dict.

You could add methods into the class set_reason etc, which error
handler authors would have to use.

Again, these methods could be added through Python code, so no C code
would be necessary to implemenet them.

You could even implement a setattr method in Python - although you'ld
have to search this from C while initializing the class.