
On Thu, 03 Nov 2011 22:47:00 +0100 "Martin v. Löwis" <martin@v.loewis.de> wrote:
On the one hand, these indices are used in formatting error messages such as "codec can't encode character \u%04x in position %d", suggesting they are regular indices into the string (counting code points).
On the other hand, they are used by error handlers to look up the character, and existing error handlers (including the ones we have now) use PyUnicode_AsUnicode to find the character. This suggests that the indices should be Py_UNICODE indices, for compatibility (and they currently do work in this way).
But what about error handlers written in Python?
I'm working on a patch where a C error handler using PyUnicodeEncodeError_GetStart gets a different value than a Python error handler accessing .start. The _GetStart/_GetEnd functions would take the value from the exception object, and adjust it before returning it.
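For reference, here is a rough sketch of what a Python-level error handler looks like and how it uses .start and .end. The handler name and the registered name "codepoint-demo" are made up for illustration; only codecs.register_error and the UnicodeEncodeError attributes are real API. The handler slices exc.object with the indices directly, so it only works if they are ordinary string indices.

```python
import codecs

def replace_with_codepoint(exc):
    # Hypothetical handler, for illustration only; it handles encode errors.
    if not isinstance(exc, UnicodeEncodeError):
        raise exc
    # .start and .end are used as plain indices into exc.object.
    bad = exc.object[exc.start:exc.end]
    replacement = "".join("<U+%04X>" % ord(ch) for ch in bad)
    # Return the replacement text and the position at which to resume encoding.
    return replacement, exc.end

# "codepoint-demo" is an arbitrary name chosen for this sketch.
codecs.register_error("codepoint-demo", replace_with_codepoint)

encoded = "a\u20acb".encode("ascii", errors="codepoint-demo")
print(encoded)
```

If .start were a Py_UNICODE index instead, the slice above could pick the wrong character whenever a non-BMP character precedes the error position in the string.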
Is it worth the hassle? We can just port our existing error handlers, and I guess the few third-party error handlers written in C (if any) can bear the transition.

Regards

Antoine.