[Python-Dev] Unicode exception indexing

"Martin v. Löwis" martin at v.loewis.de
Thu Nov 3 22:47:00 CET 2011


>> On the one hand, these indices are used in formatting error messages such as
>> "codec can't encode character \u%04x in position %d", suggesting they  
>> are regular
>> indices into the string (counting code points).
>>
>> On the other hand, they are used by error handlers to lookup the character,
>> and existing error handlers (including the ones we have now) use
>> PyUnicode_AsUnicode to find the character. This suggests that the indices
>> should be Py_UNICODE indices, for compatibility (and they currently do
>> work in this way).
> 
> But what about error handlers written in Python?

I'm working on a patch where an C error handler using
PyUnicodeEncodeError_GetStart gets a different value than a Python
error handler accessing .start. The _GetStart/_GetEnd functions would
take the value from the exception object, and adjust it before returning
it.

The implementation is fairly straight-forward, just a little expensive
(in the case of non-BMP strings on Windows).

Regards,
Martin


More information about the Python-Dev mailing list